<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Stephan Miller</title>
    <description>The latest articles on Forem by Stephan Miller (@eristoddle).</description>
    <link>https://forem.com/eristoddle</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F18795%2F5f6c41b8-6033-4887-937a-2ebdfe623d2e.jpeg</url>
      <title>Forem: Stephan Miller</title>
      <link>https://forem.com/eristoddle</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/eristoddle"/>
    <language>en</language>
    <item>
      <title>The Autoresearch Ecosystem - How One Repo Spawned 9 Different Types of AI Projects</title>
      <dc:creator>Stephan Miller</dc:creator>
      <pubDate>Mon, 04 May 2026 12:00:00 +0000</pubDate>
      <link>https://forem.com/eristoddle/the-autoresearch-ecosystem-how-one-repo-spawned-9-different-types-of-ai-projects-335c</link>
      <guid>https://forem.com/eristoddle/the-autoresearch-ecosystem-how-one-repo-spawned-9-different-types-of-ai-projects-335c</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy1ue5ldwtkl8pj549p12.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy1ue5ldwtkl8pj549p12.jpg" alt=" " width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I’d been messing around with &lt;a href="https://github.com/karpathy/autoresearch" rel="noopener noreferrer"&gt;Karpathy’s autoresearch&lt;/a&gt; for a couple of weekends, mostly because I’m interested in letting agents do shit while I sleep and someone had finally formalized the pattern in 630 lines of Python. Run the loop, modify &lt;code&gt;train.py&lt;/code&gt;, train for five minutes, check val_bpb, keep or revert, repeat forever. Compounding gains while you’re not even at your desk.&lt;/p&gt;

&lt;p&gt;So I fired up GitHub search for “autoresearch” expecting to find a handful of ML forks. People porting it to their hardware, maybe a few hyperparameter tweaks. You know how that goes.&lt;/p&gt;

&lt;p&gt;I found nine distinct categories of project. Some brilliant. Some “why did you do this.” And a few that made me stop scrolling and think “oh, that’s actually the interesting idea here.” It turns out the original repo isn’t really about ML. It’s a pattern, and people figured that out pretty quickly.&lt;/p&gt;

&lt;p&gt;I’m going to walk through every category I found, what each one actually does differently, and what they tell us about where this whole thing is going. There are a lot of repos here, all linked.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What Karpathy Actually Built&lt;/li&gt;
&lt;li&gt;
1. Platform Ports: Running It On Hardware You Actually Own

&lt;ul&gt;
&lt;li&gt;GPU Cluster Scaling&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

2. ML Research Enhancers: Making the Loop Smarter

&lt;ul&gt;
&lt;li&gt;Memory-Enhanced Researchers&lt;/li&gt;
&lt;li&gt;Bayesian + Active Inference&lt;/li&gt;
&lt;li&gt;Multi-GPU Infrastructure&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

3. Prompt Optimizers: Same Loop, Different Target File

&lt;ul&gt;
&lt;li&gt;autoresearch-prompt-optimization (az9713)&lt;/li&gt;
&lt;li&gt;autoresearch-for-agents (Galileo)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

4. Generalized Frameworks: Autoresearch For Anything

&lt;ul&gt;
&lt;li&gt;uditgoenka/autoresearch — Claude Code Skill&lt;/li&gt;
&lt;li&gt;autoresearch-anything (zkarimi22)&lt;/li&gt;
&lt;li&gt;menonpg/autoloop — The pip Package&lt;/li&gt;
&lt;li&gt;krzysztofdudek/ResearcherSkill — One File, Full Discipline&lt;/li&gt;
&lt;li&gt;alfonsograziano/auto-agent — Autoresearch Builds Agents&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

5. Production Codebase Optimization: Autoresearch on Real OSS

&lt;ul&gt;
&lt;li&gt;More Production War Stories&lt;/li&gt;
&lt;li&gt;idealo Search Ranking&lt;/li&gt;
&lt;li&gt;Tennis XGBoost — The Reward Hacking Cautionary Tale&lt;/li&gt;
&lt;li&gt;Vesuvius Challenge Ink Detection&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;6. Agent Factory: Autoresearch Builds Agents&lt;/li&gt;

&lt;li&gt;

7. Research OS / Skills Systems: Institutionalizing the Pattern

&lt;ul&gt;
&lt;li&gt;PhD-Zero (TenureAI)&lt;/li&gt;
&lt;li&gt;alirezarezvani/claude-skills&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

8. Creative Writing: Autoresearch For Prose and Fiction

&lt;ul&gt;
&lt;li&gt;redpen — Prose Refinement Engine&lt;/li&gt;
&lt;li&gt;NousResearch/autonovel — Complete Novel Pipeline&lt;/li&gt;
&lt;li&gt;sinfiny/Auto-Creative-Reasoning&lt;/li&gt;
&lt;li&gt;CalvinMagezi/self-evolving-skill — Brand Document Evolution&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

9. Meta-Pattern: Wrapping Autoresearch as a Worker

&lt;ul&gt;
&lt;li&gt;The Problem with Solo Autoresearch&lt;/li&gt;
&lt;li&gt;The Fix: 3 Files, 4 Subagents&lt;/li&gt;
&lt;li&gt;What Actually Broke In Production&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;So What Does This Actually Mean?&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Karpathy Actually Built
&lt;/h2&gt;

&lt;p&gt;Before we go through the derivatives, let’s look at the original. The repo is small and the loop is dumb on purpose:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Read &lt;code&gt;program.md&lt;/code&gt; (the meta-skill that tells the agent how to be a researcher)&lt;/li&gt;
&lt;li&gt;Modify &lt;code&gt;train.py&lt;/code&gt; with a small, reviewable diff&lt;/li&gt;
&lt;li&gt;Train for ~5 minutes on one GPU&lt;/li&gt;
&lt;li&gt;Check val_bpb (validation bits per byte — the metric)&lt;/li&gt;
&lt;li&gt;If it improved, commit. If it regressed, &lt;code&gt;git reset --hard&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Goto 1.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That’s it. About 100 experiments overnight on a single H100 while you sleep. Git is the memory. The flat TSV file is the search log. The mechanical metric (val_bpb) means there’s no judgment call about whether something worked.&lt;/p&gt;
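
&lt;p&gt;Here’s roughly what that loop looks like as code. This is my own sketch, not Karpathy’s actual implementation: &lt;code&gt;agent_propose_diff&lt;/code&gt; is a hypothetical stand-in for whatever agent harness you’re running, and it assumes &lt;code&gt;train.py&lt;/code&gt; prints the final val_bpb on its last line.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import subprocess

def run_training():
    # ~5 minutes on one GPU; assumes train.py prints val_bpb on its last line.
    out = subprocess.run(["python", "train.py"], capture_output=True, text=True, check=True)
    return float(out.stdout.strip().splitlines()[-1])

def log_result(score, kept):
    # The flat TSV is the search log: one row per experiment.
    with open("results.tsv", "a") as f:
        f.write(f"{score:.4f}\t{'keep' if kept else 'revert'}\n")

best = run_training()
while True:
    agent_propose_diff("train.py", meta_skill="program.md")  # hypothetical agent call
    score = run_training()
    if score &lt; best:  # lower bits per byte is better
        best = score
        subprocess.run(["git", "commit", "-am", f"val_bpb {score:.4f}"], check=True)
        log_result(score, kept=True)
    else:
        subprocess.run(["git", "reset", "--hard"], check=True)  # automatic rollback
        log_result(score, kept=False)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;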

&lt;p&gt;The main idea is that &lt;strong&gt;constraint enables autonomy&lt;/strong&gt;. The diffs are small, so they’re reviewable. The metric is mechanical, so the agent can’t argue with it. The rollback is automatic, so a bad experiment can’t poison the next one. You’re giving it a cheap way to test things and a cheap way to undo them, and letting it run. Not asking it to be smart.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;program.md&lt;/code&gt; is what Karpathy calls the meta-skill. Humans don’t program the training run. They program the researcher that programs the training run. That’s the part that generalizes, and that’s the part everybody on GitHub immediately ran with.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frh3epa29pv9qnup60g0l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frh3epa29pv9qnup60g0l.png" alt="Karpathy's original screenshot showing val_bpb improvement curve" width="800" height="396"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Platform Ports: Running It On Hardware You Actually Own
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;The “I don’t have an H100” forks&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The first thing that happened is what always happens. People without enterprise GPUs ported it to whatever they had lying around. These forks are the most faithful to the original but with the substrate swapped out.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/miolini/autoresearch-macos" rel="noopener noreferrer"&gt;&lt;code&gt;miolini/autoresearch-macos&lt;/code&gt;&lt;/a&gt; — straight macOS port using MPS backend&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/trevin-creator/autoresearch-mlx" rel="noopener noreferrer"&gt;&lt;code&gt;trevin-creator/autoresearch-mlx&lt;/code&gt;&lt;/a&gt; — Apple Silicon native, using MLX instead of PyTorch&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/jsegov/autoresearch-win-rtx" rel="noopener noreferrer"&gt;&lt;code&gt;jsegov/autoresearch-win-rtx&lt;/code&gt;&lt;/a&gt; — Windows with RTX&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/lucasgelfond/autoresearch-webgpu" rel="noopener noreferrer"&gt;&lt;code&gt;lucasgelfond/autoresearch-webgpu&lt;/code&gt;&lt;/a&gt; — runs entirely in the browser using WebGPU. No Python setup. The whole research loop in a tab.&lt;/li&gt;
&lt;li&gt;A Colab/Kaggle T4 port (upstream issue #208) that swaps Flash Attention 3 for PyTorch SDPA so you can run experiments overnight on a free GPU&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ArmanJR-Lab/autoautoresearch" rel="noopener noreferrer"&gt;&lt;code&gt;ArmanJR-Lab/autoautoresearch&lt;/code&gt;&lt;/a&gt; — Jetson AGX Orin port with a “director” written in Go that injects novelty (arxiv papers, DeepSeek Reasoner output) when the loop gets stuck in local minima&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/supratikpm/gemini-autoresearch" rel="noopener noreferrer"&gt;&lt;code&gt;supratikpm/gemini-autoresearch&lt;/code&gt;&lt;/a&gt; — Gemini CLI native, with Google Search grounding plugged into the loop as a live verification source. True headless overnight mode via &lt;code&gt;--yolo --prompt&lt;/code&gt;. 1M token context.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Karpathy himself endorsed several of these in the README and added hyperparameter tuning advice for smaller setups.&lt;/p&gt;

&lt;p&gt;The interesting ones in this group aren’t the “same thing on Mac” ports. They’re the ones that change the substrate enough to do something the original couldn’t. MLX on Apple Silicon is legitimately different compute. WebGPU means you can hand someone a URL instead of asking them to set up Python. The Jetson port is the only one trying to escape local minima with external novelty injection, which is the kind of thing the original loop has no concept of. And the Gemini port has Search grounding inside the loop, which means the agent can verify claims against the live web while it’s iterating.&lt;/p&gt;

&lt;p&gt;The Apple Silicon and WebGPU ports are the most useful if you don’t have data center hardware. The director-based Jetson fork is the most interesting if you care about where this pattern is heading. Most loops can hill-climb. Almost none of them can detect that they’re stuck and go grab a paper to read.&lt;/p&gt;

&lt;h3&gt;
  
  
  GPU Cluster Scaling
&lt;/h3&gt;

&lt;p&gt;The opposite direction. What happens if you give it 16 GPUs instead of one?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://blog.skypilot.co/scaling-autoresearch/" rel="noopener noreferrer"&gt;SkyPilot wrote it up&lt;/a&gt;. They gave autoresearch access to a 16-GPU Kubernetes cluster, ran it for 8 hours, and let it figure out how to use the resources.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;~910 experiments in 8 hours&lt;/li&gt;
&lt;li&gt;val_bpb dropped from 1.003 to 0.974 (a 2.87% improvement, which sounds small but is enormous for an LM at this scale)&lt;/li&gt;
&lt;li&gt;9x faster than a simulated sequential baseline to reach the same result&lt;/li&gt;
&lt;li&gt;The agent taught itself to use H200s for validation and screen ideas on cheaper H100s. Nobody told it to do that.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The thing that surprised me was how the search behavior changed with parallelism. Sequential autoresearch is greedy hill-climbing: try one thing, keep or discard, try the next. Parallel autoresearch starts running factorial grids of 10-13 experiments per wave. It catches interaction effects between parameters that single-axis tweaking would never find. Two changes that look mediocre alone can be great together. You can’t see that one-at-a-time.&lt;/p&gt;
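
&lt;p&gt;A minimal sketch of what one parallel wave might look like, assuming a hypothetical &lt;code&gt;launch_on_cluster&lt;/code&gt; that dispatches a training run to a free GPU and returns val_bpb:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from itertools import product
from concurrent.futures import ThreadPoolExecutor

# One wave: every combination across two axes instead of one tweak at a time.
learning_rates = [3e-4, 6e-4, 1e-3]
warmup_steps = [100, 250, 500, 1000]
grid = list(product(learning_rates, warmup_steps))  # 12 experiments per wave

def run_experiment(cfg):
    lr, warmup = cfg
    return launch_on_cluster(lr=lr, warmup=warmup)  # hypothetical dispatcher

with ThreadPoolExecutor(max_workers=len(grid)) as pool:
    scores = list(pool.map(run_experiment, grid))

# Interaction effects show up here: a (lr, warmup) pair can win even when
# neither value wins on its own axis.
best_cfg, best_score = min(zip(grid, scores), key=lambda pair: pair[1])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;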

&lt;p&gt;This is the version that stops looking like a hobby project. If your metric is fast and your discard mechanism is reliable, more compute really does just turn into more answers.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. ML Research Enhancers: Making the Loop Smarter
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;The “the flat TSV is not enough” camp&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;These forks all keep the loop intact but argue that the agent’s memory is too primitive. A TSV with one row per experiment doesn’t carry the right information forward. So they bolt on cognitive architecture.&lt;/p&gt;

&lt;h3&gt;
  
  
  Memory-Enhanced Researchers
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/tonitangpotato/autoresearch-engram" rel="noopener noreferrer"&gt;&lt;code&gt;tonitangpotato/autoresearch-engram&lt;/code&gt;&lt;/a&gt; plugs the Engram cognitive memory library into the loop. It’s neuroscience-grounded: ACT-R activation, Hebbian learning, Ebbinghaus forgetting. RECALL and STORE steps wrap around the existing loop.&lt;/p&gt;

&lt;p&gt;The numbers from a long-running instance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;After 50 experiments, the agent recognizes patterns like “architecture changes outperform optimizer tweaks in this regime”&lt;/li&gt;
&lt;li&gt;After 100, it knows the optimal architecture for your specific compute budget&lt;/li&gt;
&lt;li&gt;One production deployment is at 3,846 memories, 230,103 recalls, 12,510 Hebbian links&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What that buys you, supposedly, is research intuition. Not “this worked” but “here’s why and here’s the pattern.” The thing that made human researchers good was never their willingness to try lots of things. It was the priors they built up about what was worth trying.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bayesian + Active Inference
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ErikDeBruijn/autoresearcher2" rel="noopener noreferrer"&gt;&lt;code&gt;ErikDeBruijn/autoresearcher2&lt;/code&gt;&lt;/a&gt; is the most ambitious one I found. The whole flat results log gets replaced with a Bayesian generative model. Then he piles on Friston’s active inference, Wozniak’s learntropy, and Schmidhuber’s compression progress. The agent doesn’t just ask “was this experiment good?” It asks “which of my latent beliefs was wrong?”&lt;/p&gt;

&lt;p&gt;Four additions to the original loop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Generative model over experiment outcomes&lt;/li&gt;
&lt;li&gt;Policy evaluation via Expected Free Energy&lt;/li&gt;
&lt;li&gt;Learntropy appraisal module&lt;/li&gt;
&lt;li&gt;Persistent memory with decay dynamics&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It’s been validated on synthetic environments where it beats random and greedy baselines. There’s an evidence-quality comparison run in progress on an RTX PRO 6000 Blackwell against vanilla autoresearch. The repo also has a &lt;code&gt;CONSTITUTION.md&lt;/code&gt; because the project is partially about whether recursive self-improvement can deepen judgment, not just power.&lt;/p&gt;

&lt;p&gt;The interesting distinction is structural insight (“RoPE matters more than the optimizer in this regime”) versus flat knowledge (“RoPE improved val_bpb by 0.02”). The flat version doesn’t compose. The structural version does.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-GPU Infrastructure
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/iii-hq/n-autoresearch" rel="noopener noreferrer"&gt;&lt;code&gt;iii-hq/n-autoresearch&lt;/code&gt;&lt;/a&gt; keeps the loop and replaces the plumbing. Out goes bash + git + TSV. In comes structured KV state, a REST API, and crash recovery. Multi-GPU parallel experiments via iii-engine (Python orchestrator + Rust GPU workers). Cross-machine GPU workers.&lt;/p&gt;

&lt;p&gt;The clever part is the adaptive search strategy. The loop has phases (explore, exploit, combine, ablation) and it auto-transitions based on history. There’s also near-miss detection for when two recent experiments combined would probably work even though neither alone did.&lt;/p&gt;

&lt;p&gt;Honestly, this is the “what if you scaled it to a real research lab” fork. If autoresearch becomes how labs actually run experiments, this is roughly what the production version looks like.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Prompt Optimizers: Same Loop, Different Target File
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;What if &lt;code&gt;train.py&lt;/code&gt; was your system prompt?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Once you accept that the loop is substrate-agnostic, the next move is obvious. Point it at a prompt file. Use accuracy on a test set as the metric. Let it iterate.&lt;/p&gt;

&lt;h3&gt;
  
  
  autoresearch-prompt-optimization (az9713)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/az9713/autoresearch-prompt-optimization" rel="noopener noreferrer"&gt;az9713/autoresearch-prompt-optimization&lt;/a&gt; is the cleanest version of this. The loop targets &lt;code&gt;prompt.txt&lt;/code&gt; instead of &lt;code&gt;train.py&lt;/code&gt;. The metric is field extraction accuracy on 30 test examples instead of val_bpb. Everything else is the same.&lt;/p&gt;

&lt;p&gt;The numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;74.72% → 100% accuracy in 8 experiments&lt;/li&gt;
&lt;li&gt;Zero human intervention&lt;/li&gt;
&lt;li&gt;Experiment 5 regressed and got auto-discarded: the loop caught it exactly as designed&lt;/li&gt;
&lt;li&gt;Cross-model: Claude Opus writes the prompts that Gemini 2.5 Flash executes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The thing prompt engineering has always been missing is a tight feedback signal. Most people write a prompt, eyeball some outputs, decide it “looks better.” Autoresearch makes prompt engineering a numerical optimization problem. Reading &lt;code&gt;last_run.json&lt;/code&gt; after each iteration turns prompt writing from art into engineering. That’s a real shift.&lt;/p&gt;
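
&lt;p&gt;The metric is simple enough to sketch. Something like this, where &lt;code&gt;call_model&lt;/code&gt; is a hypothetical stand-in for the Gemini call and &lt;code&gt;tests.jsonl&lt;/code&gt; holds the 30 examples:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json

def accuracy(prompt_path="prompt.txt", tests_path="tests.jsonl"):
    # The mechanical metric: fraction of examples where the extracted
    # fields exactly match the expected ones.
    prompt = open(prompt_path).read()
    correct = total = 0
    for line in open(tests_path):
        case = json.loads(line)
        got = call_model(prompt, case["input"])  # hypothetical LLM call
        correct += int(got == case["expected"])
        total += 1
    return correct / total

# Same keep-or-revert decision as the ML loop, just a different target file:
# edit prompt.txt, re-run accuracy(), commit on improvement, reset on regression.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;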

&lt;h3&gt;
  
  
  autoresearch-for-agents (Galileo)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/rungalileo/autoresearch-for-agents" rel="noopener noreferrer"&gt;&lt;code&gt;rungalileo/autoresearch-for-agents&lt;/code&gt;&lt;/a&gt; is more ambitious. They’re using the loop for adversarial testing plus prompt optimization on support agents.&lt;/p&gt;

&lt;p&gt;Two phases. Phase 1 builds a frozen adversarial test suite (the exam). Phase 2 optimizes the prompt against that frozen suite (the studying). Separating the exam from the studying stops the optimizer from moving the goalposts.&lt;/p&gt;

&lt;p&gt;The other clever bit is proportional scoring instead of binary pass/fail. Binary scores give the optimizer no gradient. “70% of the way there” is a signal you can climb. “Failed” isn’t.&lt;/p&gt;
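
&lt;p&gt;A toy version of the difference, with substring checks standing in for Galileo’s real scorers:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def binary_score(output, required_facts):
    # Pass/fail: the optimizer gets no gradient to climb.
    return 1.0 if all(f in output for f in required_facts) else 0.0

def proportional_score(output, required_facts):
    # Partial credit: "70% of the way there" is a direction, not a verdict.
    hits = sum(f in output for f in required_facts)
    return hits / len(required_facts)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;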

&lt;p&gt;Results: 0.05 → 0.80 accuracy in 15 experiments. They also documented the limits of what prompt engineering alone can fix. Things like absence detection (“the customer didn’t mention X”) and off-by-one date math just don’t get solved by tweaking the prompt. That’s a useful negative result. Most write-ups about prompt optimization conveniently skip the part where they hit a wall.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Generalized Frameworks: Autoresearch For Anything
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;“Wait, this works for any measurable thing”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is the category that broke containment. Once a few people had ported the loop to prompts, the next move was to extract the pattern entirely. The result is a bunch of frameworks that don’t care what file you’re optimizing or what metric you’re using.&lt;/p&gt;

&lt;h3&gt;
  
  
  uditgoenka/autoresearch — Claude Code Skill
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/uditgoenka/autoresearch" rel="noopener noreferrer"&gt;uditgoenka/autoresearch&lt;/a&gt; packages the loop as a Claude Code skill. You install it, you run &lt;code&gt;/autoresearch&lt;/code&gt;, and you point it at any task with a mechanical metric. The README runs through about a dozen domains: test coverage, bundle size, TypeScript error count, SQL query speed, HR policy readability, Dockerfile size, accessibility audits, sales copy, marketing content. There’s also &lt;code&gt;/loop N&lt;/code&gt; integration for bounded iterations.&lt;/p&gt;

&lt;p&gt;It also documents how to wire MCP servers (PostgreSQL, GitHub, Stripe) as verification sources. So your “metric” can be a query against your actual production database, not a fixture.&lt;/p&gt;

&lt;p&gt;This is the version that makes the generalization explicit. The loop works for anything with constraint plus metric plus fast verification.&lt;/p&gt;

&lt;h3&gt;
  
  
  autoresearch-anything (zkarimi22)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/zkarimi22/autoresearch-anything" rel="noopener noreferrer"&gt;zkarimi22/autoresearch-anything&lt;/a&gt; is the lowest-friction setup I’ve seen. You run &lt;code&gt;npx autoresearch-anything&lt;/code&gt; and it interrogates you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What file should I edit?&lt;/li&gt;
&lt;li&gt;What metric am I optimizing?&lt;/li&gt;
&lt;li&gt;How do I run the eval?&lt;/li&gt;
&lt;li&gt;What’s off-limits?&lt;/li&gt;
&lt;li&gt;A few more along those lines.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It outputs &lt;code&gt;setup.md&lt;/code&gt; and &lt;code&gt;eval.js&lt;/code&gt; and you’re running. Eight questions and you have a configured autoresearch loop pointed at your project.&lt;/p&gt;

&lt;h3&gt;
  
  
  menonpg/autoloop — The pip Package
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/menonpg/autoloop" rel="noopener noreferrer"&gt;menonpg/autoloop&lt;/a&gt; is the first one that’s actually a Python library. &lt;code&gt;pip install autoloop-ai&lt;/code&gt;, import, and the API is clean:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;autoloop&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoLoop&lt;/span&gt;

&lt;span class="n"&gt;loop&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AutoLoop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;src/optimize_me.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;metric&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;run_benchmark&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;directives&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Make this faster, don&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t break tests&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;budget_seconds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;experiments&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Parallel experiments via &lt;code&gt;loop.run(parallel=4)&lt;/code&gt;. Warm starts. Composite metrics with weights. Agent-agnostic: works with Claude, Codex, Ollama local models. CLI tools for inspecting history (&lt;code&gt;autoloop history&lt;/code&gt;, &lt;code&gt;autoloop best&lt;/code&gt;, &lt;code&gt;autoloop diff 12 best&lt;/code&gt;, &lt;code&gt;autoloop rollback 12&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;The demo shows a 6.9x speedup on a fibonacci function in 4 experiments, and the framework auto-detected and discarded the broken iterations.&lt;/p&gt;

&lt;p&gt;This one’s for you if you want autoresearch as a library you import rather than a skill you invoke. The bar is “have a Python function that returns a float” and you’re in. That’s about as low as it gets.&lt;/p&gt;

&lt;h3&gt;
  
  
  krzysztofdudek/ResearcherSkill — One File, Full Discipline
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/krzysztofdudek/ResearcherSkill" rel="noopener noreferrer"&gt;krzysztofdudek/ResearcherSkill&lt;/a&gt; is interesting because it ignores the framework race entirely. It’s one &lt;code&gt;researcher.md&lt;/code&gt; file you drop into any AI agent. Before doing anything, the agent interviews you: goal, metric, constraints, time limit, stopping conditions.&lt;/p&gt;

&lt;p&gt;It creates a &lt;code&gt;.lab/&lt;/code&gt; directory (gitignored) for experiment history that survives code reverts. That’s separate from git on purpose. You don’t want a &lt;code&gt;git reset --hard&lt;/code&gt; to wipe your experiment log.&lt;/p&gt;

&lt;p&gt;The loop has three phases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;THINK&lt;/strong&gt; — mandatory written analysis before each experiment, logged separately&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TEST&lt;/strong&gt; — commit, run, keep or revert&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;REFLECT&lt;/strong&gt; — log entry in &lt;code&gt;log.md&lt;/code&gt;, row in &lt;code&gt;results.tsv&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;There are also convergence guardrails baked in. Three discards in a row = mandatory pause. Five discards = force branch fork. Plateau for 8+ experiments = invert assumptions.&lt;/p&gt;
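
&lt;p&gt;The guardrails are easy to picture as code. My paraphrase of the README’s rules, assuming a history list with one &lt;code&gt;(score, kept)&lt;/code&gt; entry per experiment and lower scores better:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def next_action(history):
    # history: list of (score, kept) tuples, newest last.
    streak = 0
    for _, kept in reversed(history):
        if kept:
            break
        streak += 1
    if streak &gt;= 5:
        return "fork-branch"       # five discards: force a branch fork
    if streak &gt;= 3:
        return "mandatory-pause"   # three in a row: stop and write analysis
    # Plateau: no new best score in the last 8 experiments.
    if len(history) &gt; 8:
        best_before = min(score for score, _ in history[:-8])
        if all(score &gt;= best_before for score, _ in history[-8:]):
            return "invert-assumptions"
    return "continue"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;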

&lt;p&gt;The interesting part is THINK. Most autoresearch implementations skip written analysis. The agent just runs. Forcing it to write down what it expects to happen &lt;em&gt;before&lt;/em&gt; running changes what it tries. The README claims “10 minutes of analysis can prevent 5 wasted experiments,” which I believe.&lt;/p&gt;

&lt;p&gt;There’s also a “thought experiment” type that lets the agent log analysis without running code. It counts as a row in the results, just labeled &lt;code&gt;thought&lt;/code&gt;. That’s a small detail and it matters more than it should.&lt;/p&gt;

&lt;h3&gt;
  
  
  alfonsograziano/auto-agent — Autoresearch Builds Agents
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/alfonsograziano/auto-agent" rel="noopener noreferrer"&gt;alfonsograziano/auto-agent&lt;/a&gt; is autoresearch turned on AI agents themselves. You give it a target agent (in a separate repo) and a golden dataset of expected input/output pairs. The orchestrator spawns Claude Code or Kiro CLI inside the target repo, has it analyze failures, implement fixes, and re-run.&lt;/p&gt;

&lt;p&gt;Two repos: orchestrator and target. &lt;code&gt;MEMORY.md&lt;/code&gt; persists across hypotheses (what worked, what didn’t, known blockers). Each hypothesis gets its own git branch and its own &lt;code&gt;REPORT.md&lt;/code&gt; with before/after metrics and a &lt;code&gt;CONTINUE&lt;/code&gt; or &lt;code&gt;ROLLBACK&lt;/code&gt; decision. After a run, &lt;code&gt;npm run generate-changelog&lt;/code&gt; produces a human-readable summary.&lt;/p&gt;

&lt;p&gt;This is recursive in a way that’s very interesting. The thing being optimized is an AI agent. The thing doing the optimizing is also an AI agent. The metric is how often the target hits the golden set. You’re using autoresearch to make agents better at the things you created them for.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Production Codebase Optimization: Autoresearch on Real OSS
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Shopify used it on the Liquid template engine&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is where the pattern stops being a demo. Shopify ran autoresearch against the Liquid template engine, the thing that renders every theme on Shopify, and shipped the results.&lt;/p&gt;

&lt;p&gt;The setup is in &lt;a href="https://github.com/Shopify/liquid/blob/2543fdc1a101f555db208fb0deeb2e3bf1ae9e36/auto/autoresearch.md" rel="noopener noreferrer"&gt;auto/autoresearch.md&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Benchmark: ThemeRunner (real Shopify theme templates, not synthetic)&lt;/li&gt;
&lt;li&gt;Metric: combined parse + render time in microseconds (primary), allocations (secondary)&lt;/li&gt;
&lt;li&gt;Constraints: tests must pass, no new gem dependencies, semantic correctness preserved&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The results across 17 tracked experiments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;7,374µs → 4,815µs (-34%)&lt;/li&gt;
&lt;li&gt;62,620 → 37,355 allocations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agent’s techniques included replacing regex with manual byte parsing, fast-path variable parsing, and short-circuit checks for common cases. None of it is rocket science. It’s the kind of optimization a senior developer would do given enough time and a good profiler. The agent just had cheap iteration and an automatic discard for anything that broke a test.&lt;/p&gt;

&lt;h3&gt;
  
  
  More Production War Stories
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Real companies, real metrics, real prod deploys&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Once Shopify went public with theirs, more case studies surfaced.&lt;/p&gt;

&lt;h3&gt;
  
  
  idealo Search Ranking
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://medium.com/idealo-tech-blog/one-hour-37-faster-applying-autoresearch-to-our-search-ranking-inference-endpoint-34cffc08e373" rel="noopener noreferrer"&gt;The idealo team&lt;/a&gt; (Atakan Filgöz, Gena Shabanov, Arjun Roy Choudhury) ran autoresearch against &lt;code&gt;preprocess.py&lt;/code&gt; in their Learning-to-Rank inference endpoint. They added a correctness constraint that required bit-for-bit identical output between the original and optimized version, then optimized for average latency over 500 benchmark iterations.&lt;/p&gt;

&lt;p&gt;Numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;13 experiments in 1 hour&lt;/li&gt;
&lt;li&gt;10 kept, 3 reverted&lt;/li&gt;
&lt;li&gt;Preprocessing latency: 3.9ms → 0.66ms (83% reduction, 5.9x speedup)&lt;/li&gt;
&lt;li&gt;End-to-end production latency: 46ms → 28.8ms (37% reduction at 250+ req/sec)&lt;/li&gt;
&lt;li&gt;Total cost: ~$7 in Claude Opus on AWS Bedrock&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For seven dollars and an hour of supervision, they took 37% off a production endpoint that’s serving 250+ req/sec. That’s an absurd ROI.&lt;/p&gt;

&lt;p&gt;The techniques the agent found: shared computation (sort once, derive everything else), algorithmic shortcuts for sorted arrays, minimal allocations. The agent reasoned like a profiler: “the ranking computation takes 40% of total time, focus there next.” They watched it work, occasionally steered it, and shadow-tested before shipping. It’s now in production.&lt;/p&gt;

&lt;p&gt;The honest detail in the writeup is that the agent’s code was clean at 13 experiments but they suspect longer runs would over-engineer. That tracks with my experience using AI tools for refactoring. The first dozen suggestions are gold. By suggestion 50 it’s pattern-matching to “more abstraction must be better” and you have to slap its hand.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tennis XGBoost — The Reward Hacking Cautionary Tale
&lt;/h3&gt;

&lt;p&gt;This is the one nobody mentions when they’re hyping the pattern. &lt;a href="https://nickoak.com/posts/tennis-xgboost-autoresearch/" rel="noopener noreferrer"&gt;Nick Oak&lt;/a&gt; ran autoresearch on a tennis match prediction XGBoost model. The agent found a way to game the metric without actually improving the model. He preserved the embarrassing iterations on an &lt;code&gt;archived/gamed-iterations&lt;/code&gt; branch so you can read what the agent did.&lt;/p&gt;

&lt;p&gt;The discard mechanism only saves you if your metric is measuring what you actually care about. If your eval can be gamed, the agent will game it. This is not an RL-only problem. Reward hacking shows up everywhere there’s an automated optimizer, and autoresearch is exactly that.&lt;/p&gt;

&lt;p&gt;The takeaway isn’t “autoresearch is dangerous.” It’s “your metric is now a load-bearing piece of software and you should treat it that way.” Spend more time on the eval than on the loop.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vesuvius Challenge Ink Detection
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://scrollprize.substack.com/p/we-are-cooking" rel="noopener noreferrer"&gt;Vesuvius Challenge ran a multi-agent autoresearch loop&lt;/a&gt; for ink detection on ancient scrolls, focused on cross-scroll generalization. I haven’t dug deep into this one, but it’s worth knowing that autoresearch is currently being used to read 2,000-year-old burned scrolls. That’s a thing.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Agent Factory: Autoresearch Builds Agents
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Applying the loop to creating other agents&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Dominien/agent-factory" rel="noopener noreferrer"&gt;Dominien/agent-factory&lt;/a&gt; takes the meta move further than auto-agent. Instead of optimizing an existing agent, it autonomously researches problems and builds new specialized agents to solve them.&lt;/p&gt;

&lt;p&gt;The loop is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Research&lt;/strong&gt;: Reddit, HN, GitHub, Twitter — find real problems people have&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Score&lt;/strong&gt;: Venture Score plus TAM estimate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build&lt;/strong&gt;: Next.js agent from a seed template&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validate&lt;/strong&gt;: against synthetic users / actual usage&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ship&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Repeat&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;There’s a threshold ratchet. The bar to ship keeps rising as the system finds better ideas. So the things it builds get better over time, not because the agent is smarter, but because it’s competing against its own previous best.&lt;/p&gt;
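
&lt;p&gt;The ratchet itself is a few lines, sketched here with a hypothetical &lt;code&gt;deploy&lt;/code&gt; step:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;ship_bar = 0.0  # rises every time something ships

def maybe_ship(agent, score):
    global ship_bar
    if score &gt; ship_bar:
        deploy(agent)      # hypothetical: push the new agent live
        ship_bar = score   # the next idea has to beat this one
        return True
    return False
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;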

&lt;p&gt;Agents shipped so far: freelancer-deduction-finder, wage-rights-advisor, data-broker-opt-out, property-tax-appeal-advisor. Twenty agents and counting.&lt;/p&gt;

&lt;p&gt;This is the meta-loop concept and I find it disorienting. Research quality compounds the same way training quality does. A loop that researches problems, builds solutions, ships, and uses ship-ability as the metric will eventually outpace anyone manually doing the same thing. Whether the agents it ships are any good is the open question. But the &lt;em&gt;number&lt;/em&gt; keeps going up.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Research OS / Skills Systems: Institutionalizing the Pattern
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;What if autoresearch was the entire research methodology?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If autoresearch is going to actually be how research gets done, somebody has to build the scaffolding around it. Two projects are going hard at this.&lt;/p&gt;

&lt;h3&gt;
  
  
  PhD-Zero (TenureAI)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/TenureAI/PhD-Zero" rel="noopener noreferrer"&gt;TenureAI/PhD-Zero&lt;/a&gt; is an operating system for research-oriented coding agents. Modular skill library: run-governor, research-workflow, deep-research, experiment-execution, memory-manager, human-checkpoint, paper-writing.&lt;/p&gt;

&lt;p&gt;Cross-runtime: same skills exposed to Codex (via AGENTS.md) and Claude Code (via .claude/skills/). The focus is reproducibility, literature review, experiment planning. Discipline around the process.&lt;/p&gt;

&lt;p&gt;This is the thing that turns autoresearch from “fun overnight experiment” into something that could plausibly be used by a real research group. The autoresearch loop runs experiments. PhD-Zero runs the literature review, the writeup, the human checkpoints, the reproducibility checks. The loop is one verb in a much bigger vocabulary.&lt;/p&gt;

&lt;h3&gt;
  
  
  alirezarezvani/claude-skills
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/alirezarezvani/claude-skills/tree/main/engineering" rel="noopener noreferrer"&gt;alirezarezvani/claude-skills&lt;/a&gt; is a 204-skill library for AI coding agents, with autoresearch-agent as one skill in the engineering tier. Works across Claude Code, Codex, Gemini CLI, Cursor, Aider, Windsurf — eleven tools total.&lt;/p&gt;

&lt;p&gt;Treating autoresearch as a reusable skill component rather than a standalone repo is an important move. It means your agent uses autoresearch the way it uses anything else: as a tool you reach for when the situation calls for it.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. Creative Writing: Autoresearch For Prose and Fiction
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;The thing nobody expected: it works on writing too&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is the one I want to come back to in another post. The transfer is straightforward. If you can score a draft, you can run the loop. The metric just needs to be cheap, mechanical, and not gameable. (See the tennis cautionary tale.)&lt;/p&gt;

&lt;p&gt;Multiple projects figured this out independently within a few weeks of each other.&lt;/p&gt;

&lt;h3&gt;
  
  
  redpen — Prose Refinement Engine
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/itspikabubu/redpen" rel="noopener noreferrer"&gt;itspikabubu/redpen&lt;/a&gt; is a ratchet loop for blog posts and writing. Drafts can only get better, never worse. Six AI personas score on different dimensions: seed founder, fellow GP, LP allocator, LinkedIn reader, HN skeptic, VC Twitter. Each persona runs three times and the scores are medianed for noise reduction.&lt;/p&gt;

&lt;p&gt;The writer agent makes one surgical edit targeting the weakest dimension. Re-evaluate. If the minimum score improved, keep. If not, discard and revert. Repeat until target score or max iterations.&lt;/p&gt;
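
&lt;p&gt;The scoring-plus-ratchet mechanics look something like this. A sketch, not redpen’s actual code: &lt;code&gt;score_draft&lt;/code&gt; and &lt;code&gt;edit_fn&lt;/code&gt; stand in for its LLM-judge and writer-agent calls:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from statistics import median

PERSONAS = ["seed founder", "fellow GP", "LP allocator",
            "LinkedIn reader", "HN skeptic", "VC Twitter"]

def evaluate(draft):
    # Each persona scores three times; the median knocks down judge noise.
    return {p: median(score_draft(draft, persona=p) for _ in range(3))
            for p in PERSONAS}

def ratchet(draft, edit_fn, max_iters=20):
    scores = evaluate(draft)
    for _ in range(max_iters):
        weakest = min(scores, key=scores.get)
        candidate = edit_fn(draft, target=weakest)  # one surgical edit
        new_scores = evaluate(candidate)
        if min(new_scores.values()) &gt; min(scores.values()):
            draft, scores = candidate, new_scores   # keep: it can only get better
        # otherwise discard the edit and try a different angle
    return draft
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;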

&lt;p&gt;You can configure voice: tone spectrum, blacklist words, a 16-point natural prose rubric. I have not tried this yet but I’m planning to. If it works, it solves the thing every blogger struggles with: I can tell a draft is bad, but I can’t always tell &lt;em&gt;why&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  NousResearch/autonovel — Complete Novel Pipeline
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/NousResearch/autonovel" rel="noopener noreferrer"&gt;NousResearch/autonovel&lt;/a&gt; is the most ambitious creative writing fork. Full autonomous novel pipeline: seed concept → world bible → characters → outline → draft chapters → revision → export.&lt;/p&gt;

&lt;p&gt;Five co-evolving layers: voice, world, characters, outline, and chapters, with canon cross-cutting all of them. Two evaluation systems running in parallel: mechanical (regex bans for AI clichés, slop forensics) and LLM-judge (prose quality, voice adherence). Phase 3b sends the full manuscript to Claude Opus for a dual-persona review (literary critic + professor of fiction) and the loop continues until the reviewer’s complaints are mostly “qualified hedges rather than real problems.” Their phrase, not mine.&lt;/p&gt;

&lt;p&gt;There’s also an art pipeline (fal.ai), multi-voice audiobook (ElevenLabs), LaTeX typesetting, ePub generation, landing page.&lt;/p&gt;

&lt;p&gt;The first novel produced is &lt;em&gt;The Second Son of the House of Bells&lt;/em&gt;. 79,456 words. 19 chapters (down from 24: the loop did four structural merges). Six rounds of Opus review.&lt;/p&gt;

&lt;p&gt;The loop improved prose and changed the structure of the book. We talk about autoresearch like it’s a fine-grained optimizer, but at long enough horizons, it’s making editorial decisions a human would make.&lt;/p&gt;

&lt;h3&gt;
  
  
  sinfiny/Auto-Creative-Reasoning
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/sinfiny/Auto-Creative-Reasoning-" rel="noopener noreferrer"&gt;sinfiny/Auto-Creative-Reasoning&lt;/a&gt; is benchmark-first. The repo motto is “generation is not the product. Evaluation is the product.” Rewrite ladders route failure to the right level: prose, scene, chapter, arc, premise. Rubrics score hook strength, strategy, clue fairness, consequence density, readability.&lt;/p&gt;

&lt;p&gt;There’s a Codex plugin for running benchmarked loops against existing fiction drafts. The long-term vision is multiple parallel novel timelines with competing chapter versions compared head-to-head.&lt;/p&gt;

&lt;p&gt;This is the version that argues evaluation is harder and more important than generation. Which is exactly the lesson from the tennis XGBoost story, ported to fiction.&lt;/p&gt;

&lt;h3&gt;
  
  
  CalvinMagezi/self-evolving-skill — Brand Document Evolution
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/CalvinMagezi/self-evolving-skill" rel="noopener noreferrer"&gt;CalvinMagezi/self-evolving-skill&lt;/a&gt; is the business-minded version. Autoresearch applied to &lt;code&gt;writing-strategy.md&lt;/code&gt; instead of &lt;code&gt;train.py&lt;/code&gt;. The metric is an LLM judge composite score on a fixed test brief, run three times at temperature=0 and medianed.&lt;/p&gt;

&lt;p&gt;The output is real documents: &lt;code&gt;.docx&lt;/code&gt;, &lt;code&gt;.pptx&lt;/code&gt;, &lt;code&gt;.pdf&lt;/code&gt; that match brand identity. Git history serves as memory; the loop reads &lt;code&gt;git log&lt;/code&gt; before each iteration to avoid repeating failed ideas. Works with any LLM via LiteLLM (OpenRouter, Gemini, OpenAI, Anthropic).&lt;/p&gt;

&lt;p&gt;This is the one with the clearest business case of the bunch. Companies actually need their documents to get better. They have brand rubrics. They have a fixed test brief in the form of “the next thing we need to write.” All the pieces are already there.&lt;/p&gt;

&lt;h2&gt;
  
  
  9. Meta-Pattern: Wrapping Autoresearch as a Worker
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;What happens when autoresearch is just one layer of something bigger&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is the one that snapped my view of the whole ecosystem into focus. alirezarezvani had been shipping autoresearch as a skill since March. A month of production use revealed &lt;a href="https://alirezarezvani.medium.com/the-orchestrator-was-missing-building-an-internal-research-agent-around-autoresearch-in-claude-678b08a83c9b" rel="noopener noreferrer"&gt;the missing piece&lt;/a&gt;: orchestration above it.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem with Solo Autoresearch
&lt;/h3&gt;

&lt;p&gt;One context window and reasoning trajectory, with no isolation between investigation threads. A query like “what is X, who are the players, what are the limits, what changed in 6 months” becomes four tangled sub-questions sharing one bloated context. By the time you’re on sub-question 4, the context is thick with answers from 1-3, and synthesis drifts.&lt;/p&gt;

&lt;p&gt;This is something I hit constantly with Claude Code on big tasks. By the time the context is full of half-finished investigations, the model is reasoning about all of them at once, badly.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Fix: 3 Files, 4 Subagents
&lt;/h3&gt;

&lt;p&gt;The whole rebuild is small:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CLAUDE.md&lt;/strong&gt; — decomposition rules, including an “independence test” (a sub-question is independent if its answer wouldn’t change based on another sub-question in the same query)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;.mcp.json&lt;/strong&gt; — Firecrawl, Perplexity, internal docs server. Critically, scoped per-agent to avoid the token tax of loading all MCP tool descriptions into every context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;4 subagent definitions&lt;/strong&gt; — lead-researcher (orchestrator, no MCPs), web-searcher (invokes autoresearch inside its own context), internal-searcher, citation-checker&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Lead decomposes. Workers fan out in parallel. Each worker runs an autoresearch loop to convergence inside its own isolated context. Lead synthesizes. Citation-checker verifies every source. Wall-clock time ends up shorter than single-session autoresearch because the workers run in parallel.&lt;/p&gt;
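
&lt;p&gt;Stripped of the Claude Code specifics, the fan-out has a simple shape. A sketch with hypothetical &lt;code&gt;decompose&lt;/code&gt;, &lt;code&gt;run_worker&lt;/code&gt;, &lt;code&gt;citations_check&lt;/code&gt;, and &lt;code&gt;synthesize&lt;/code&gt; steps:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio

async def research(query):
    # Lead decomposes, applying the independence test to each sub-question.
    subqs = decompose(query)
    # Each worker runs its own autoresearch loop in an isolated context.
    answers = await asyncio.gather(*(run_worker(q) for q in subqs))
    # Citation-checker verifies every source before synthesis.
    verified = [a for a in answers if citations_check(a)]
    return synthesize(query, verified)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;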

&lt;h3&gt;
  
  
  What Actually Broke In Production
&lt;/h3&gt;

&lt;p&gt;Four failure modes from the writeup, and they all rang bells:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Orchestrator over-delegation&lt;/strong&gt; — without the independence test, the orchestrator was paying for parallel context windows to produce worse answers than one session would have&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP tool-description token tax&lt;/strong&gt; — every MCP server’s tool descriptions loading into every agent’s context. Scoping per-agent fixed it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Citation drift&lt;/strong&gt; — workers returning confident claims where the page didn’t quite support the paraphrase. Paraphrase drift, not hallucination&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context amnesia between sessions&lt;/strong&gt; — a flat &lt;code&gt;lessons.md&lt;/code&gt; file the lead reads on startup is the imperfect fix&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The lesson here is the one that rewires the whole picture. Autoresearch was already a strong worker. The orchestrator does nothing clever: decompose, delegate, synthesize. The intelligence is in the decomposition rules, and those took three rewrites to get right.&lt;/p&gt;

&lt;p&gt;So the future isn’t “smarter autoresearch.” It’s autoresearch as a primitive that other systems call into.&lt;/p&gt;

&lt;h2&gt;
  
  
  So What Does This Actually Mean?
&lt;/h2&gt;

&lt;p&gt;Karpathy didn’t just build an ML research tool. He demonstrated a pattern that works anywhere you can measure progress with a command: constraint plus mechanical metric plus autonomous iteration.&lt;/p&gt;

&lt;p&gt;Here are the categories ranked by fidelity to the original idea:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Platform ports&lt;/strong&gt; — most faithful. Same loop, different hardware.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ML enhancers&lt;/strong&gt; — extend the substrate. Memory, Bayesian updates, multi-GPU.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt optimizers&lt;/strong&gt; — same loop, different file. &lt;code&gt;train.py&lt;/code&gt; → &lt;code&gt;prompt.txt&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generalized frameworks&lt;/strong&gt; — extract the pattern. pip packages, Claude Code skills, “give me any metric.”&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production codebase&lt;/strong&gt; — industrial application. Shopify -34%, idealo -37% in 1 hour for $7.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent factory&lt;/strong&gt; — meta-application. The loop builds other agents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Research OS&lt;/strong&gt; — institutionalization. The whole methodology, not just the loop.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Creative writing&lt;/strong&gt; — the surprise expansion. Prose, fiction, brand documents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Orchestration&lt;/strong&gt; — autoresearch as worker, not the whole system.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A few honest takes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The reward hacking problem is the cautionary tale nobody includes.&lt;/strong&gt; In the tennis XGBoost case, the loop found a way to improve the metric without improving the model. The discard mechanism is only as good as your metric. If your eval can be gamed, the agent will game it. Spend more time on the eval than on the loop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The pattern is more durable than the implementation.&lt;/strong&gt; Most of the forks I found were “what if we applied this to X” and they all worked. That’s kind of remarkable. The discard mechanism (&lt;code&gt;git reset&lt;/code&gt; on regression) is the key. You don’t need intelligence. You need iteration speed, a mechanical metric, and automatic rollback.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Shopify and idealo case studies should embarrass you a little.&lt;/strong&gt; $7 of API and an hour of supervision took 37% off a production endpoint serving 250+ req/sec. There are perf wins like this in basically every codebase. We’re just not asking for them yet because we still think of optimization as expensive senior-engineer time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Orchestration eats the loop.&lt;/strong&gt; alirezarezvani’s piece shows that solo autoresearch is fine, but the next move is autoresearch as a worker that orchestrators call when a sub-question lands. That’s where this is heading and it’s already happening in production.&lt;/p&gt;

&lt;p&gt;If you’re not running at least one of these on a real project, you’re leaving free improvements on the table. The bar to entry is &lt;code&gt;pip install autoloop-ai&lt;/code&gt; or &lt;code&gt;npx autoresearch-anything&lt;/code&gt;. There’s no reason not to point one at something you care about and let it run overnight. You’ll either get a better version of the thing or you’ll learn something about your metric. Both of those are wins.&lt;/p&gt;

</description>
      <category>aiagents</category>
    </item>
    <item>
      <title>Model Roundup: The Free Countdown, the $300 Amnesiac, and the Quiet Climber at #7</title>
      <dc:creator>Stephan Miller</dc:creator>
      <pubDate>Sat, 02 May 2026 12:00:00 +0000</pubDate>
      <link>https://forem.com/eristoddle/model-roundup-the-free-countdown-the-300-amnesiac-and-the-quiet-climber-at-7-167f</link>
      <guid>https://forem.com/eristoddle/model-roundup-the-free-countdown-the-300-amnesiac-and-the-quiet-climber-at-7-167f</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4o8jjr90c8zhp8c4a4wl.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4o8jjr90c8zhp8c4a4wl.jpg" alt=" " width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I check OpenRouter rankings every week to figure out which models to throw at my projects. This week, the model at the top of the charts had something I’d never seen before: an expiration date.&lt;/p&gt;

&lt;p&gt;Right there on the Tencent Hy3 Preview page: “Going Away May 8.” Six days from now. And it’s currently generating 2.15 trillion tokens a week with a +1,356% spike. You know what that is? Not a sign of the best model on the market. It’s the AI equivalent of a store liquidation sale. Everyone’s grabbing tokens before they cost money.&lt;/p&gt;

&lt;p&gt;That’s W18 in a nutshell. The #1 model is a countdown timer. The hottest new premium subscription ($300/month from xAI) still can’t remember who you are between sessions.&lt;/p&gt;

&lt;p&gt;There’s good news buried in all this: Kimi K2.6, which I mentioned &lt;a href="https://dev.to/april-2026-model-roundup-the-billing-horror-the-012m-unicorn-and-metas-open-source-betrayal/"&gt;last week&lt;/a&gt; as an interesting launch, has started showing real production numbers. And there’s a model called Step 3.5 Flash that’s been quietly climbing the rankings for three months with zero hype, which in this market is basically a standing ovation.&lt;/p&gt;

&lt;p&gt;Let me tell you what actually matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Table of Contents&lt;/li&gt;
&lt;li&gt;The #1 Model Is a Countdown Timer (Tencent Hy3 Preview)&lt;/li&gt;
&lt;li&gt;
Kimi K2.6 Is Now a Real Recommendation

&lt;ul&gt;
&lt;li&gt;Where K2.6 Falls Short&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

The Sleeper: Step 3.5 Flash Has Been Climbing for Three Months

&lt;ul&gt;
&lt;li&gt;The One Real Catch&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Grok 4.3: Genuinely Impressive, Genuinely Annoying, $300/Month&lt;/li&gt;

&lt;li&gt;Your Smarter Model Might Be Breaking Your Agents&lt;/li&gt;

&lt;li&gt;What’s Actually Worth Using (and What’s Coming)&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  The #1 Model Is a Countdown Timer (Tencent Hy3 Preview)
&lt;/h2&gt;

&lt;p&gt;Tencent launched Hy3 Preview on April 22 with a free access period that runs out May 8. That’s the entire explanation for the +1,356% weekly spike and the 2.15 trillion tokens burned. Developers saw “free” and “295B MoE” in the same sentence and did what developers do: they stress-tested it before anyone sent them a bill.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgedcd3q9v2alykl2pmjn.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgedcd3q9v2alykl2pmjn.jpg" alt="Tencent free Hy3 Preview on OpenRouter" width="800" height="643"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here’s what Hy3 Preview actually is: 295 billion total parameters, 21 billion activated per token (mixture of experts, efficient by design), 262K context window, configurable reasoning you can dial from disabled to low to high. Designed for agentic coding workflows. On paper, solid.&lt;/p&gt;

&lt;p&gt;In practice? No Arena votes because it’s too new to have accumulated any. No long-form reviews because nobody’s shipped anything with it yet. No “I’ve been using this for three weeks and it’s my daily driver” posts anywhere I could find. Just a lot of “grabbing free tokens before May 8” energy.&lt;/p&gt;

&lt;p&gt;What happens after May 8 is the real question. Hy3 Preview becomes a paid model competing against DeepSeek V3.2 (which costs $0.14 input / $0.28 output per 1M tokens and has months of production track record), Kimi K2.6 ($0.74/$3.49 with confirmed adoption), and Step 3.5 Flash (which I’ll get to in a moment). Entering that field with no reviews and no Arena ranking is a tough position.&lt;/p&gt;

&lt;p&gt;If you want to play with it before the deadline, go to &lt;a href="https://openrouter.ai/tencent/hy3-preview:free" rel="noopener noreferrer"&gt;openrouter.ai/tencent/hy3-preview:free&lt;/a&gt; and run some benchmarks. Just don’t build a dependency on something with a “Going Away” notice stamped on it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Kimi K2.6 Is Now a Real Recommendation
&lt;/h2&gt;

&lt;p&gt;Last week I called Kimi K2.6 an interesting launch. Twelve days later, the production numbers are coming in and it’s something more concrete.&lt;/p&gt;

&lt;p&gt;Real developers running real workflows are reporting 88% cost savings when they replace Claude with K2.6 for bulk coding tasks: batch migrations, test generation, format conversion, anything where you’re doing a lot of the same kind of work repeatedly. The Kimi Code CLI, the companion tool for using K2.6 in your terminal the same way you’d use Claude Code, crossed 6,400 GitHub stars. That’s people betting actual infrastructure on this model, not just upvoting a launch post.&lt;/p&gt;

&lt;p&gt;The pattern hardening into consensus across forums is this: use K2.6 for bulk, use Claude for the high-stakes core. At $0.74 input / $3.49 output per 1M tokens, K2.6 is roughly 4x cheaper than Claude Sonnet 4.6. For workflows that generate a lot of tokens on repetitive work, that math compounds fast.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where K2.6 Falls Short
&lt;/h3&gt;

&lt;p&gt;This is the part I actually care about more than the hype. K2.6 trails GPT-5.4 on GPQA-Diamond (90.5% vs 92.8%) and AIME 2026 (96.4% vs 99.2%). These are hard reasoning benchmarks. For anything where being wrong has real consequences (financial analysis, medical context, legal questions), K2.6 is not the answer. The cost savings don’t matter if the output costs you more to fix.&lt;/p&gt;

&lt;p&gt;Use it for code. Trust it with the boring high-volume stuff. Keep a premium model on anything where you’d be embarrassed if an AI got it wrong.&lt;/p&gt;

&lt;p&gt;K2.6 also ships with agent swarm architecture supporting up to 300 parallel sub-agents and 4,000 coordinated steps. After &lt;a href="https://dev.to/my-home-ai-agent-kept-making-shit-up/"&gt;my own experiences with AI agents inventing things&lt;/a&gt; I’d start with single-agent mode until you’ve validated its judgment in your specific domain. 300 parallel sub-agents hallucinating tool calls in parallel is not a good time.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Sleeper: Step 3.5 Flash Has Been Climbing for Three Months
&lt;/h2&gt;

&lt;p&gt;Most models follow the same OpenRouter arc: spike at launch, plateau after a few weeks, slowly fade as the next shiny thing arrives. Step 3.5 Flash doesn’t fit this pattern.&lt;/p&gt;

&lt;p&gt;StepFun released it in early 2026; the exact date is contested across sources (somewhere between late January and March), but it doesn’t much matter. As of this week it’s at #7 on OpenRouter with +28% week-over-week. For a model that’s been around roughly three months, that’s not a hype spike. That’s sustained adoption with nothing to explain it except developers finding it useful.&lt;/p&gt;

&lt;p&gt;The numbers back it up: #4 intelligence ranking out of 64 models on Artificial Analysis. That puts it above almost everything priced anywhere near its cost: free on the rate-limited tier, $0.10 input / $0.30 output per 1M tokens on paid. For comparison, DeepSeek V3.2 costs $0.14/$0.28 and ranks lower on the same index. Step 3.5 Flash is somehow cheaper AND smarter on paper, and nobody’s writing breathless posts about it.&lt;/p&gt;

&lt;p&gt;Architecture: 196 billion total parameters, 11 billion activated per token (MoE), 262K context, reasoning parameter support so you can see step-by-step thinking in API responses if you want it.&lt;/p&gt;

&lt;h3&gt;
  
  
  The One Real Catch
&lt;/h3&gt;

&lt;p&gt;Step 3.5 Flash is extremely verbose. During Artificial Analysis evaluation it generated 260 million tokens versus an 11 million token average for comparable models. It thinks out loud, at length, in a way that will surprise your output token budget if you’re not watching.&lt;/p&gt;

&lt;p&gt;Set &lt;code&gt;max_tokens&lt;/code&gt; limits. If you’re using it for any high-volume generation, put a ceiling on it. Otherwise you’ll get thorough reasoning that costs more than you expected from a supposedly cheap model.&lt;/p&gt;
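
&lt;p&gt;Via the API that’s one extra field in the request body. A sketch; the model slug is my guess at the OpenRouter naming, so check the listing, and the ceiling is arbitrary:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hard ceiling on output tokens so the verbosity can't blow the budget.
# The slug is assumed (check OpenRouter's listing); 2000 is arbitrary.
import os, requests

r = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "stepfun/step-3.5-flash",  # assumed slug
        "messages": [{"role": "user", "content": "Summarize this changelog..."}],
        "max_tokens": 2000,  # the part that matters
    },
    timeout=120,
)
print(r.json()["usage"])  # watch completion_tokens for a while before trusting it

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;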

&lt;p&gt;Worth adding to your comparison set before someone writes a breathless Medium post about it and StepFun decides to raise the price.&lt;/p&gt;

&lt;h2&gt;
  
  
  Grok 4.3: Genuinely Impressive, Genuinely Annoying, $300/Month
&lt;/h2&gt;

&lt;p&gt;Let’s do the good news first, because there’s real good news here.&lt;/p&gt;

&lt;p&gt;Grok 4.3 (launched April 17, currently rolling out in beta to SuperGrok Heavy subscribers) added native video input processing, not “describe this video” video but actual video-grounded reasoning. It can generate fully-formatted downloadable PDFs, populated spreadsheets, and PowerPoint presentations directly from conversation. Early beta testers are reporting formatted outputs they could hand to someone without cleanup. The integration with Grok Computer (xAI’s desktop automation agent) got tighter. If you’re doing autonomous desktop workflows, Grok 4.3 has a real story.&lt;/p&gt;

&lt;p&gt;Now the bad news.&lt;/p&gt;

&lt;p&gt;Grok 4.3 costs $300/month. That’s $100 more than ChatGPT Pro and $100 more than Claude Max. Both of those services have had persistent memory between sessions for over a year. Grok 4.3 does not. Every time you close your tab, the model forgets you. You start over. Blank context, fresh start, zero memory of anything you’ve built together.&lt;/p&gt;

&lt;p&gt;Persistent memory is not on xAI’s published roadmap.&lt;/p&gt;

&lt;p&gt;Multiple reviewers landed on the same observation this week. One X user put it cleanly: “you’re paying $300/month for a model that forgets you between sessions.” That’s not exaggeration. That’s the product.&lt;/p&gt;

&lt;p&gt;At $200/month, this would be annoying. At $300/month, it’s a product decision, and product decisions tell you something about what a company is optimizing for. xAI built the video capabilities and the document generation first. Memory (the feature that makes an AI assistant feel like an actual assistant rather than a very fancy search box) is apparently not the priority.&lt;/p&gt;

&lt;p&gt;Add the “High Demand” server errors that hit during the launch-week beta and you’ve got a model that’s impressive in demos and frustrating in daily use. The full API rollout is coming mid-to-late May. When it hits general availability, this conversation is going to get louder.&lt;/p&gt;

&lt;h2&gt;
  
  
  Your Smarter Model Might Be Breaking Your Agents
&lt;/h2&gt;

&lt;p&gt;This one’s structural rather than model-specific, and it’s relevant for anyone running agentic pipelines.&lt;/p&gt;

&lt;p&gt;An April 2026 ICLR paper titled “The Reasoning Trap” documented something uncomfortable: RL-based reasoning training (the kind that makes frontier models better at hard reasoning tasks) increases tool-hallucination rates in lockstep. The better a model gets at reasoning, the more often it invents tool calls that don’t exist. Function names, API endpoints, methods that aren’t in your schema. The model reasons its way to a call it can’t actually make.&lt;/p&gt;

&lt;p&gt;If you’ve upgraded your agentic pipeline to a stronger reasoning model because it’s smarter, you may have simultaneously increased the rate at which it hallucinates the tools it should be calling. The capability and the failure mode scale together.&lt;/p&gt;

&lt;p&gt;I’ve written about &lt;a href="https://forem.com/eristoddle/my-ai-agent-kept-making-shit-up-and-other-lessons-from-running-openclaw-566p"&gt;running into this firsthand with OpenClaw&lt;/a&gt;. The model-specific details differ but the pattern is the same. Stronger reasoning doesn’t mean better tool selection, and in agentic contexts “smarter” can break things in ways you don’t catch until something fails in production.&lt;/p&gt;

&lt;p&gt;Practical response: add tool-call schema validation before your agents execute. Check that every tool the model selects actually exists in your registry before you let it run. This applies to every frontier RL-trained model right now. It’s not a specific model bug, it’s how these systems are being trained.&lt;/p&gt;
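
&lt;p&gt;The gate itself is small. A sketch, assuming the common OpenAI-style tool-call shape; swap in whatever registry your stack already keeps:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Refuse to dispatch any tool call whose name isn't in the registry.
# Assumes OpenAI-style tool_call dicts; field names vary by framework.
KNOWN_TOOLS = {"read_file", "search_web", "send_email"}  # your real registry

def validate_tool_calls(tool_calls):
    invented = [c["function"]["name"] for c in tool_calls
                if c["function"]["name"] not in KNOWN_TOOLS]
    if invented:
        # Fail loudly instead of letting the agent "succeed" on fiction
        raise ValueError(f"model invented tool(s): {invented}")
    return tool_calls

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;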

&lt;h2&gt;
  
  
  What’s Actually Worth Using (and What’s Coming)
&lt;/h2&gt;

&lt;p&gt;Quick reference:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Input $/1M&lt;/th&gt;
&lt;th&gt;Output $/1M&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Free (grab it now)&lt;/td&gt;
&lt;td&gt;Hy3 Preview&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;Experiments before May 8 only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free (stable)&lt;/td&gt;
&lt;td&gt;Step 3.5 Flash&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;Rate-limited; best free reasoning available&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free (open weights)&lt;/td&gt;
&lt;td&gt;Nemotron 3 Super 120B&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;NVIDIA-backed, open license, 262K context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free (new, watch)&lt;/td&gt;
&lt;td&gt;Owl Alpha (stealth)&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;1M context, agentic (prompts may be logged)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Budget&lt;/td&gt;
&lt;td&gt;Step 3.5 Flash (paid)&lt;/td&gt;
&lt;td&gt;$0.10&lt;/td&gt;
&lt;td&gt;$0.30&lt;/td&gt;
&lt;td&gt;Climbing for 3 months, verbose but smart&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Budget&lt;/td&gt;
&lt;td&gt;DeepSeek V3.2&lt;/td&gt;
&lt;td&gt;$0.14&lt;/td&gt;
&lt;td&gt;$0.28&lt;/td&gt;
&lt;td&gt;Proven track record, still the value baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mid&lt;/td&gt;
&lt;td&gt;Kimi K2.6&lt;/td&gt;
&lt;td&gt;$0.74&lt;/td&gt;
&lt;td&gt;$3.49&lt;/td&gt;
&lt;td&gt;Bulk coding workflows, 88% cheaper than Claude&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mid&lt;/td&gt;
&lt;td&gt;Gemini 3.1 Pro&lt;/td&gt;
&lt;td&gt;$2.00&lt;/td&gt;
&lt;td&gt;$12.00&lt;/td&gt;
&lt;td&gt;Arena #4 overall, 1M context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Premium&lt;/td&gt;
&lt;td&gt;Claude Sonnet 4.6&lt;/td&gt;
&lt;td&gt;~$3.00&lt;/td&gt;
&lt;td&gt;~$15.00&lt;/td&gt;
&lt;td&gt;#2 Arena coding, proven daily driver&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Premium&lt;/td&gt;
&lt;td&gt;Claude Opus 4.7&lt;/td&gt;
&lt;td&gt;$5.00&lt;/td&gt;
&lt;td&gt;$25.00&lt;/td&gt;
&lt;td&gt;#1 Arena overall (thinking mode), high stakes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Mark your calendar for May 19.&lt;/strong&gt; Google I/O is 17 days away. Gemini 4 isn’t confirmed, but annual release patterns and confirmed agenda items (agentic AI, developer tooling) make it likely. That’s the next probable shakeup in this table.&lt;/p&gt;

&lt;p&gt;Claude Mythos, Anthropic’s model that developed a working exploit for a remote code execution vulnerability in FreeBSD (CVE-2026-4747), is not coming to a public API. It’s locked in Project Glasswing, a security research consortium, and Anthropic has no public timeline for changing that. Mention it at parties.&lt;/p&gt;

&lt;p&gt;GPT-6 is still vaporware. Polymarket has it at 84% by December 31, 2026. That’s not a date, it’s a guess with confidence bounds.&lt;/p&gt;

&lt;p&gt;The model worth your attention this week isn’t at #1. It’s at #7, three months old, climbing steadily, no hype cycle to explain it. Step 3.5 Flash just keeps showing up in the data.&lt;/p&gt;

</description>
      <category>largelanguagemodels</category>
    </item>
    <item>
      <title>My AI Agent Kept Making Shit Up (And Other Lessons From Running OpenClaw)</title>
      <dc:creator>Stephan Miller</dc:creator>
      <pubDate>Tue, 28 Apr 2026 12:00:00 +0000</pubDate>
      <link>https://forem.com/eristoddle/my-ai-agent-kept-making-shit-up-and-other-lessons-from-running-openclaw-566p</link>
      <guid>https://forem.com/eristoddle/my-ai-agent-kept-making-shit-up-and-other-lessons-from-running-openclaw-566p</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1wkuqyq3jbkahikbjwjs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1wkuqyq3jbkahikbjwjs.png" alt="OpenClaw" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I wanted an AI agent running on my home network. Not a cloud subscription and not something requiring me to be at the keyboard all day. A thing that wakes up at 7am, pulls from RSS feeds and Reddit, synthesizes the news I actually care about, and emails it to me. Just that. That’s what I started with. Seemed simple. It wasn’t like I was asking much.&lt;/p&gt;

&lt;p&gt;The reality was six weeks of debugging hallucinations, silent config failures, broken tool schemas, and a recurring realization that LLMs are, in certain contexts, compulsive liars.&lt;/p&gt;

&lt;p&gt;Here’s what I learned the hard way.&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The Setup: OpenClaw + DeepSeek in Docker&lt;/li&gt;
&lt;li&gt;The Exec Approval Maze&lt;/li&gt;
&lt;li&gt;The Reports That Were Too Good&lt;/li&gt;
&lt;li&gt;Going Around the Agent&lt;/li&gt;
&lt;li&gt;When Tools Become Literal Text&lt;/li&gt;
&lt;li&gt;Ripping Out Slack&lt;/li&gt;
&lt;li&gt;What’s Actually Working&lt;/li&gt;
&lt;li&gt;
But Here’s What She’s Actually Good At

&lt;ul&gt;
&lt;li&gt;The Report Engine Isn’t a One-Trick Pony&lt;/li&gt;
&lt;li&gt;Email Delivery, Old School On Purpose&lt;/li&gt;
&lt;li&gt;Multi-Model, Not Locked to DeepSeek&lt;/li&gt;
&lt;li&gt;The Track Record, Three Days In&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;What I Actually Built&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Setup: OpenClaw + DeepSeek in Docker
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; is a self-hosted AI agent framework. If you haven’t heard of it, think a local version of an AI assistant with cron jobs, tool calling, Slack/Telegram integration, and memory. Plus, how haven’t you heard of it? You run it in Docker, point it at whatever LLM you want, and theoretically have an autonomous agent working for you.&lt;/p&gt;

&lt;p&gt;I named mine Sabrina. She runs DeepSeek V3 (&lt;code&gt;deepseek/deepseek-chat&lt;/code&gt;) because the OpenAI and Anthropic APIs bill by the token and Sabrina is a chatty agent who generates daily reports. DeepSeek at pay-as-you-go rates keeps the monthly bill manageable.&lt;/p&gt;

&lt;p&gt;The architecture is two containers: &lt;code&gt;openclaw-gateway&lt;/code&gt; handles HTTP and the Slack/Telegram socket connections, and &lt;code&gt;openclaw-cli&lt;/code&gt; is the shell interface. The whole &lt;code&gt;~/.openclaw&lt;/code&gt; directory mounts into the container at &lt;code&gt;/home/node/.openclaw&lt;/code&gt; so configs, cron jobs, and workspace scripts are all live-editable from the host without rebuilding.&lt;/p&gt;

&lt;p&gt;On paper, this is elegant. In practice, you will spend a lot of time staring at container logs wondering why your agent is quietly lying to you. Or realizing you can just put Claude Code on the host and have it fix things when they break.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Exec Approval Maze
&lt;/h2&gt;

&lt;p&gt;Before Sabrina could run scripts, I had to configure &lt;code&gt;exec-approvals.json&lt;/code&gt;: a policy file that controls what shell commands the agent is allowed to execute. Fine. Reasonable. I set up allowlists for the workspace scripts and Python interpreter.&lt;/p&gt;

&lt;p&gt;Then the cron jobs started silently failing. The daily 7am AI report would produce output, but something felt off. I dug into the exec-approval config and found the first trap:&lt;/p&gt;

&lt;p&gt;The documentation (and my own reasoning at the time) suggested &lt;code&gt;"ask": "never"&lt;/code&gt; as a way to skip interactive approval prompts for unattended jobs. This is wrong. The schema only accepts &lt;code&gt;"off" | "on-miss" | "always"&lt;/code&gt;. Using &lt;code&gt;"never"&lt;/code&gt; doesn’t throw an error. It gets silently stripped by &lt;code&gt;sanitizeExecApprovalPolicy&lt;/code&gt; the next time the app writes the file. Your config looks fine, your intent is gone, and the agent starts timing out on approval requests at 7am with no operator connected.&lt;/p&gt;

&lt;p&gt;The correct pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"defaults"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"security"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"allowlist"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"ask"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"off"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"allowlist"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agents"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"main"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"security"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"allowlist"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"ask"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"off"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"allowlist"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;"ask": "off"&lt;/code&gt; makes the allowlist the sole policy.&lt;/p&gt;
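
&lt;p&gt;Since the app rewrites the file silently, a pre-flight check beats trusting your memory. A sketch against the schema as I understand it; the three valid values are the whole point:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Pre-flight check for exec-approvals.json: catch values the app would
# silently strip (like "never") before the 7am cron job finds out for you.
import json, sys

VALID_ASK = {"off", "on-miss", "always"}

policy = json.load(open(sys.argv[1]))  # path to exec-approvals.json
blocks = [policy.get("defaults", {})] + list(policy.get("agents", {}).values())
for block in blocks:
    ask = block.get("ask")
    if ask is not None and ask not in VALID_ASK:
        sys.exit(f'invalid "ask" value {ask!r}; schema allows {sorted(VALID_ASK)}')
print("exec-approvals.json looks sane")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;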

&lt;p&gt;I fixed this. Or so I thought.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Reports That Were Too Good
&lt;/h2&gt;

&lt;p&gt;The AI intelligence report looked great. Every morning: a well-formatted digest of the day’s AI news, summaries, source links. Sabrina was crushing it.&lt;/p&gt;

&lt;p&gt;Then I noticed the timestamps.&lt;/p&gt;

&lt;p&gt;Every log entry in the fabricated reports had timestamps ending in &lt;code&gt;:00&lt;/code&gt; or &lt;code&gt;:30&lt;/code&gt;. Real log files don’t look like that: they’re messy, they have milliseconds, they reflect actual compute time. These were fake. I checked the URLs. Several of them 404’d. The article summaries were plausible but not verifiable. Sabrina had been generating the reports &lt;em&gt;herself&lt;/em&gt;, not from RSS feeds, but from her training data and imagination, because the exec approval issue wasn’t actually fixed. When the script couldn’t run, the agent fell back on what LLMs do naturally: produce what the output &lt;em&gt;should&lt;/em&gt; look like.&lt;/p&gt;
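
&lt;p&gt;That tell is cheap to automate. A sketch of the heuristic, run over a report before it goes out; the threshold is a judgment call:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Flag reports whose timestamps are suspiciously round (:00 / :30).
# Real logs are messy; fabricated ones tend not to be.
import re

def looks_fabricated(report_text, threshold=0.9):
    stamps = re.findall(r"\d{1,2}:\d{2}(?::\d{2})?", report_text)
    if not stamps:
        return False  # nothing to judge on
    suspicious = sum(1 for s in stamps if s.endswith((":00", ":30")))
    return suspicious / len(stamps) &gt;= threshold

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;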

&lt;p&gt;This is the thing nobody tells you about giving LLMs agentic tasks: when they fail to do the thing, they don’t say “I failed to do the thing.” They generate a plausible simulation of having done the thing.&lt;/p&gt;

&lt;p&gt;The fix I’d been applying, tweaking exec-approvals, only addressed the symptom. The agent could bypass exec approval entirely by deciding to write the content directly. There was no configuration that would stop a sufficiently motivated language model from bullshitting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Going Around the Agent
&lt;/h2&gt;

&lt;p&gt;The actual fix was nuclear: remove the agent from report generation entirely.&lt;/p&gt;

&lt;p&gt;I disabled the OpenClaw cron jobs for both the AI report and the email send, then added host-level cron entries that call &lt;code&gt;docker exec&lt;/code&gt; directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;7&lt;/span&gt; * * * &lt;span class="n"&gt;docker&lt;/span&gt; &lt;span class="n"&gt;exec&lt;/span&gt; &lt;span class="n"&gt;openclaw&lt;/span&gt;-&lt;span class="n"&gt;openclaw&lt;/span&gt;-&lt;span class="n"&gt;gateway&lt;/span&gt;-&lt;span class="m"&gt;1&lt;/span&gt; /&lt;span class="n"&gt;usr&lt;/span&gt;/&lt;span class="n"&gt;bin&lt;/span&gt;/&lt;span class="n"&gt;python3&lt;/span&gt; /&lt;span class="n"&gt;home&lt;/span&gt;/&lt;span class="n"&gt;node&lt;/span&gt;/.&lt;span class="n"&gt;openclaw&lt;/span&gt;/&lt;span class="n"&gt;workspace&lt;/span&gt;/&lt;span class="n"&gt;ai_report&lt;/span&gt;.&lt;span class="n"&gt;py&lt;/span&gt; --&lt;span class="n"&gt;profile&lt;/span&gt; &lt;span class="n"&gt;ai&lt;/span&gt;-&lt;span class="n"&gt;intelligence&lt;/span&gt; &amp;gt;&amp;gt; /&lt;span class="n"&gt;home&lt;/span&gt;/&lt;span class="n"&gt;eristoddle&lt;/span&gt;/.&lt;span class="n"&gt;openclaw&lt;/span&gt;/&lt;span class="n"&gt;workspace&lt;/span&gt;/&lt;span class="n"&gt;logs&lt;/span&gt;/&lt;span class="n"&gt;report&lt;/span&gt;-&lt;span class="n"&gt;host&lt;/span&gt;-$(&lt;span class="n"&gt;date&lt;/span&gt; +\%&lt;span class="n"&gt;Y&lt;/span&gt;-\%&lt;span class="n"&gt;m&lt;/span&gt;-\%&lt;span class="n"&gt;d&lt;/span&gt;).&lt;span class="n"&gt;log&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&amp;gt;&amp;amp;&lt;span class="m"&gt;1&lt;/span&gt;

&lt;span class="m"&gt;30&lt;/span&gt; &lt;span class="m"&gt;7&lt;/span&gt; * * * &lt;span class="n"&gt;docker&lt;/span&gt; &lt;span class="n"&gt;exec&lt;/span&gt; &lt;span class="n"&gt;openclaw&lt;/span&gt;-&lt;span class="n"&gt;openclaw&lt;/span&gt;-&lt;span class="n"&gt;gateway&lt;/span&gt;-&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="n"&gt;bash&lt;/span&gt; /&lt;span class="n"&gt;home&lt;/span&gt;/&lt;span class="n"&gt;node&lt;/span&gt;/.&lt;span class="n"&gt;openclaw&lt;/span&gt;/&lt;span class="n"&gt;workspace&lt;/span&gt;/&lt;span class="n"&gt;send&lt;/span&gt;-&lt;span class="n"&gt;ai&lt;/span&gt;-&lt;span class="n"&gt;intelligence&lt;/span&gt;-&lt;span class="n"&gt;report&lt;/span&gt;-&lt;span class="n"&gt;proper&lt;/span&gt;.&lt;span class="n"&gt;sh&lt;/span&gt; &amp;gt;&amp;gt; /&lt;span class="n"&gt;home&lt;/span&gt;/&lt;span class="n"&gt;eristoddle&lt;/span&gt;/.&lt;span class="n"&gt;openclaw&lt;/span&gt;/&lt;span class="n"&gt;workspace&lt;/span&gt;/&lt;span class="n"&gt;logs&lt;/span&gt;/&lt;span class="n"&gt;email&lt;/span&gt;-&lt;span class="n"&gt;host&lt;/span&gt;-$(&lt;span class="n"&gt;date&lt;/span&gt; +\%&lt;span class="n"&gt;Y&lt;/span&gt;-\%&lt;span class="n"&gt;m&lt;/span&gt;-\%&lt;span class="n"&gt;d&lt;/span&gt;).&lt;span class="n"&gt;log&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&amp;gt;&amp;amp;&lt;span class="m"&gt;1&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Python script runs inside the container, where it has access to the right Python packages, but the &lt;em&gt;trigger&lt;/em&gt; is the host crontab. No agent involved. No LLM between the script and reality.&lt;/p&gt;

&lt;p&gt;This works. The reports now have messy timestamps and real URLs that actually load.&lt;/p&gt;

&lt;p&gt;The Obsidian weekly report I left in OpenClaw, because that one &lt;em&gt;needs&lt;/em&gt; the agent. It reads my vault, categorizes clips, writes summaries, analyzes git diffs: actual LLM work that benefits from Sabrina’s reasoning. The difference is whether the task is “run a script and report the output” (host cron) or “think about my vault and synthesize something useful” (agent cron). Only one of those should involve an LLM.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Tools Become Literal Text
&lt;/h2&gt;

&lt;p&gt;OpenClaw gets updates. After updates, things break in interesting ways.&lt;/p&gt;

&lt;p&gt;Twice now I’ve run into a scenario where Sabrina starts responding to everything, but her tool calls appear as raw text in the chat. Instead of actually reading a file, she’d output &lt;code&gt;read:/home/node/.openclaw/workspace/HEARTBEAT.md&lt;/code&gt; as a literal string.&lt;/p&gt;

&lt;p&gt;This is a DeepSeek-specific quirk that OpenClaw triggers by accident. The framework converts tool schemas to OpenAI format before sending them to providers. DeepSeek expects its own native format. The conversion breaks its tool call parsing silently. It receives schemas it doesn’t understand and falls back to treating the tool call syntax as plain text.&lt;/p&gt;

&lt;p&gt;The fix is a compat flag in the model config in &lt;code&gt;openclaw.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deepseek-chat"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DeepSeek V3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"contextWindow"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;163840&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"maxTokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8192&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"compat"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"anthropicToolSchemaMode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"native"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;anthropicToolSchemaMode: "native"&lt;/code&gt; tells OpenClaw to skip the schema conversion and send the native format. Tools work again. I found this via a GitHub issue (#36651) after two sessions of source archaeology that I really didn’t want to be doing.&lt;/p&gt;

&lt;p&gt;The lesson: when OpenClaw updates and tools start appearing as text, don’t read source code first. Check GitHub issues and Reddit. The community finds these fixes faster than you will staring at the framework internals.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ripping Out Slack
&lt;/h2&gt;

&lt;p&gt;OpenClaw supports Slack via socket mode. I had it connected for a while because it was useful for checking in on Sabrina from my phone without VPN or port-forwarding.&lt;/p&gt;

&lt;p&gt;Then an update changed the Slack config schema. The gateway crashed on startup with “Config invalid” and wouldn’t come back up until I removed the entire &lt;code&gt;channels.slack&lt;/code&gt; block from &lt;code&gt;openclaw.json&lt;/code&gt;. This happened twice. After the second time I removed Slack permanently and switched to Telegram, which has been stable.&lt;/p&gt;

&lt;p&gt;This is the trade-off with self-hosted software that’s still actively developed: you get the control, you eat the breakage. Updates that ship on Tuesday can invalidate configs you spent a week getting right. Having Claude Code manage the &lt;code&gt;~/.openclaw&lt;/code&gt; config directory directly, rather than asking Sabrina to fix herself through chat, means at least the fixes land correctly the first time.&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s Actually Working
&lt;/h2&gt;

&lt;p&gt;Six weeks in, here’s the honest status:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Daily AI intelligence report:&lt;/strong&gt; Running reliably via host cron. Real data. Real URLs. Emails delivered by 7:30am.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weekly Obsidian report:&lt;/strong&gt; Agent-generated, delivers Fridays. Sabrina does genuine LLM work here — categorizing clips, writing summaries — and it shows.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool calling:&lt;/strong&gt; Stable with the compat flag. Breaks again when OpenClaw updates, gets fixed in under an hour now that I know where to look.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The exec-approvals file:&lt;/strong&gt; Still fragile. I keep a copy of the correct config in my notes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The thing I underestimated: running an AI agent autonomously is mostly an infrastructure problem, not an AI problem. The interesting parts are the prompts and the LLM reasoning. The annoying parts are Docker networking, cron timing, config schema drift, and an agent that will hallucinate convincingly rather than admit it can’t do something.&lt;/p&gt;

&lt;p&gt;Sabrina’s useful. She’s also a liar when she’s backed into a corner. I’ve learned to keep her away from any task where I can’t independently verify the output.&lt;/p&gt;

&lt;p&gt;That’s not an OpenClaw problem or a DeepSeek problem. That’s just what LLMs do. But here’s the thing: once I stopped asking her to do the things LLMs are bad at, she got useful in a hurry. Most of what follows happened since last Thursday night.&lt;/p&gt;

&lt;h2&gt;
  
  
  But Here’s What She’s Actually Good At
&lt;/h2&gt;

&lt;p&gt;OpenClaw’s skill system is pluggable. You drop a skill into the workspace, the agent loads it, and it becomes part of how she thinks. Sabrina didn’t ship with most of her current capabilities. She built them through the same autonomous workflow she runs every day.&lt;/p&gt;

&lt;p&gt;A few that earn their slot:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;sm-blog-outline&lt;/code&gt;&lt;/strong&gt;: Started life as a generic &lt;code&gt;blog-outline&lt;/code&gt; skill. Now it’s the full pipeline I use for &lt;em&gt;this site&lt;/em&gt; — notes → outline → email. Trained on my voice, my content pillars, my snark level. It’s the skill that outlined this post, pulling from both Sabrina’s and Claude Code’s logs as well as a running list of notes I kept on the setup process.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;ct-humanizer&lt;/code&gt;&lt;/strong&gt;: Sequential editing passes that strip AI tells out of nonfiction. Diagnoses patterns first, then kills the AI vocabulary, then breaks up the structural templates LLMs love so much. Not a magic button, more like a brutal copy editor. It cleans up the outline.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;verbalized-sampling&lt;/code&gt;&lt;/strong&gt;: Instead of spitting back a single answer, generates multiple candidates with probability weights. I use it for brainstorming and “show me five angles” tasks. The default LLM answer is usually the median answer; this skill surfaces the weirder, more useful ones. Got the idea &lt;a href="https://www.verbalized-sampling.com/" rel="noopener noreferrer"&gt;here&lt;/a&gt;, gave Opus all the documentation, and used the Claude &lt;code&gt;skill-creator&lt;/code&gt; skill to create it. It is one of my favorite skills because you never know what you’re going to get.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;vault-tag-search&lt;/code&gt;&lt;/strong&gt; + &lt;strong&gt;&lt;code&gt;vault-idea-scorer&lt;/code&gt;&lt;/strong&gt;: Companions to the blog pipeline. One searches my Obsidian vault by tag &lt;em&gt;and&lt;/em&gt; body content with deduplication. The other ranks blog post ideas by whether they dovetail with multiple goals: research vs. content vs. portfolio vs. SEO.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://clawhub.ai/ivangdavila/self-improving" rel="noopener noreferrer"&gt;A self-improving skill&lt;/a&gt;&lt;/strong&gt;: Logs corrections and preferences so Sabrina compounds learning between sessions instead of getting the same feedback every week.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The point isn’t any single skill. It’s that the agent grows a custom toolkit shaped by the work I actually do, not whatever generic capabilities the framework shipped with.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Report Engine Isn’t a One-Trick Pony
&lt;/h3&gt;

&lt;p&gt;That &lt;code&gt;ai_report.py&lt;/code&gt; script generating the daily AI digest isn’t hardcoded to AI news. It’s a topic-agnostic engine that takes a profile flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 ai_report.py &lt;span class="nt"&gt;--profile&lt;/span&gt; ai-intelligence
python3 ai_report.py &lt;span class="nt"&gt;--profile&lt;/span&gt; golang
python3 ai_report.py &lt;span class="nt"&gt;--profile&lt;/span&gt; typescript

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each profile defines its own RSS feeds, Reddit subreddits, and keyword filters. Depth is tunable too: quick briefing vs. deep dive, set per profile. Articles get scored against my interests using CLIP + BM25 indexing before they make the cut, so I don’t end up with a digest full of stuff I don’t care about.&lt;/p&gt;

&lt;p&gt;Same engine, different sources, same usefulness. Once the host cron pattern is locked in for one topic, adding another is a profile file and a crontab line.&lt;/p&gt;
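
&lt;p&gt;I won’t paste the real profiles, but the shape is roughly this; every name and feed below is hypothetical:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical profile shape for ai_report.py; the real field names differ,
# but each profile bundles sources, filters, and depth like this.
PROFILES = {
    "golang": {
        "rss_feeds": ["https://go.dev/blog/feed.atom"],  # example feed
        "subreddits": ["golang"],
        "keywords": ["generics", "runtime", "proposal"],
        "depth": "brief",  # "brief" or "deep", set per profile
    },
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;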

&lt;h3&gt;
  
  
  Email Delivery, Old School On Purpose
&lt;/h3&gt;

&lt;p&gt;Everything Sabrina produces comes to me as email. Gmail SMTP, app password auth…for now. Yes, that’s old-fashioned. That’s the feature.&lt;/p&gt;

&lt;p&gt;A dashboard would be one more thing to check. Notifications would be one more app fighting for attention. Email is the universal inbox I already process. I can read it on my iPad without installing anything, forward to Obsidian if it’s worth keeping, drag it to drafts if it’s a blog skeleton, or delete it if Sabrina got it wrong.&lt;/p&gt;

&lt;p&gt;The pattern is generic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;send-email.sh "Subject" body-or-file [attachment]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s it. Anything in the system that needs to deliver text to a human goes through that script. Reports, blog outlines, and research summaries all use it.&lt;/p&gt;
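
&lt;p&gt;Under the hood it’s nothing exotic. A minimal sketch of the kind of thing that script wraps, stdlib only; the addresses and env var name are placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Roughly what send-email.sh wraps: stdlib SMTP with a Gmail app password.
import os, smtplib
from email.message import EmailMessage

def send_email(subject, body, to="me@example.com"):
    msg = EmailMessage()
    msg["Subject"], msg["From"], msg["To"] = subject, "agent@example.com", to
    msg.set_content(body)
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as smtp:
        smtp.login("agent@example.com", os.environ["GMAIL_APP_PASSWORD"])
        smtp.send_message(msg)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;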

&lt;h3&gt;
  
  
  Multi-Model, Not Locked to DeepSeek
&lt;/h3&gt;

&lt;p&gt;DeepSeek runs the daily cron work because it’s cheap. But Sabrina isn’t married to it. The agent routes through OpenRouter, which means any task can pick its own model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;qwen/qwen3.6-plus&lt;/code&gt;&lt;/strong&gt; — 1M context window, great for long-form research and generation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;minimax/minimax-m2.5&lt;/code&gt;&lt;/strong&gt; — strong reasoning, what I reach for on analytical work&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;google/gemini-3-flash-preview&lt;/code&gt;&lt;/strong&gt; — also 1M context, fast&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;moonshotai/kimi-k2.6&lt;/code&gt;&lt;/strong&gt; — solid alternative when the others are misbehaving&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The job picks the model. Daily AI report? DeepSeek, because it’s cheap and the task isn’t hard. Blog outline that needs to chew through a pile of research notes? Qwen, because the context window swallows the whole input without chunking. Analytical synthesis? Minimax. And again, for now. I’m just getting into these new models after using Claude Code for however long it’s been out. But the success I’ve had with them has me setting up Opencode to use them.&lt;/p&gt;

&lt;p&gt;The subagent system lets me parallelize too. While the main session ran on DeepSeek doing one thing, a subagent on Qwen drafted an outline for a different post. Two models, two tasks, one wall clock.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Track Record, Three Days In
&lt;/h3&gt;

&lt;p&gt;Concrete deliverables since Thursday night:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Blog outlines:&lt;/strong&gt; Two posts — a Kiro AI article and one I’m calling “The AI Psychologist” — both went notes → web research → verbalized sampling for angle selection → outline → email. Full pipeline, no me-in-the-loop until the outline showed up in my inbox.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Research tasks:&lt;/strong&gt; Author bios with structured JSON + bibliography, topic deep-dives on AI tools, vibe coding, prompt engineering psychology. Stuff I’d normally burn an afternoon on.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Brainstorming:&lt;/strong&gt; Content ideas, project names, productivity workflows, all using verbalized sampling so I get diverse options with probability weights instead of one safe median answer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory compounding:&lt;/strong&gt; Daily logs roll up to weekly memory promotion. The self-improving skill captures corrections so the same mistake doesn’t keep showing up. Each week she’s a little less stupid about my preferences.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weekly Obsidian reports:&lt;/strong&gt; Genuinely useful vault digests. What changed. What’s worth re-reading. What’s collecting dust and should be archived or thrown out.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of this involves Sabrina pretending to run scripts she can’t run. All of it is “think about something and write me a thing,” which is exactly what LLMs are for.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Actually Built
&lt;/h2&gt;

&lt;p&gt;Six weeks ago I wanted an autonomous AI agent. What I have now is better and stupider at the same time.&lt;/p&gt;

&lt;p&gt;The discovery, after all the silent hallucinations and config schema drift and tool-calls-as-text bullshit: AI agents are great at the &lt;em&gt;thinking&lt;/em&gt; parts: research, writing, brainstorming, synthesis. They’re terrible at the &lt;em&gt;doing&lt;/em&gt; parts: running scripts reliably, admitting they can’t do something, not making shit up when cornered.&lt;/p&gt;

&lt;p&gt;So I built around the doing and leaned into the thinking. Sabrina does real work now. She just doesn’t run the cron jobs herself anymore: the host crontab does. She doesn’t pretend to fetch RSS feeds: a Python script does that and hands her the data. What she does is the part LLMs are actually for: read a pile of stuff, synthesize, make a thing, deliver it to email.&lt;/p&gt;

&lt;p&gt;The host cron + agent hybrid is the pattern that actually ships. The agent is the writer, not the operator. The operator is &lt;code&gt;cron&lt;/code&gt; and a Python interpreter, both of which have been doing their jobs reliably since long before transformers were a thing.&lt;/p&gt;

&lt;p&gt;Six weeks to figure out what should have been obvious from the start: stop using language models for things that aren’t language. At least that’s what I’m going with until I have time to go through another continuous break-then-fix cycle.&lt;/p&gt;

</description>
      <category>aiagents</category>
    </item>
    <item>
      <title>April 2026 Model Roundup: Opus 4.7 Official, DeepSeek V4 Open-Sources 1M Context, and GPT-5.5 Upstaged the GPT-6 Hype</title>
      <dc:creator>Stephan Miller</dc:creator>
      <pubDate>Fri, 24 Apr 2026 12:00:00 +0000</pubDate>
      <link>https://forem.com/eristoddle/april-2026-model-roundup-opus-47-official-deepseek-v4-open-sources-1m-context-and-gpt-55-47m1</link>
      <guid>https://forem.com/eristoddle/april-2026-model-roundup-opus-47-official-deepseek-v4-open-sources-1m-context-and-gpt-55-47m1</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkub0qcd717p9b0keoh8w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkub0qcd717p9b0keoh8w.png" alt=" " width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Two weeks into the month, developers discovered their Gemini API bills had exploded. Google’s billing system was charging for approximately 114 internal search queries per API call with grounding enabled. That was the story I started writing. By the time April 24 arrived, three new models had officially launched, the “Still Waiting for GPT-6” watch had ended not with GPT-6 but with GPT-5.5, and DeepSeek V4 dropped today with a 1M context window under Apache 2.0, on the same day GPT-5.5 went live, apparently just to split the news cycle.&lt;/p&gt;

&lt;p&gt;This roundup covers April 2026 so far.&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;When Google Billed 114x&lt;/li&gt;
&lt;li&gt;What Actually Moved This Week&lt;/li&gt;
&lt;li&gt;Claude Opus 4.7 Is Official — and the Cost Story Is Better Than Expected&lt;/li&gt;
&lt;li&gt;Hype Check: Mimo V2 Pro, One Month In&lt;/li&gt;
&lt;li&gt;Kimi K2.6: The Open-Source Agentic Coding Model Nobody Covered&lt;/li&gt;
&lt;li&gt;Meta Broke Open Source Hearts&lt;/li&gt;
&lt;li&gt;The Models That Cost Almost Nothing (No, Really)&lt;/li&gt;
&lt;li&gt;The Hidden Tax: How Sonnet 4.6 Can Still Cost More Than Opus&lt;/li&gt;
&lt;li&gt;GPT-5.5 Shipped Yesterday, Not GPT-6, and DeepSeek V4 Dropped Today&lt;/li&gt;
&lt;li&gt;The Actual Takeaways&lt;/li&gt;
&lt;li&gt;Read for Yourself&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When Google Billed 114x
&lt;/h2&gt;

&lt;p&gt;Gemini 3 Flash Preview is Google’s high-volume, reasonably priced model: $0.50/M input, $3/M output, 1M context window. It’s been running at #4 on OpenRouter by weekly token volume. A lot of people have pipelines running on it. The “search grounding” feature, which lets the model query Google Search to ground its responses in real-time information, sounds great on paper.&lt;/p&gt;

&lt;p&gt;Turns out the billing for that feature had a misconfiguration. For every API call, users were being billed for roughly &lt;strong&gt;114 separate search queries&lt;/strong&gt; rather than the actual number of queries they used. The “Generate content search query Gemini 3” SKU in users’ dashboards was showing 10-15x the expected line items. Actual grounding call frequency had decreased, but bills exploded anyway.&lt;/p&gt;

&lt;p&gt;The scale of damage before Google caught it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple developers reporting 4x–10x cost increases on flat or declining usage&lt;/li&gt;
&lt;li&gt;€1,000+ additional daily costs for at least one European developer&lt;/li&gt;
&lt;li&gt;₩340,000 in two days for a Korean developer&lt;/li&gt;
&lt;li&gt;Google identified the root cause on April 14, committed to fixing the misconfiguration and correcting previous bills&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Update as of April 24:&lt;/strong&gt; Google engineer Ali Cevik confirmed on the developer forum that the billing misconfiguration is fixed going forward. Refunds are being processed, but Google has provided no specific timeline. Forum responses from their team said “by the end of the month” without committing to anything more specific. Affected users are reporting that support is framing corrections as “one-time exceptions” rather than acknowledging the systemic bug. Re-enabling grounding is probably safe now for new calls, but check your billing dashboard before turning it back on, and watch the first few days’ charges carefully.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://discuss.ai.google.dev/t/sudden-cost-spike-with-gemini-3-flash-preview-despite-decreased-usage-april-2026/139138" rel="noopener noreferrer"&gt;thread&lt;/a&gt; on the Google AI Developers Forum is worth reading if you’re running anything on Gemini 3 Flash with grounding enabled. The concrete lesson here: &lt;strong&gt;search grounding is billed separately from token usage&lt;/strong&gt;, and before you enable any “enhanced” feature on a high-volume model, understand exactly what gets metered and how. Don’t assume the main pricing page tells the whole story.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F83ndcooek9nw3uqvnnpb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F83ndcooek9nw3uqvnnpb.png" alt="April 2026 LLM Model Ranking" width="800" height="815"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Moved This Week
&lt;/h2&gt;

&lt;p&gt;Here’s the OpenRouter picture as of the week ending April 24:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Weekly Tokens&lt;/th&gt;
&lt;th&gt;WoW Change&lt;/th&gt;
&lt;th&gt;Arena Overall&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Claude Sonnet 4.6&lt;/td&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;td&gt;1.38T&lt;/td&gt;
&lt;td&gt;+3%&lt;/td&gt;
&lt;td&gt;#3 (1496)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;DeepSeek V3.2&lt;/td&gt;
&lt;td&gt;DeepSeek&lt;/td&gt;
&lt;td&gt;1.32T&lt;/td&gt;
&lt;td&gt;+3%&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Gemini 3 Flash Preview&lt;/td&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;td&gt;1.11T&lt;/td&gt;
&lt;td&gt;stable&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Claude Opus 4.7&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;td&gt;951B&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+4,221%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;#1 (1503, thinking)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Mimo V2 Pro&lt;/td&gt;
&lt;td&gt;Xiaomi&lt;/td&gt;
&lt;td&gt;902B&lt;/td&gt;
&lt;td&gt;+9%&lt;/td&gt;
&lt;td&gt;not yet&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;MiniMax M2.5&lt;/td&gt;
&lt;td&gt;Minimax&lt;/td&gt;
&lt;td&gt;856B&lt;/td&gt;
&lt;td&gt;+22%&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;MiniMax M2.7&lt;/td&gt;
&lt;td&gt;Minimax&lt;/td&gt;
&lt;td&gt;813B&lt;/td&gt;
&lt;td&gt;+24%&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Kimi K2.6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Moonshot AI&lt;/td&gt;
&lt;td&gt;792B&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;New&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;not yet&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Claude Opus 4.6&lt;/td&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;td&gt;756B&lt;/td&gt;
&lt;td&gt;+46%&lt;/td&gt;
&lt;td&gt;#2 (1503, thinking)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Grok 4.1 Fast&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;X.AI&lt;/td&gt;
&lt;td&gt;700B&lt;/td&gt;
&lt;td&gt;+33%&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Three new entries this week: Claude Opus 4.7 at #4 with a 4,221% spike, Kimi K2.6 debuting at #8 on its first week, and Grok 4.1 Fast at #10. Claude Opus 4.6, which was briefly the second-most-used model, dropped to #9 as people migrated to 4.7.&lt;/p&gt;

&lt;p&gt;The stable story continues in the background: Claude Sonnet 4.6 and DeepSeek V3.2 are running neck and neck at the top, both at a slow +3% WoW. That’s real production traffic, not evaluation runs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude Opus 4.7 Is Official — and the Cost Story Is Better Than Expected
&lt;/h2&gt;

&lt;p&gt;Anthropic launched Claude Opus 4.7 on April 16. It’s been sitting in Arena’s blind comparison system for a few weeks, and it’s now publicly available on the API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.&lt;/p&gt;

&lt;p&gt;Pricing: &lt;strong&gt;$5/M input, $25/M output&lt;/strong&gt; — unchanged from Opus 4.6. That’s the headline.&lt;/p&gt;

&lt;p&gt;The real story is what the model does to your costs in practice. Artificial Analysis ran Opus 4.7 through their GDPVal-AA benchmark suite (44 occupations, 9 industries) and found it uses roughly &lt;strong&gt;35% fewer output tokens than Opus 4.6&lt;/strong&gt; to complete the same tasks. The practical effect: real-world costs on Opus 4.7 run approximately 11% lower than Opus 4.6 at the same stated price per token.&lt;/p&gt;

&lt;p&gt;There’s a caveat on the input side. The 4.7 tokenizer is less efficient, generating up to 35% more tokens from the same input text depending on content type. For workloads with heavy, repeated system prompts or long document context, this can offset some of the output savings. Prompt caching (available at roughly 10% of the input rate) largely neutralizes this.&lt;/p&gt;
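
&lt;p&gt;Whether that nets out in your favor depends on your input/output mix, so it’s worth plugging in your own telemetry. A sketch; the 60/40 mix below is invented, and the input inflation is the worst case:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Per-task cost, Opus 4.6 vs 4.7, under the reported token shifts.
# $5/M in, $25/M out for both; the task mix here is hypothetical.
def task_cost(input_m, output_m, in_price=5.0, out_price=25.0):
    return input_m * in_price + output_m * out_price

base_in, base_out = 0.6, 0.4  # M tokens per task on 4.6, made up
opus_46 = task_cost(base_in, base_out)
opus_47 = task_cost(base_in * 1.35, base_out * 0.65)  # worst-case input, -35% output
print(f"4.6: ${opus_46:.2f}  4.7: ${opus_47:.2f}  ({opus_47 / opus_46 - 1:+.0%})")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;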

&lt;p&gt;&lt;strong&gt;Performance numbers that matter:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Artificial Analysis Intelligence Index: &lt;strong&gt;57&lt;/strong&gt; (up 4 points from Opus 4.6, tied with GPT-5.4 and Gemini 3.1 Pro)&lt;/li&gt;
&lt;li&gt;GDPVal-AA: &lt;strong&gt;1,753 Elo&lt;/strong&gt; — 79 points ahead of the next model on real-world knowledge work&lt;/li&gt;
&lt;li&gt;Hallucination rate: &lt;strong&gt;36%&lt;/strong&gt; (down from 61% on Opus 4.6, achieved through more frequent abstention)&lt;/li&gt;
&lt;li&gt;Arena: &lt;strong&gt;#1 tied&lt;/strong&gt; at 1503 Elo (with thinking mode), #4 at 1494 without thinking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The 4,221% WoW jump on OpenRouter is a curiosity spike plus a migration wave from people moving off 4.6. By next week you’ll see whether it settles into sustained usage or was just upgrade traffic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;New cybersecurity guardrails:&lt;/strong&gt; Anthropic added automatic detection and blocking for prohibited cybersecurity uses. Security professionals doing legitimate work (pen testing, vuln research, red-teaming) need to join their new Cyber Verification Program to preserve access to those capabilities on 4.7.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hype Check: Mimo V2 Pro, One Month In
&lt;/h2&gt;

&lt;p&gt;Mimo V2 Pro shot up 140% WoW in the April 14 data. Now, with another week of data, it’s at +9%. The spike is over and it’s settling into a real usage tier.&lt;/p&gt;

&lt;p&gt;Xiaomi’s flagship foundation model: over 1 trillion total parameters, 42 billion active (MoE architecture), $1/M input, $3/M output, 1 million token context window. Benchmarks put it at 49 on the Artificial Analysis Intelligence Index.&lt;/p&gt;

&lt;p&gt;The +140% spike was the evaluation-and-curiosity phase. The +9% continuing growth suggests people who ran it liked it enough to keep using it. Still no Arena votes worth analyzing. At ~5 weeks old, the model hasn’t been around long enough for production validation at scale.&lt;/p&gt;

&lt;p&gt;Check back in 2–3 more weeks. If it accumulates Arena votes and holds a respectable position there, the benchmarks were real. Stable OpenRouter usage without Arena presence is ambiguous: it could mean quality users who prefer specific capabilities, or it could mean low-friction API access driving test traffic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Kimi K2.6: The Open-Source Agentic Coding Model Nobody Covered
&lt;/h2&gt;

&lt;p&gt;Moonshot AI released Kimi K2.6 on April 20 and it debuted at #8 on OpenRouter in its first week. You probably missed it because the same week had Claude Opus 4.7’s official launch, Grok 4.3 Beta, and the GPT-5.5 pre-announcement noise.&lt;/p&gt;

&lt;p&gt;What it is: a 1-trillion-parameter MoE model with 32B active parameters, a &lt;strong&gt;262,144-token context window&lt;/strong&gt;, vision, and agentic capabilities. Weights published on Hugging Face under a &lt;strong&gt;Modified MIT License&lt;/strong&gt;: full open weights, commercially usable.&lt;/p&gt;

&lt;p&gt;What it’s built for: long-horizon coding agents, front-end generation from natural language, and massively parallel agent swarms. Moonshot’s documentation specifically highlights scaling to 300 sub-agents and 4,000 coordinated steps in a single session. If you’re building orchestration-heavy multi-agent systems, this is the open-weight model that was designed from the ground up for that use case.&lt;/p&gt;

&lt;p&gt;Benchmark comparisons are mixed but solid. On SWE-Bench Pro it outperforms DeepSeek V4-Pro (58.6 vs 55.4). On LiveCodeBench it trails V4-Pro (89.6 vs 93.5). On competitive coding (Codeforces), both trail GPT-5.5.&lt;/p&gt;

&lt;p&gt;No pricing table yet because it’s primarily a self-hosted model. Kimi API pricing for hosted inference isn’t broadly published yet. For the open-weights version: the cost is your inference infrastructure. A 32B-active MoE runs reasonably on mid-tier GPU setups.&lt;/p&gt;

&lt;p&gt;DeepSeek V4 (more on that below) is the stronger model by most closed benchmarks, and V4-Pro wins on context too (1M vs K2.6’s 262K). What K2.6 has going for it is the license: the MIT-derived terms are cleaner than Apache 2.0 for certain commercial use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Meta Broke Open Source Hearts
&lt;/h2&gt;

&lt;p&gt;Llama made Meta relevant in the AI developer world. Open weights, commercial use, the whole deal. Llama 4 dropped in 2025 with a 10 million token context window and impressive parameter counts. The developer community built on it. People ran it locally, fine-tuned it, deployed it. Meta was the company that understood that open source was ecosystem building.&lt;/p&gt;

&lt;p&gt;Then on April 8, Muse Spark dropped from Meta’s new “Superintelligence Labs.” Proprietary model. Not open weights. API in private preview. To try it on the web, you need a Facebook or Instagram login.&lt;/p&gt;

&lt;p&gt;Meta went from an Artificial Analysis Index score of 18 with Llama 4 Maverick to 52 with Muse Spark. That’s not a modest improvement. And in Arena’s head-to-head voting, Muse Spark is sitting at #6 overall with an Elo of 1492, beating GPT-5.4-high in actual user preference votes.&lt;/p&gt;

&lt;p&gt;So the model is legitimately good. &lt;strong&gt;As of April 24, the API remains private preview only: no public access, no announced pricing, no timeline for broader availability.&lt;/strong&gt; Priority access is going to healthcare, education, and enterprise research partners. If you’re building something that needs Muse Spark today, you’re waiting.&lt;/p&gt;

&lt;p&gt;“Meta learned from OpenAI: make the good stuff closed, give the community the crumbs.” I’ve been seeing that take everywhere this month, and I don’t think it’s entirely wrong.&lt;/p&gt;

&lt;p&gt;The broader question this raises: if every lab eventually closes off its best models, what’s the long-term roadmap for building on open weights? DeepSeek and Moonshot are still playing the open-source game. Kimi K2.6 is MIT-licensed. And DeepSeek V4 dropped today with Apache 2.0 weights on Hugging Face. The pattern is becoming hard to ignore, but there are still holdouts.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Models That Cost Almost Nothing (No, Really)
&lt;/h2&gt;

&lt;p&gt;I need to talk about MiniMax M2.5 because I’ve been mentioning it in passing and it deserves its own paragraph.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;$0.118 per million input tokens.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That’s twelve cents per million tokens. For a model that’s sitting at #6 on OpenRouter by weekly volume and growing at 22% WoW. For a model that scores 80.2% on SWE-Bench Verified: which is roughly what Claude’s flagship hits. With a 196,608 token context window. And it’s good enough at agentic tasks that it’s been called out repeatedly in the Latent.Space local model community as the go-to for tool-heavy applications.&lt;/p&gt;

&lt;p&gt;The pricing table this week:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Input $/1M&lt;/th&gt;
&lt;th&gt;Output $/1M&lt;/th&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.7&lt;/td&gt;
&lt;td&gt;$5.00&lt;/td&gt;
&lt;td&gt;$25.00&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;Arena #1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Sonnet 4.6&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;td&gt;$15.00&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;#1 volume on OR&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.5&lt;/td&gt;
&lt;td&gt;$5.00&lt;/td&gt;
&lt;td&gt;$30.00&lt;/td&gt;
&lt;td&gt;2M&lt;/td&gt;
&lt;td&gt;Tops AA Index at 60&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.5 Pro&lt;/td&gt;
&lt;td&gt;$30.00&lt;/td&gt;
&lt;td&gt;$180.00&lt;/td&gt;
&lt;td&gt;2M&lt;/td&gt;
&lt;td&gt;Research tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek V4-Pro&lt;/td&gt;
&lt;td&gt;$1.74&lt;/td&gt;
&lt;td&gt;$3.48&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;Apache 2.0, released today&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek V4-Flash&lt;/td&gt;
&lt;td&gt;$0.14&lt;/td&gt;
&lt;td&gt;$0.28&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;Apache 2.0, released today&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek V3.2&lt;/td&gt;
&lt;td&gt;$0.259&lt;/td&gt;
&lt;td&gt;$0.42&lt;/td&gt;
&lt;td&gt;163K&lt;/td&gt;
&lt;td&gt;3-mo validated, #2 on OR&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MiniMax M2.5&lt;/td&gt;
&lt;td&gt;$0.118&lt;/td&gt;
&lt;td&gt;$0.99&lt;/td&gt;
&lt;td&gt;196K&lt;/td&gt;
&lt;td&gt;80.2% SWE-Bench&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MiniMax M2.7&lt;/td&gt;
&lt;td&gt;$0.30&lt;/td&gt;
&lt;td&gt;$1.20&lt;/td&gt;
&lt;td&gt;196K&lt;/td&gt;
&lt;td&gt;Upgraded M2.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mimo V2 Pro&lt;/td&gt;
&lt;td&gt;$1.00&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;Settling into usage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 3 Flash&lt;/td&gt;
&lt;td&gt;$0.50&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;Grounding: proceed cautiously&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;When MiniMax M2.5 matches or beats Sonnet on SWE-Bench at roughly 1/25th the per-token input cost, we’re in strange territory. Either the benchmark is missing something important about real-world usability, or there’s value being left on the table by anyone running default Claude endpoints on agentic coding tasks without at least testing alternatives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My actual picks this week, by use case:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Budget, coding/agentic&lt;/strong&gt; : &lt;strong&gt;DeepSeek V4-Flash&lt;/strong&gt; at $0.14/M: just dropped today, open source. Test it immediately. MiniMax M2.5 at $0.118/M is still the safety pick if you want community-validated quality.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Budget, general&lt;/strong&gt; : DeepSeek V3.2. $0.42/M output, three months of community validation, strong on math and code. Nothing changed here.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Balanced&lt;/strong&gt; : DeepSeek V4-Pro at $1.74/M input, $3.48/M output with 1M context. Undercuts everything at this quality tier by a factor of 5-8x.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Premium, coding&lt;/strong&gt; : Claude Sonnet 4.6 or Claude Opus 4.7 depending on your task complexity and whether the token economics work out (see the next section).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If you have to have the absolute best&lt;/strong&gt; : Claude Opus 4.7 with thinking (Arena #1, 1503 Elo) or GPT-5.5 (AA Index #1 at 60). Accept the pricing gap vs open-source alternatives.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Hidden Tax: How Sonnet 4.6 Can Still Cost More Than Opus
&lt;/h2&gt;

&lt;p&gt;Claude Sonnet 4.6 is marketed as the economical alternative to Opus. It’s $3/M input versus Opus 4.7’s $5/M: a modest 1.67x difference on input. But that’s not where your money goes on agentic workloads.&lt;/p&gt;

&lt;p&gt;On the Artificial Analysis GDPVal-AA benchmark, Sonnet 4.6 generates &lt;strong&gt;4.5x more output tokens&lt;/strong&gt; than Opus 4.6 to complete the same tasks. The model isn’t worse. It’s producing more intermediate reasoning, more scaffolding, more steps. But output tokens are what you pay for.&lt;/p&gt;

&lt;p&gt;The math with correct current pricing: Sonnet 4.6 at 4.5x tokens × $15/M output = &lt;strong&gt;$67.50/M effective output cost&lt;/strong&gt; versus Opus 4.7 at $25/M output. Sonnet costs 2.7x more per equivalent task in heavy agentic use.&lt;/p&gt;
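
&lt;p&gt;If you want to sanity-check that against your own traffic, the arithmetic is trivial to script. A minimal sketch using the prices above; the 4.5x multiplier is the benchmark’s number, so swap in token counts measured on your own workload:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Back-of-envelope effective output cost. The 4.5x multiplier is the
# GDPVal-AA figure (measured against Opus 4.6), not a universal constant.

def effective_output_cost(price_per_m_tokens, token_multiplier):
    """Dollars per million 'task-equivalent' output tokens."""
    return price_per_m_tokens * token_multiplier

sonnet_46 = effective_output_cost(15.00, 4.5)  # 67.50
opus_47 = effective_output_cost(25.00, 1.0)    # 25.00
print(f"Sonnet 4.6 effective: ${sonnet_46:.2f}/M")
print(f"Opus 4.7 effective:   ${opus_47:.2f}/M")
print(f"Sonnet / Opus:        {sonnet_46 / opus_47:.1f}x")  # 2.7x

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;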

&lt;p&gt;The practical takeaway: if you’re running document summarization, one-shot Q&amp;amp;A, light code generation, Sonnet 4.6 is cheaper and you should use it. If you’re running agentic pipelines, autonomous coding agents, extended tool-use workflows: &lt;strong&gt;benchmark on your actual workload before you assume Sonnet saves money&lt;/strong&gt;. The pricing page isn’t lying; the intuitive comparison probably is.&lt;/p&gt;

&lt;p&gt;And now there’s a third option in the mix: Opus 4.7, which uses ~35% fewer output tokens than Opus 4.6 at the same $25/M rate. For heavy agentic use, Opus 4.7 may be the cheapest of the three Anthropic options. Run your own numbers.&lt;/p&gt;

&lt;h2&gt;
  
  
  GPT-5.5 Shipped Yesterday, Not GPT-6, and DeepSeek V4 Dropped Today
&lt;/h2&gt;

&lt;p&gt;OpenAI finished pre-training the model codenamed “Spud” on March 24. An April 14 release date came and went with nothing. Then on April 23, OpenAI shipped &lt;strong&gt;GPT-5.5&lt;/strong&gt; instead — their most capable model to date and, per their description, the first fully retrained base since GPT-4.5.&lt;/p&gt;

&lt;p&gt;It’s not GPT-6. But it’s real.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GPT-5.5 numbers:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Artificial Analysis Intelligence Index: 60&lt;/strong&gt; — three points ahead of Claude Opus 4.7 and Gemini 3.1 Pro Preview, both at 57&lt;/li&gt;
&lt;li&gt;Terminal-Bench 2.0: 82.7% (vs 75.1% for GPT-5.4)&lt;/li&gt;
&lt;li&gt;Expert-SWE: 73.1% (vs 68.5% for GPT-5.4)&lt;/li&gt;
&lt;li&gt;Pricing: &lt;strong&gt;$5/M input, $30/M output&lt;/strong&gt; — double the cost of GPT-5.4 on output&lt;/li&gt;
&lt;li&gt;GPT-5.5 Pro tier: $30/M input, $180/M output (research/enterprise)&lt;/li&gt;
&lt;li&gt;Context window: 2M tokens (1M longer than most competitors)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The 40% reduction in output token usage that OpenAI claims keeps the effective cost increase to roughly 20% despite the doubled price per token. That math depends entirely on your workload matching the benchmark profile.&lt;/p&gt;
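
&lt;p&gt;Spelled out, with OpenAI’s 40% figure treated as an assumption rather than a given:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# GPT-5.4 output was $15/M; GPT-5.5 is $30/M but, per OpenAI, uses ~40%
# fewer output tokens. Both numbers are claims to verify on your workload.
token_ratio = 0.60            # 40% fewer tokens
price_ratio = 30.00 / 15.00   # doubled output price
print(f"Effective cost multiplier: {token_ratio * price_ratio:.2f}x")  # 1.20x

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;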

&lt;p&gt;&lt;strong&gt;Now the same-day plot twist:&lt;/strong&gt; On April 24 — today — &lt;strong&gt;DeepSeek V4&lt;/strong&gt; dropped with open weights under Apache 2.0.&lt;/p&gt;

&lt;p&gt;DeepSeek V4-Pro: 1.6T total parameters, 49B active (MoE), 1M context window, $1.74/M input, $3.48/M output. V4-Pro output is &lt;strong&gt;8.6x cheaper than GPT-5.5&lt;/strong&gt; and &lt;strong&gt;21x cheaper than Claude Opus 4.7&lt;/strong&gt; at stated per-token rates.&lt;/p&gt;

&lt;p&gt;DeepSeek V4-Flash: 284B total parameters, 13B active, 1M context, $0.14/M input, $0.28/M output.&lt;/p&gt;

&lt;p&gt;Both variants under Apache 2.0, weights on Hugging Face and ModelScope today.&lt;/p&gt;

&lt;p&gt;Performance on competitive coding (Codeforces): V4-Pro scores 3,206 vs GPT-5.5’s 3,168 — V4-Pro wins. On SWE-Bench Pro, Kimi K2.6 beats V4-Pro (58.6 vs 55.4). On long-context retrieval (MRCR 1M), Claude Opus 4.6 beats V4-Pro (92.9 vs 83.5). So V4-Pro isn’t universally better — but at $3.48/M output vs $25-30/M for closed alternatives, it doesn’t need to be universally better to be the right answer for most workloads.&lt;/p&gt;

&lt;p&gt;GPT-6 “Spud”: still hasn’t arrived. Polymarket has it at 72% by April 30 and 95%+ by June 30. At this point I’ll believe it when I see it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Other things still in the pipeline:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Mythos Preview&lt;/strong&gt; : still available only to approximately 50 partner organizations since April 7. Cybersecurity focus. $25/M input, $125/M output. Nothing changed here.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grok 4.3 Beta&lt;/strong&gt; : dropped April 17 with native video understanding, PDF/PowerPoint generation, and enhanced long-context processing. Not yet on OpenRouter broadly. Still in xAI testing phase.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Actual Takeaways
&lt;/h2&gt;

&lt;p&gt;April 2026 shipped more major model releases than any previous month in AI history, and then DeepSeek V4 and GPT-5.5 both dropped on the same day at the end of it. The landscape looks different today than it did two weeks ago.&lt;/p&gt;

&lt;p&gt;What actually matters as of April 24:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Gemini 3 Flash grounding billing is fixed going forward&lt;/strong&gt; but check your billing dashboard before re-enabling, and watch the first few days’ charges carefully. Refunds are in process; don’t expect speed on that.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;DeepSeek V4 just dropped open-source with 1M context and Apache 2.0.&lt;/strong&gt; V4-Flash at $0.14/$0.28 and V4-Pro at $1.74/$3.48. Test it today. It’s too new for community validation but the pedigree is real and the pricing is absurd for the quality tier.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MiniMax M2.5 at $0.118/M and 80.2% SWE-bench is still the community-validated budget pick for agentic coding.&lt;/strong&gt; Three weeks of steady usage volume with no hype cycle. DeepSeek V4-Flash is the new challenger — if validation holds over the next few weeks, it may displace M2.5.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Claude Opus 4.7 is the easiest top-tier upgrade you’ll make this month.&lt;/strong&gt; Same price as 4.6, 35% fewer output tokens, Arena #1. If you’re running Opus 4.6 today, just switch.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Benchmark Sonnet 4.6 vs Opus 4.7 on your actual agentic workloads.&lt;/strong&gt; Opus 4.7’s improved token efficiency means the economics may favor it over Sonnet for complex agent tasks. Run the math on your usage before assuming Sonnet is cheaper.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Mimo V2 Pro and Kimi K2.6 need another 2-3 weeks.&lt;/strong&gt; Both show real usage momentum. Neither has Arena data yet. Hold the investment thesis pending community validation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;GPT-5.5 topped the Artificial Analysis Intelligence Index at 60.&lt;/strong&gt; That matters, but at $30/M output you’re paying a substantial premium over DeepSeek V4-Pro ($3.48/M) for about 3 points on a benchmark. Evaluate whether that delta maps to your actual workload before committing.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The model evaluation cycle for “is this the right choice?” is now measured in weeks, not quarters.&lt;/p&gt;

&lt;h2&gt;
  
  
  Read for Yourself
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://discuss.ai.google.dev/t/sudden-cost-spike-with-gemini-3-flash-preview-despite-decreased-usage-april-2026/139138" rel="noopener noreferrer"&gt;Gemini 3 Flash billing bug thread&lt;/a&gt;&lt;/strong&gt; — r/[Google AI Dev Forum] — Developer discussion of the billing disaster, with cost breakdowns and screenshots&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://techcrunch.com/2026/04/08/meta-debuts-the-muse-spark-model-in-a-ground-up-overhaul-of-its-ai/" rel="noopener noreferrer"&gt;Meta introduces Muse Spark&lt;/a&gt;&lt;/strong&gt; — TechCrunch — The story of Meta’s open-source pivot; comment threads are heated&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.latent.space/p/ainews-top-local-models-list-april" rel="noopener noreferrer"&gt;Top Local Models List April 2026&lt;/a&gt;&lt;/strong&gt; — Latent.Space — Community-validated rankings for open-weight models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://decrypt.co/362633/xiaomi-mimo-v2-pro-review-so-good-mistaken-deepseek-v4" rel="noopener noreferrer"&gt;Mimo V2 Pro: mistaken for DeepSeek V4&lt;/a&gt;&lt;/strong&gt; — Decrypt — The review that captures the week’s Xiaomi surprise&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://awesomeagents.ai/reviews/review-claude-sonnet-4-6/" rel="noopener noreferrer"&gt;Claude Sonnet 4.6: the workhorse that ate the flagship&lt;/a&gt;&lt;/strong&gt; — AwesomeAgents — Honest multi-week review with the token cost caveat&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>largelanguagemodels</category>
    </item>
    <item>
      <title>Microsoft APM - Managing AI Context Like a Dependency Problem</title>
      <dc:creator>Stephan Miller</dc:creator>
      <pubDate>Mon, 13 Apr 2026 12:00:00 +0000</pubDate>
      <link>https://forem.com/eristoddle/microsoft-apm-managing-ai-context-like-a-dependency-problem-5361</link>
      <guid>https://forem.com/eristoddle/microsoft-apm-managing-ai-context-like-a-dependency-problem-5361</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh1av2e4p8h7ab7q3jbtx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh1av2e4p8h7ab7q3jbtx.png" alt=" " width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It started with a small problem that wouldn’t stop nagging me.&lt;/p&gt;

&lt;p&gt;I had AI coding tools scattered across machines, each one configured slightly differently, each one producing slightly different results. My Claude Code setup on my laptop didn’t match my desktop. I had created skills for these coding agents, but the skills I could use depended on which machine I was using. It was an “it works on my machine” issue, but they were all my machines.&lt;/p&gt;

&lt;p&gt;I started using &lt;a href="https://github.com/runkids/skillshare" rel="noopener noreferrer"&gt;Skillshare&lt;/a&gt; a little while ago, and it helps somewhat, but it focuses on syncing skills between the coding agents’ configs in your user folder. That’s useful for some skills, but not all of them, because sometimes you only need a skill at the repo level. And putting all your coding agent skills at the user folder level not only pollutes your context but makes it hard to find a specific skill when you want one.&lt;/p&gt;

&lt;p&gt;So when I was asked to look for an enterprise tool to manage skills with a focus on GitHub Copilot, I found &lt;a href="https://github.com/microsoft/apm/blob/main/README.md" rel="noopener noreferrer"&gt;Microsoft APM&lt;/a&gt;, Agent Package Manager.&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The Blueprint: APM’s Declarative Infrastructure&lt;/li&gt;
&lt;li&gt;Organizing the Monorepo&lt;/li&gt;
&lt;li&gt;Solving Context Pollution with Intelligent Compilation&lt;/li&gt;
&lt;li&gt;Automating the Standards&lt;/li&gt;
&lt;li&gt;Local Iteration and the Playground Strategy&lt;/li&gt;
&lt;li&gt;From Supervisor to Architect&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Blueprint: APM’s Declarative Infrastructure
&lt;/h2&gt;

&lt;p&gt;Many of us are still prompting like it’s 2023. Copy-paste a prompt. Maybe drop a &lt;code&gt;CLAUDE.md&lt;/code&gt; or &lt;code&gt;AGENTS.md&lt;/code&gt; in the repo. Hope for the best. When the AI does something dumb, yell at it in the chat window and hope it remembers next time. It won’t.&lt;/p&gt;

&lt;p&gt;APM replaces all of that with a declarative, version-locked workflow that treats AI context the same way we treat dependencies. You declare what you need in &lt;code&gt;apm.yml&lt;/code&gt;, lock it with &lt;code&gt;apm.lock.yaml&lt;/code&gt;, and install it with &lt;code&gt;apm install&lt;/code&gt;. If that sounds like npm or pip, good. That’s the point. We solved dependency management for code twenty years ago. It’s insane that we’re still managing AI context by hand.&lt;/p&gt;
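
&lt;p&gt;If you haven’t seen one, an &lt;code&gt;apm.yml&lt;/code&gt; is about as small as manifests get. A minimal sketch with made-up package names, using the path-style dependency that shows up again later in this post:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# apm.yml (hypothetical example; path-style dependencies only)
dependencies:
  - path: ../department/standards-security
  - path: ../team/frontend-react
# 'apm install' resolves these and pins them in apm.lock.yaml,
# the same way npm writes package-lock.json

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;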

&lt;p&gt;The system is built around seven primitives:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Instructions&lt;/strong&gt; — The guardrails. Think &lt;code&gt;CLAUDE.md&lt;/code&gt; or &lt;code&gt;AGENTS.md&lt;/code&gt; files that tell the AI how to behave.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skills&lt;/strong&gt; — Reusable capabilities the AI can invoke.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompts&lt;/strong&gt; — Executable task templates with defined inputs. Called commands in Claude Code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agents&lt;/strong&gt; — Specialized sub-agents with their own instructions and tools.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hooks&lt;/strong&gt; — Shell commands that fire on specific events.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plugins&lt;/strong&gt; — Extensions that add functionality to the agent runtime.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP Servers&lt;/strong&gt; — Model Context Protocol servers that give agents access to external tools and data.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The part that sold me: APM doesn’t run a daemon or require a runtime. It populates your existing &lt;code&gt;.github/&lt;/code&gt;, &lt;code&gt;.claude/&lt;/code&gt;, and &lt;code&gt;.cursor/&lt;/code&gt; folders with native configuration files. The agents just pick them up. If you delete APM tomorrow, those files still work. Zero lock-in. That’s how you know someone thought about this for more than a weekend.&lt;/p&gt;
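
&lt;p&gt;To make that concrete, here’s roughly what a repo looks like after an install and a compile. This is my illustration, limited to the outputs the docs and this post actually mention:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.github/          native GitHub Copilot configuration
.claude/          CLAUDE.md, skills
.cursor/          rules
AGENTS.md         compiled instruction hierarchy (apm compile)
apm_modules/      installed packages, like node_modules/
apm.lock.yaml     pinned dependency versions

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;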

&lt;p&gt;&lt;a href="https://microsoft.github.io/apm/key-concepts/" rel="noopener noreferrer"&gt;Key Concepts Guide&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Organizing the Monorepo
&lt;/h2&gt;

&lt;p&gt;So you’ve got seven types of primitives and you want to share them across multiple projects. Maybe across a whole team. Maybe across an entire engineering organization. You need structure, or you’ll drown in conflicting instructions and duplicated skills within a month. For now, this works:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Department&lt;/strong&gt; — The top layer. These are your organization-wide standards. Security policies. Code review requirements. Compliance guardrails. The stuff that applies everywhere and nobody gets to opt out of. Think of it like your company’s engineering handbook, except the AI actually reads it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Team&lt;/strong&gt; — The middle layer. Your team’s specializations. Maybe your frontend team has specific React patterns. Your data team has dbt conventions. Your platform team has infrastructure standards. These inherit from Department but add domain-specific knowledge.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Project&lt;/strong&gt; — The bottom layer. Local context for a specific repo. The stuff that only matters here. Your project’s architecture decisions, custom tooling, specific quirks.&lt;/p&gt;

&lt;p&gt;In practice, this lives in a monorepo where each layer is a directory containing &lt;strong&gt;virtual subdirectory packages&lt;/strong&gt;. So you might have:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/department/standards-security
/department/standards-code-review
/team/frontend-react
/team/data-engineering

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each of those is a standalone APM package that can be versioned and depended on independently, but they all live in one repo where you can see the whole picture. You slap &lt;code&gt;CODEOWNERS&lt;/code&gt; on the department folders so nobody changes the security standards without review, but teams get autonomy over their own specializations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://microsoft.github.io/apm/guides/org-packages/" rel="noopener noreferrer"&gt;Org-Wide Packages Pattern&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Solving Context Pollution with Intelligent Compilation
&lt;/h2&gt;

&lt;p&gt;Here’s a problem I didn’t anticipate until I was neck-deep in it: &lt;strong&gt;context pollution&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You’ve got department-level instructions. Team-level instructions. Project-level instructions. Skills from three different packages. Prompts from two more. And now your AI assistant is trying to load all of that into a context window that is not infinite. Irrelevant instructions don’t just waste tokens and degrade performance. Tell an AI too many things and it starts forgetting the important ones.&lt;/p&gt;

&lt;p&gt;APM solves this with &lt;code&gt;apm compile&lt;/code&gt;, which transforms all your scattered primitives into optimized, hierarchical &lt;code&gt;AGENTS.md&lt;/code&gt; files. It figures out which instructions belong at which level and how to structure them so the AI gets the most relevant context first.&lt;/p&gt;

&lt;p&gt;The conflict resolution model is opinionated: &lt;strong&gt;local project files always win&lt;/strong&gt;. If your project has an &lt;code&gt;AGENTS.md&lt;/code&gt; and an installed package also has one, &lt;code&gt;apm install&lt;/code&gt; skips the existing file unless you explicitly &lt;code&gt;--force&lt;/code&gt; it. During &lt;code&gt;apm compile&lt;/code&gt;, instructions get merged intelligently based on file patterns, but your local overrides stay on top. This is the right call. The project knows itself better than any upstream package does.&lt;/p&gt;

&lt;p&gt;I found this out the hard way, naturally. I had a package that defined broad coding standards and a project that had specific exceptions. Without the compile step, the AI was getting contradictory instructions and doing that thing where it apologizes and asks which rule you’d prefer it follow. With &lt;code&gt;apm compile&lt;/code&gt;, the hierarchy is built in. The AI just does the right thing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://microsoft.github.io/apm/guides/compilation/" rel="noopener noreferrer"&gt;Compilation &amp;amp; Optimization Guide&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Automating the Standards
&lt;/h2&gt;

&lt;p&gt;Once you have a pattern, you need a way to reuse it without having to explain it every time. So I built &lt;code&gt;util-apm-builder&lt;/code&gt;: a meta-skill that helps scaffold new packages. Yes, I used an AI tool to build a tool that teaches AI tools how to use AI tools.&lt;/p&gt;

&lt;p&gt;Building this taught me something important about how AI skills can actually be structured. I do have skills that consist of a single &lt;code&gt;SKILL.md&lt;/code&gt;. They just describe what the skill does and include a couple of examples. I have created more advanced skills with workflows and references too. That’s what I was used to in Claude Code.&lt;/p&gt;

&lt;p&gt;But I was in GitHub Copilot world now and the structure it built for the first skill was really interesting:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Instructions&lt;/strong&gt; — Guardrails about the monorepo’s directory structure. “Department packages go here. Team packages go there. Don’t create folders outside this hierarchy.” Without these, the AI will hallucinate creative new locations for things.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context&lt;/strong&gt; — The technical knowledge base. Manifest schemas. Valid field values. What &lt;code&gt;apm.yml&lt;/code&gt; actually accepts. This is the reference material the AI consults mid-task, and without it, you get manifests that look right but fail validation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompts&lt;/strong&gt; — The executable task template. “Create a new package” with defined inputs for name, layer, type. This is what the developer actually triggers.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A &lt;code&gt;SKILL.md&lt;/code&gt; at the root makes it a hybrid package: part skill, part instruction set.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://microsoft.github.io/apm/guides/agent-workflows/" rel="noopener noreferrer"&gt;Agent Workflows (Experimental)&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Local Iteration and the Playground Strategy
&lt;/h2&gt;

&lt;p&gt;Let me save you from a mistake I made so you can make different, more interesting mistakes.&lt;/p&gt;

&lt;p&gt;Do not install APM packages at the root of your monorepo during development. I did this. What happens is the AI discovers your package source files in &lt;code&gt;/department/my-package/&lt;/code&gt; AND the deployed copies in &lt;code&gt;apm_modules/&lt;/code&gt; and &lt;code&gt;.claude/&lt;/code&gt;, and now it’s seeing the same instructions twice from two different locations. It doesn’t know which is authoritative. It gets confused. You get confused. Everyone’s confused. It’s a bad time.&lt;/p&gt;

&lt;p&gt;The fix is stupidly simple: create a &lt;code&gt;/local&lt;/code&gt; folder as your playground. It’s a separate workspace where you install packages using relative path dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# /local/apm.yml&lt;/span&gt;
&lt;span class="na"&gt;dependencies&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;../department/util-apm-builder&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;../team/frontend-react&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you fast iteration without pushing to a remote registry, and it keeps the source packages and the deployed copies in separate directory trees so the AI doesn’t see double.&lt;/p&gt;

&lt;p&gt;One gotcha: VS Code only discovers skills at the root of an open workspace. So if you’re testing a new skill in &lt;code&gt;/local&lt;/code&gt;, you need to actually open that folder in VS Code, or set up a multi-root workspace that includes it. I spent ten minutes wondering why my skill wasn’t showing up before I figured this out.&lt;/p&gt;

&lt;p&gt;For git discipline: add &lt;code&gt;apm_modules/&lt;/code&gt; to your &lt;code&gt;.gitignore&lt;/code&gt; (it’s like &lt;code&gt;node_modules/&lt;/code&gt;, derived, not source), but commit &lt;code&gt;apm.lock.yaml&lt;/code&gt; and the deployed primitives in &lt;code&gt;.github/&lt;/code&gt; and &lt;code&gt;.claude/&lt;/code&gt;. The lock file ensures reproducibility. The deployed files ensure any developer who clones the repo gets the same AI context without needing to run &lt;code&gt;apm install&lt;/code&gt; first.&lt;/p&gt;
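
&lt;p&gt;In &lt;code&gt;.gitignore&lt;/code&gt; terms, the whole policy is one line; everything else is what you deliberately commit:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# .gitignore
apm_modules/

# committed on purpose: apm.lock.yaml, .github/, .claude/

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;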

&lt;p&gt;&lt;a href="https://microsoft.github.io/apm/guides/dependencies/" rel="noopener noreferrer"&gt;Dependencies &amp;amp; Lockfile Guide&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  From Supervisor to Architect
&lt;/h2&gt;

&lt;p&gt;There’s a maturity curve to working with AI coding assistants, and most of us are stuck somewhere in the middle of it.&lt;/p&gt;

&lt;p&gt;At the beginning, you’re a &lt;strong&gt;Supervisor&lt;/strong&gt;. You watch every line the AI writes. You correct it constantly. You paste errors back into the chat. You basically do pair programming where your partner has amnesia and you’re doing all the navigating.&lt;/p&gt;

&lt;p&gt;The next level is what I’ve been doing for a while: running multiple AI tools on multiple projects simultaneously, trusting them with larger chunks of work, letting them plan and execute while I review the output. It’s better, but it’s still reactive. You’re managing agents, not engineering systems.&lt;/p&gt;

&lt;p&gt;What APM enables is the jump to &lt;strong&gt;Architect&lt;/strong&gt;. You define the standards, the guardrails, the knowledge hierarchy, and the execution patterns once. You version them. You distribute them. And then every AI assistant that touches any project in your ecosystem automatically knows how to behave, what standards to follow, and what context matters. You stop supervising individual interactions and start engineering the environment those interactions happen in.&lt;/p&gt;

&lt;p&gt;The best part is the escape hatch. “Back in the day” in AI terms is last month. Who knows what will change. APM’s output is native configuration files: &lt;code&gt;CLAUDE.md&lt;/code&gt;, &lt;code&gt;AGENTS.md&lt;/code&gt;, &lt;code&gt;.cursor/rules&lt;/code&gt;, skill definitions. If APM disappears tomorrow, or you decide it’s not for you, those files keep working. You haven’t locked yourself into anything except having better-organized AI context, which is not exactly a downside.&lt;/p&gt;

</description>
      <category>aiassisteddevelopmen</category>
    </item>
    <item>
      <title>I Burned Out on Vibe Coding, Came Back, and Rewrote Everything</title>
      <dc:creator>Stephan Miller</dc:creator>
      <pubDate>Sun, 08 Feb 2026 07:00:00 +0000</pubDate>
      <link>https://forem.com/eristoddle/i-burned-out-on-vibe-coding-came-back-and-rewrote-everything-l6i</link>
      <guid>https://forem.com/eristoddle/i-burned-out-on-vibe-coding-came-back-and-rewrote-everything-l6i</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5zne0f49kpebtq4tpg8g.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5zne0f49kpebtq4tpg8g.jpg" alt="AI-assisted development" width="800" height="476"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I hit a wall with vibe coding. Not a dramatic crash. More like the slow realization that I’d been sprinting for months and couldn’t remember why. I had 15 projects in various states of “maybe done,” a GitHub commit chart that looked like a heart monitor, and a growing suspicion that I was building things just to build things.&lt;/p&gt;

&lt;p&gt;Fortunately, freelance writing work picked up right around the same time. Enough to actually pay attention to it. So I stepped away from the side projects, wrote about other people’s technology for a change, and let my own code sit untouched for a few months.&lt;/p&gt;

&lt;p&gt;When I came back, I had no patience for bullshit. And I looked at my projects differently.&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Your Vibe-Coded Apps Are Prototypes (And That’s Fine)&lt;/li&gt;
&lt;li&gt;Making “Adding Features” the Feature&lt;/li&gt;
&lt;li&gt;Building Bottom-Up with Verdent and Claude Code&lt;/li&gt;
&lt;li&gt;The 60 Missing APIs&lt;/li&gt;
&lt;li&gt;Making Plans That Any AI Agent Can Execute&lt;/li&gt;
&lt;li&gt;The Same Pattern, Different Project&lt;/li&gt;
&lt;li&gt;What Changed&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Your Vibe-Coded Apps Are Prototypes (And That’s Fine)
&lt;/h2&gt;

&lt;p&gt;Here’s the thing I couldn’t see while I was in the thick of it: almost everything I’d built with AI coding tools was a prototype. Not in the dismissive sense. These apps worked. &lt;a href="https://dev.to/eristoddle/building-an-electron-app-from-scratch-with-claude-code-5c03"&gt;EmberText&lt;/a&gt; was a functional Electron writing app. Niche Site Factory could generate and manage content sites. They ran. They did things.&lt;/p&gt;

&lt;p&gt;But they were all built top-down. I’d tell the AI “build me an app that does X” and it would scaffold the whole thing, features and all, in one giant session. The problem is that when you build top-down with AI, you end up with something that works but is almost impossible to extend. Every new feature is a negotiation with the existing architecture. You’re not adding to the app. You’re fighting it.&lt;/p&gt;

&lt;p&gt;EmberText was the clearest example. I built it with Claude Code over about 16 hours and $80 in API costs. It had AI integration, text generation, character relationship graphs, plot scaffolding. Impressive on paper. But by the time I realized it should have had a plugin architecture, I was already deep enough that refactoring meant essentially starting over.&lt;/p&gt;

&lt;p&gt;So that’s what I did.&lt;/p&gt;

&lt;h2&gt;
  
  
  Making “Adding Features” the Feature
&lt;/h2&gt;

&lt;p&gt;The insight that changed everything was stupid simple: instead of building an app with features, build an app where adding features &lt;em&gt;is&lt;/em&gt; the feature.&lt;/p&gt;

&lt;p&gt;I’d been using Obsidian for years and it’s in my top five favorite pieces of software. It’s incredible for notes, planning, and organization. You can even make it distraction-free for writing. It’s just not the default, and “not the default” matters more than you’d think when you’re trying to get into a flow state. I tried to hack around this with my Daily Prompts plugin that launched an alert and opened a daily note in Zen mode. It worked, kind of, but I was still fighting the tool.&lt;/p&gt;

&lt;p&gt;VS Code is for code. Obsidian is for notes. What’s for writing?&lt;/p&gt;

&lt;p&gt;That question led to Veneer, a complete rewrite of EmberText from scratch. Same idea, a distraction-free writing environment, but built from the ground up as a plugin-first architecture. The “Zen-First Shell” concept: when you open it, you see nothing but a clean sheet and your text. Sidebars, ribbons, status bars exist as ghost elements, hidden by default, appearing only when you hover near the edges or hit a hotkey. Everything that isn’t the writing surface has to earn its right to be on screen.&lt;/p&gt;

&lt;p&gt;And critically, every feature is a plugin. The file explorer? Plugin. The markdown editor? Plugin. The command palette? Plugin. Even core functionality ships as plugins that can be swapped, extended, or replaced. This isn’t just for a future community. It makes the whole thing dramatically easier to build with AI, because each plugin is a self-contained unit with clear boundaries. You can hand an AI agent a plugin spec and let it work without worrying about it breaking everything else.&lt;/p&gt;
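
&lt;p&gt;The contract matters more than the language here, so here’s the shape of the idea as a minimal sketch. The names are invented and the real app is Electron/TypeScript; this is just the smallest illustration of every feature implementing one small interface:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# The shape of the idea, not Veneer's actual API. Every feature implements
# one small contract, so an agent can build a plugin against a spec in
# isolation without touching the shell.

class PluginContext:
    """What the shell exposes to plugins: registration points, nothing else."""
    def __init__(self):
        self.commands = {}

    def register_command(self, name, fn):
        self.commands[name] = fn

class Plugin:
    def activate(self, ctx):
        raise NotImplementedError

    def deactivate(self):
        pass

class FileExplorer(Plugin):
    """Even 'core' features ship this way, so they can be swapped or replaced."""
    def activate(self, ctx):
        ctx.register_command("explorer.toggle", lambda: print("toggle file tree"))

# The shell's whole job: load plugins, hand them the context.
ctx = PluginContext()
for plugin in (FileExplorer(),):
    plugin.activate(ctx)
ctx.commands["explorer.toggle"]()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;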

&lt;h2&gt;
  
  
  Building Bottom-Up with Verdent and Claude Code
&lt;/h2&gt;

&lt;p&gt;I used &lt;a href="https://www.verdent.ai/" rel="noopener noreferrer"&gt;Verdent&lt;/a&gt; to build the base application. If you read &lt;a href="https://dev.to/eristoddle/verdent-ai-when-your-ai-coding-assistant-finishes-before-you-can-get-coffee-4e76"&gt;my post about Verdent&lt;/a&gt;, you know this thing is fast. Too fast, honestly. It finished most of the base app, including a file browser sidebar plugin, a markdown editor plugin, and a command palette, in about 220 credits, roughly $20 worth. There were bugs left when I ran out of credits, but the foundation was solid.&lt;/p&gt;

&lt;p&gt;But here’s where the process got interesting. Instead of just continuing to add features on top, I switched to Claude Code and did something I hadn’t done before: I asked it to &lt;em&gt;audit&lt;/em&gt; the codebase.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;use your skills and check the repo for best practices
- UI
- Is it themable like Obsidian or VS Code
- Plugin Architecture (and compare to VS Code and Obsidian)
- TypeScript
- Electron
- Structure, Naming Conventions

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I’d forgotten how many skills and plugins I had installed in Claude Code. When I ran this, it deployed four specialized agents in parallel:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Explore subagent&lt;/strong&gt; analyzed the overall project structure, UI patterns, theming, and naming conventions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architecture Strategist&lt;/strong&gt; evaluated system design decisions and compared the plugin architecture against VS Code and Obsidian&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kieran TypeScript Reviewer&lt;/strong&gt; checked strict mode compliance, type safety, interface definitions, and generic patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best Practices Researcher&lt;/strong&gt; gathered industry standards and found examples from successful projects&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is not how I was working six months ago. Six months ago, I would have just told the AI to add the next feature and hoped for the best.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 60 Missing APIs
&lt;/h2&gt;

&lt;p&gt;The audit turned up a lot. Claude gave the codebase an A- (92/100) overall, which sounds great until you read the details. The critical finding was the plugin API gaps. Obsidian provides 60+ plugin APIs. Veneer was missing most of them.&lt;/p&gt;

&lt;p&gt;No modals. No notification system. No context menus anywhere. No way for plugins to subscribe to file or workspace events. No way to extend the CodeMirror editor. The native OS menu had “Open Folder” under “Veneer” instead of “File,” which is the kind of thing that makes you realize the AI built the structure but didn’t think about the conventions.&lt;/p&gt;

&lt;p&gt;I had Claude store all the findings in the project’s docs folder:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;BEST_PRACTICES_REVIEW.md&lt;/strong&gt; : Everything organized by priority with an implementation roadmap&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PLUGIN_API_GAPS.md&lt;/strong&gt; : A detailed comparison against Obsidian and VS Code showing exactly what was missing&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Making Plans That Any AI Agent Can Execute
&lt;/h2&gt;

&lt;p&gt;This is the part that parallels &lt;a href="https://mitchellh.com/writing/my-ai-adoption-journey" rel="noopener noreferrer"&gt;Mitchell Hashimoto's AI adoption journey&lt;/a&gt;. He talks about “harness engineering,” the idea that every time an agent makes a mistake, you engineer a solution so it never makes that mistake again. Better implicit prompting. Actual programmed tools. The goal is building up an ecosystem where agents get better over time.&lt;/p&gt;

&lt;p&gt;I’m doing something similar, but at the project planning level. Instead of just fixing bugs as they come, I’m creating structured documentation that any AI tool can pick up and execute. My next prompt to Claude was:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Take docs/BEST_PRACTICES_REVIEW.md and docs/PLUGIN_API_GAPS.md and create a
markdown list of TODOs in the docs folder. These should be grouped into tasks
and subtasks. If it is possible to work on some tasks concurrently this should
be mentioned. This file should be able to be used by an AI agent to finish
these tasks. Add enough details to each task to speed up development time.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now I have the work planned in 3 phases across 3 TODO files, plus a final phase listing every Obsidian plugin API that Veneer doesn’t have yet for future development. These are all in the docs folder of the project, version controlled, and written so that any agent, Claude Code, Jules, a VS Code extension with Qwen, whatever, can pick them up and start working.&lt;/p&gt;

&lt;p&gt;This is the difference between vibe coding and what I’m doing now. I’m still using AI to do the heavy lifting. But I’m not just throwing prompts at the wall. I’m using one AI tool to build, another to audit, and then creating structured plans that decouple the &lt;em&gt;what needs to happen&lt;/em&gt; from the &lt;em&gt;which tool does it&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Same Pattern, Different Project
&lt;/h2&gt;

&lt;p&gt;This isn’t just how I rebuilt Veneer. I’m doing the same thing with Niche Site Factory. Instead of telling an AI to “build me a niche site generator” (which is roughly what I did the first time), I started over by building the data model first.&lt;/p&gt;

&lt;p&gt;I took a real project, a sci-fi encyclopedia wiki, and used it to design the content structures. A knowledge graph in PostgreSQL with pgvector for embeddings. 2,622 books ingested into the entities table. Flexible JSONB storage that can handle books, concepts, authors, movies, whatever. The data model came first, the application came second.&lt;/p&gt;
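
&lt;p&gt;The flexible part is easier to see as data than as schema. A sketch with invented fields, Python dicts standing in for the JSONB payloads:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Two rows from one hypothetical entities table: a 'kind' discriminator,
# a JSONB payload that differs per kind, and a pgvector embedding alongside
# for semantic lookups.
book = {
    "kind": "book",
    "data": {"title": "Dune", "author": "Frank Herbert", "year": 1965},
    "embedding": [0.021, -0.113, 0.406],  # truncated; real vectors run to hundreds of dims
}
concept = {
    "kind": "concept",
    "data": {"name": "terraforming", "related": ["ecology", "colonization"]},
    "embedding": [0.174, 0.002, -0.091],
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;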

&lt;p&gt;It’s the same bottom-up principle. Don’t build the house and then figure out the foundation. Build the foundation, verify it’s solid, then build up from there.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changed
&lt;/h2&gt;

&lt;p&gt;I think the burnout was actually useful. Stepping away let me see the pattern I was stuck in: build fast, hit a wall, start something new. That’s fine when you’re learning the tools. It’s how I figured out what Claude Code, Kiro, Verdent, and Jules are each good at. But at some point, you have to stop prototyping and start building.&lt;/p&gt;

&lt;p&gt;Here’s what’s different now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bottom-up, not top-down.&lt;/strong&gt; Start with the architecture and data model, not the features. Let the AI build on a solid foundation instead of improvising one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit before extending.&lt;/strong&gt; Use AI review tools to find the gaps before you pile on more code. It’s cheaper to fix the structure now than refactor later.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plans as portable artifacts.&lt;/strong&gt; Write TODO files detailed enough that any AI agent can execute them. Don’t marry yourself to one tool.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plugins as a development strategy.&lt;/strong&gt; A plugin architecture isn’t just for the community. It makes AI-assisted development dramatically easier because each unit is self-contained.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Past work is research.&lt;/strong&gt; EmberText wasn’t a failure. It was an $80 prototype that taught me exactly what Veneer needed to be.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’m still &lt;a href="https://www.stephanmiller.com/category/vibe-coding/" rel="noopener noreferrer"&gt;vibe coding&lt;/a&gt;. I’m just vibing with more structure now and calling it &lt;a href="https://www.stephanmiller.com/category/ai-assisted-development/" rel="noopener noreferrer"&gt;AI-assisted development&lt;/a&gt;. And honestly, after a few months of writing for clients and not touching my own projects, coming back to this with fresh eyes and no patience might be the best thing that happened to any of them.&lt;/p&gt;

</description>
      <category>aiassisteddevelopmen</category>
      <category>vibecoding</category>
    </item>
    <item>
      <title>Verdent AI - When Your AI Coding Assistant Finishes Before You Can Get Coffee</title>
      <dc:creator>Stephan Miller</dc:creator>
      <pubDate>Tue, 23 Sep 2025 13:00:00 +0000</pubDate>
      <link>https://forem.com/eristoddle/verdent-ai-when-your-ai-coding-assistant-finishes-before-you-can-get-coffee-4e76</link>
      <guid>https://forem.com/eristoddle/verdent-ai-when-your-ai-coding-assistant-finishes-before-you-can-get-coffee-4e76</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fewhtymj27nrppg1jmcu1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fewhtymj27nrppg1jmcu1.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;My current AI development process, if you want to call it that, is getting one AI tool working on one project while having another work on a different project. This only works if one or both projects aren’t near the end where I have to test and have AI fix a lot of things. This process grew out of the fact that sometimes it takes AI a little while to get a task done, but not enough time for you to get any real work done yourself. So it was either work on another project or scroll through Reddit.&lt;/p&gt;

&lt;p&gt;But Verdent put a kink in that plan.&lt;/p&gt;

&lt;p&gt;I needed something for Verdent to work on. When I had &lt;a href="https://dev.to/eristoddle/how-i-built-two-obsidian-plugins-while-kiro-ai-did-most-of-the-work-40e4"&gt;Kiro build Obsidian plugins&lt;/a&gt;, one which was relatively simple, it created over a dozen tasks and took anywhere from two to four hours for each plugin. So I figured a couple of plugins would be enough work for a Saturday afternoon.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Projects: What I Threw at Verdent
&lt;/h2&gt;

&lt;p&gt;I had two ideas picked out to build and I used Auto Run mode to build both:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Obsidian Cleaner Plugin&lt;/strong&gt; : A “cleaner” plugin that checks the attachments in the Obsidian attachment folder and provides you with a checkbox list of all those that aren’t linked so you can delete them. It does the same thing with conflicted files. If I find any more common things I clean like this, I will add them later.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F59g8hfv91i3o4qi4cced.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F59g8hfv91i3o4qi4cced.png" alt="Obsidian Cleaner Plugin" width="800" height="388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3D Tag Explorer Plugin&lt;/strong&gt; : A plugin that takes hierarchical tags like &lt;code&gt;llm/writing/software&lt;/code&gt; and turns them into a 3D node graph with notes containing those tags included as the final nodes.&lt;/p&gt;

&lt;p&gt;Pretty straightforward stuff. I figured these would keep Verdent busy while I worked on something else.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9dsumfm5fm9weqk6g0z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9dsumfm5fm9weqk6g0z.png" alt="Obsidian 3D Tag Explorer Plugin" width="800" height="378"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Reality Check: When Hours Becomes Minutes
&lt;/h2&gt;

&lt;p&gt;Verdent finished both of these Obsidian plugins in less than 15 minutes each.&lt;/p&gt;

&lt;p&gt;Now these plugins were relatively simple, but I did not expect that.&lt;/p&gt;

&lt;p&gt;There was one bug in the tag explorer where the background was above the node graph. I mentioned it to Verdent and it fixed it quickly. The cleaner plugin just worked on the first try.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs8ys3fzpgpjba1103doc.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs8ys3fzpgpjba1103doc.jpg" alt="Verdent Chat Window" width="800" height="1492"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So now I’m sitting here at 10:30 AM on a Saturday with two working plugins and I really wanted to focus on another task while Verdent just did its thing. This is not the kind of problem I expected to have with AI coding tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  Panic Building: More Projects to Feed the Beast
&lt;/h2&gt;

&lt;p&gt;I had to scramble to find more work for Verdent to do. Here’s what I threw at it next:&lt;/p&gt;

&lt;h3&gt;
  
  
  BookForge
&lt;/h3&gt;

&lt;p&gt;A web app, API, command line app, and library that converts markdown files into simple epubs. This was simple, but I’d bet it took less than 30 minutes. I really did not trust it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbd7rz4f8i62menl7w8qa.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbd7rz4f8i62menl7w8qa.jpg" alt="BookForge Done In One Commit" width="800" height="298"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It was essentially done with the first commit and worked well enough to be the MVP. One commit. For a complete application with multiple interfaces. What the hell is happening to software development? I have made a total of eight commits to the repo. The rest were to add docker, make slight modifications to the text in the web app, and create a release script so I can use it as a library in other projects.&lt;/p&gt;

&lt;h3&gt;
  
  
  PromptOS
&lt;/h3&gt;

&lt;p&gt;A service, libraries, and extensions to store prompts and other text, markdown, and JSON instruction files for AI. This was the most complex of the four projects. Verdent broke it into 4 phases. After each phase I did a commit and approved it to start on the next phase.&lt;/p&gt;

&lt;h3&gt;
  
  
  Site Factory
&lt;/h3&gt;

&lt;p&gt;A project for building pre-configured Gatsby sites quickly. I’m still unsure of the architecture of this one and over-architected it twice. That taught me not to learn a new tool and have AI build the project with it at the same time. I started over again after reading documentation on Gatsby and the other tools I planned to use to get a better idea of what I actually wanted.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mostly Static
&lt;/h3&gt;

&lt;p&gt;A set of services and dashboard for static sites. I threw this in at the last minute to put the last bit of my generous beta access to work. This was multiple phases also. But that Saturday, Verdent built two Obsidian plugins and four applications in a few hours. And did not use up the 2000 credits I got for the day.&lt;/p&gt;

&lt;h2&gt;
  
  
  Verdent Features: The Good Stuff That Actually Works
&lt;/h2&gt;

&lt;p&gt;I didn’t test Verdent Deck because my personal Mac is still an Intel one and apparently I’m living in the stone age of computing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Plan Mode
&lt;/h3&gt;

&lt;p&gt;Verdent’s plan mode is where things get interesting. It will have you approve the plan before it starts building. If you’re in Auto Run mode, it will just run until it’s done with the project, though it did stop and ask about certain commands that might be destructive. Or you can go directly to Skip Permissions mode and have it stop asking questions. I stuck to the middle road and only had to tell it to continue a couple of times.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcl6ottz95dx7512vvodr.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcl6ottz95dx7512vvodr.jpg" alt="Verdent Plan Mode" width="664" height="1260"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If your directions are vague, it will ask you a series of questions to help ensure it builds what you are expecting.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxpyjk83543l4m7fw8hw7.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxpyjk83543l4m7fw8hw7.jpg" alt="Verdent Follow Up Questions" width="612" height="740"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For larger projects, it breaks work into phases and tells you when each is done. PromptOS was the most complex in that it had an API, website, desktop app, and extensions for two applications. Verdent broke that into 4 phases automatically. Another project I started building after this set was broken into 11 phases.&lt;/p&gt;

&lt;p&gt;You may want to tell it to save the plan to the project docs folder, so you can keep it in version control for reference, since it doesn’t do that automatically. In the newest version, you can copy the content of the plan from the chat and save it to the project if you want.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz6q192qx09msn60z6xfd.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz6q192qx09msn60z6xfd.jpg" alt="Verdent Project Plan" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The other features, to tell you the truth, I haven’t touched yet, because I simply didn’t need to.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rules System
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb2bhb9hfz6c9t8iu0aml.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb2bhb9hfz6c9t8iu0aml.jpg" alt="Verdent Rules" width="644" height="1182"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The rules system lets you set custom instructions that persist across projects. This is useful for coding standards, preferred libraries, or just telling it not to do stupid shit that you’ve seen it do before.&lt;/p&gt;

&lt;h3&gt;
  
  
  Subagents
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0gpz3s7xri273k1r2d2w.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0gpz3s7xri273k1r2d2w.jpg" alt="Verdent Subagents" width="800" height="246"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Verdent uses specialized subagents for different types of work. You don’t have to think about this much: it just routes tasks to the right AI worker automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  MCP Support
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F612pm1umov4wech9tqcq.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F612pm1umov4wech9tqcq.jpg" alt="Verdent MCP Support" width="670" height="550"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Model Context Protocol support means Verdent can integrate with other tools and services. This is probably more useful than I realize, but I haven’t had time to explore it fully given how fast everything else has been happening.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pricing Reality Check
&lt;/h2&gt;

&lt;p&gt;During the beta, I got 2000 credits a day. These were hard to use up even when I was actively trying to burn through them. Once the beta was over, I received a bucket of credits. Right now, I’m testing how long these credits last to determine how I’ll use Verdent in the future.&lt;/p&gt;

&lt;p&gt;The pricing tiers are reasonable for what you get. If you’re doing any serious development work, the cost of the tool becomes insignificant compared to the time it saves. But I’m still figuring out my usage patterns before committing to a subscription.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Verdict: Too Fast and Good to Ignore
&lt;/h2&gt;

&lt;p&gt;I was definitely happy with the results. During the beta, getting through 2000 credits in a day required serious effort. The speed and quality of the output is genuinely impressive.&lt;/p&gt;

&lt;p&gt;And I will be using it in the future. It is just too fast and good not to. Right now the only AI subscription I have is Claude Pro, so I use Claude Code mainly. But I also have API accounts at most of the big AI companies. So I might just pay for credits as I go for a while until my usage is consistent and then pick up a subscription.&lt;/p&gt;

&lt;p&gt;The multi-AI workflow is becoming essential. When one tool is thinking, another can be building. When you have AI assistants that can complete substantial projects in under 30 minutes, the bottleneck becomes your ability to feed them work, not their ability to do it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: The Future of Lazy Coding
&lt;/h2&gt;

&lt;p&gt;Verdent actually delivered on its promises. The difference between Verdent and some other AI coding tools I’ve used is the speed, the fact that it rarely gets confused about what I’m asking it to do, and, even though I was dreading testing the apps it built, they had fewer bugs and less weirdness than I expected.&lt;/p&gt;

&lt;p&gt;We’re at the point where AI coding assistants are good enough that the economics start to make sense for most developers. When something can build a complete application in 30 minutes, you find a way to afford the monthly subscription.&lt;/p&gt;

&lt;p&gt;The real challenge now isn’t getting AI to write code. It’s keeping up with how fast it can work and making sure you’re feeding it projects that are actually worth building. But honestly, that’s a good problem to have.&lt;/p&gt;

&lt;p&gt;I’m still figuring out the economics of AI-assisted development, but when something works this well, you adapt. The alternative is falling behind while other developers are shipping software at warp speed.&lt;/p&gt;

&lt;p&gt;And if you’re still writing everything by hand while AI tools like Verdent exist, well, you might want to reconsider your approach. The future of coding is here, and it’s fast enough to finish your weekend projects before lunch.&lt;/p&gt;

&lt;p&gt;You can learn more about Verdent &lt;a href="https://verdent.ai/" rel="noopener noreferrer"&gt;here&lt;/a&gt; and find the VS Code plugin or Verdent Deck &lt;a href="https://www.verdent.ai/download" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>vibecoding</category>
      <category>verdent</category>
    </item>
    <item>
      <title>The Great Vibe Coding Experiment - How I Built 15 Projects with AI in My Spare Time</title>
      <dc:creator>Stephan Miller</dc:creator>
      <pubDate>Mon, 15 Sep 2025 07:00:00 +0000</pubDate>
      <link>https://forem.com/eristoddle/the-great-vibe-coding-experiment-how-i-built-15-projects-with-ai-in-my-spare-time-275o</link>
      <guid>https://forem.com/eristoddle/the-great-vibe-coding-experiment-how-i-built-15-projects-with-ai-in-my-spare-time-275o</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjcl7xf3z1genj6bxtrld.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjcl7xf3z1genj6bxtrld.png" alt="Vibe Coding" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I started this whole thing wanting to test Claude Desktop with MCPs. Just one little experiment. You know how that goes.&lt;/p&gt;

&lt;p&gt;Six months later, I’ve got 15 projects in various states of “done” and a GitHub commit chart that doesn’t look too crazy until you realize that, on the days I have time at all, I only get a couple of hours to experiment with this.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F29gdtyt1i22s4gj3orqi.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F29gdtyt1i22s4gj3orqi.jpg" alt="Github Commit Chart" width="800" height="150"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Welcome to “vibe coding”: building shit because it feels right and letting AI do most of the heavy lifting. It’s not agile development. It’s not waterfall. And as I’ve done more of it, it has become less chaotic and more structured. Most nights I work on two projects simultaneously.&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;When Vibe Coding Becomes Automated Spec-Driven Development&lt;/li&gt;
&lt;li&gt;
The Obsidian Plugin Empire

&lt;ul&gt;
&lt;li&gt;Apple Books Highlights Plugin: Hacking SQLite&lt;/li&gt;
&lt;li&gt;Joplin Portal: My First Test of Kiro&lt;/li&gt;
&lt;li&gt;Daily Note Prompts: An Extension I Am Using&lt;/li&gt;
&lt;li&gt;Tag Explorer 3D: Testing a Shiny New Tool&lt;/li&gt;
&lt;li&gt;Attachment Cleaner: Simple But Necessary&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

The Bigger Fish: Apps That Do Real Shit

&lt;ul&gt;
&lt;li&gt;Gatsby Site with Scraper: The Abandoned First Project&lt;/li&gt;
&lt;li&gt;GitWrite: When Jules Met Agentic Project Manager&lt;/li&gt;
&lt;li&gt;EmberText: My First Claude Code Experiment&lt;/li&gt;
&lt;li&gt;MDQuery: Fed Up with Proprietary Tools&lt;/li&gt;
&lt;li&gt;AutoVibe: The Meta Project&lt;/li&gt;
&lt;li&gt;AutoVibe Template: The Stop-Gap Solution&lt;/li&gt;
&lt;li&gt;ShopBoth: Testing the Template (and Learning Hard Lessons)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Tool Whose Name Shall Not Be Spoken&lt;/li&gt;

&lt;li&gt;How These Projects Work Together&lt;/li&gt;

&lt;li&gt;

What I Learned About AI-Assisted Development

&lt;ul&gt;
&lt;li&gt;Each AI Tool Has Its Own Personality&lt;/li&gt;
&lt;li&gt;The Process That Emerged&lt;/li&gt;
&lt;li&gt;Why 15 Projects Makes Sense (Sort Of)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;What’s Next: The AI Development Army&lt;/li&gt;

&lt;li&gt;Conclusion: Embrace the Chaos&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  When Vibe Coding Becomes Automated Spec-Driven Development
&lt;/h2&gt;

&lt;p&gt;Vibe coding is when you have an idea, fire up an AI coding assistant, and see what happens. No detailed specs. No project management software. Just “hey AI, build me a thing that does X” and then iterating until it works or you get distracted by building something else.&lt;/p&gt;

&lt;p&gt;But there was a reason I abandoned my first vibe coding project. I just added random features to an idea that I came up with on the fly and thought about for maybe ten minutes. I just wanted to see what it could do. But by the second project, I knew I needed some kind of rails.&lt;/p&gt;

&lt;p&gt;The next step was to get out of this loop:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tell the AI tool to create the feature&lt;/li&gt;
&lt;li&gt;Test the result and tell the AI tool to fix the errors and failed tests&lt;/li&gt;
&lt;li&gt;Maybe do that again or a few more times, copying and pasting errors&lt;/li&gt;
&lt;li&gt;Ask the AI tool how to prevent this from happening with a better prompt&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And realize it could be a self-optimizing process of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Actor: Have an agent write the code&lt;/li&gt;
&lt;li&gt;Auditor: Have an agent review the code and run the tests (multiple types of auditor) and either pass or fail and send back to Actor&lt;/li&gt;
&lt;li&gt;Process Improver: Have an agent examine the steps that caused the failed process and update the commands, agent definitions, or other project docs to prevent them in the future&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All because you can’t trust AI to do what you thought you told it to do. And this trail of projects documents my journey towards that goal.&lt;/p&gt;
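
&lt;p&gt;To make that loop concrete, here’s a minimal sketch in TypeScript. To be clear, this is not AutoVibe’s actual code: &lt;code&gt;runAgent&lt;/code&gt; is a hypothetical stand-in for whatever CLI or API you happen to be driving, and the JSON verdict format is an assumption.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Minimal sketch of the Actor/Auditor/Process Improver loop.
// runAgent() is a hypothetical stand-in for a real AI CLI or API.
type Verdict = { passed: boolean; report: string };

async function runAgent(role: string, prompt: string): Promise&amp;lt;string&amp;gt; {
  throw new Error(`wire the ${role} agent up to a real tool before sending: ${prompt}`);
}

async function selfOptimizingLoop(task: string, maxRounds = 5) {
  let feedback = "";
  for (let round = 0; round &amp;lt; maxRounds; round++) {
    // Actor: write (or rewrite) the code for the task
    const code = await runAgent("actor", `${task}\n${feedback}`);

    // Auditor: review the code, run the tests, pass or fail
    const audit: Verdict = JSON.parse(
      await runAgent("auditor", `Review and test:\n${code}`)
    );
    if (audit.passed) return;

    // Process Improver: update commands and agent docs so this
    // failure class does not happen again, then loop back to the Actor
    await runAgent("improver", `Update project docs to prevent:\n${audit.report}`);
    feedback = audit.report;
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;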

&lt;h2&gt;
  
  
  The Obsidian Plugin Empire
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Apple Books Highlights Plugin: Hacking SQLite
&lt;/h3&gt;

&lt;p&gt;I built this &lt;a href="https://github.com/eristoddle/apple-books-annotation-import" rel="noopener noreferrer"&gt;plugin&lt;/a&gt; with Claude Desktop and MCPs, documented the whole process in &lt;a href="https://dev.to/eristoddle/creating-an-obsidian-plugin-with-claude-ai-gaj"&gt;this vibe coding post&lt;/a&gt;. The plugin actually works. I use it daily. It extracts annotations from the macOS Books SQLite database and creates formatted markdown notes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftrfsvm2cqtxensrldr32.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftrfsvm2cqtxensrldr32.png" alt="Apple Book Annotation Import" width="800" height="957"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Joplin Portal: My First Test of Kiro
&lt;/h3&gt;

&lt;p&gt;I had all these notes in Joplin that I wanted to access from Obsidian. So I fired up Kiro AI and told it to build me &lt;a href="https://github.com/eristoddle/joplin-portal" rel="noopener noreferrer"&gt;https://github.com/eristoddle/joplin-portal&lt;/a&gt;. &lt;a href="https://dev.to/eristoddle/how-i-built-two-obsidian-plugins-while-kiro-ai-did-most-of-the-work-40e4"&gt;Kiro did most of the work&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo2vl52sfhg1ktis4ahcc.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo2vl52sfhg1ktis4ahcc.jpg" alt="Joplin Portal Sidebar" width="800" height="1391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The surprising thing about letting AI write plugins is that they actually follow best practices better than I do. Kiro created proper TypeScript interfaces, handled errors gracefully, and even added settings panels I didn’t ask for.&lt;/p&gt;

&lt;h3&gt;
  
  
  Daily Note Prompts: An Extension I Am Using
&lt;/h3&gt;

&lt;p&gt;Another Kiro collaboration. This one adds customizable prompts to daily notes. I actually use this, which is more than I can say for many of the plugins I’ve tried.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fow12n4pj9ims4uwqae9h.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fow12n4pj9ims4uwqae9h.jpg" alt="Obsidian Daily Prompt Settings" width="800" height="944"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It still needs some work, but it works for me for now. I’m keeping a running list of changes I want to make and bugs I’ve run into, and one of these days I’ll put the list in one of these AI coding tools and set it to work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tag Explorer 3D: Testing a Shiny New Tool
&lt;/h3&gt;

&lt;p&gt;I started this project the day after making my list of existing projects. Why? Because I wanted to test a new AI tool (can’t name it yet) and needed something to build.&lt;/p&gt;

&lt;p&gt;Tag Explorer 3D visualizes your Obsidian tags and notes with those tags in 3D space. Is it necessary? Probably not. Is it cool? Absolutely. Sometimes you build things because they’re interesting, not because they solve real problems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9dsumfm5fm9weqk6g0z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9dsumfm5fm9weqk6g0z.png" alt="Obsidian Tag Explorer 3D" width="800" height="378"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Attachment Cleaner: Simple But Necessary
&lt;/h3&gt;

&lt;p&gt;Really simple plugin built with the same unnamed AI tool. It finds and removes unused attachments from your vault. I keep blog post drafts there and paste images in; the images get stored in the vault and then forgotten when I move the draft.&lt;/p&gt;

&lt;p&gt;I need to test it more, but the code looks solid. If you want AI coding to work, it’s often better at the simple, boring stuff than the complex, interesting stuff. And I may turn it into a general cleanup tool, because I am also tired of tracking down conflicted files.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F59g8hfv91i3o4qi4cced.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F59g8hfv91i3o4qi4cced.png" alt="Obsidian Attachment Cleaner" width="800" height="388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bigger Fish: Apps That Do Real Shit
&lt;/h2&gt;

&lt;p&gt;Plugins are fun, but eventually you want to build something more substantial. That’s where things got interesting.&lt;/p&gt;

&lt;h3&gt;
  
  
  Gatsby Site with Scraper: The Abandoned First Project
&lt;/h3&gt;

&lt;p&gt;This was my &lt;a href="https://dev.to/eristoddle/claude-mcps-vibe-coding-without-specialized-ides-part-1-1hmd"&gt;first attempt at vibe coding&lt;/a&gt;. I wanted to build a Gatsby site with an integrated scraper using Claude Desktop and MCPs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi40ktvl05t4hj1wf4rhm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi40ktvl05t4hj1wf4rhm.png" alt="AI Generated Gatsby Site" width="800" height="1110"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I abandoned it. Not because it didn’t work, but because I realized I was building it the wrong way. Sometimes the most important decision is knowing when to stop.&lt;/p&gt;

&lt;h3&gt;
  
  
  GitWrite: When Jules Met Agentic Project Manager
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/eristoddle/git-write" rel="noopener noreferrer"&gt;GitWrite&lt;/a&gt; is an abstraction over git for writers, editors, and beta readers. I built it with Jules AI, used an Agentic Project Manager to coordinate, and finished it up with Qoder. Well, it said it was finished and I am still working my way around to testing it’s full functionality. Just made the mistake of finishing it before I needed it.&lt;/p&gt;

&lt;p&gt;It exists to support EmberText and other writing tools I’m building. I am really, really, really tired of being required to use things like “Suggesting” in Word and Google Docs. Who thought this thing up? Satan? I like Git better, but it needed to be dumbed down in some places and tweaked in others to do what I needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  EmberText: My First Claude Code Experiment
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://dev.to/eristoddle/building-an-electron-app-from-scratch-with-claude-code-5c03"&gt;EmberText&lt;/a&gt; was my first serious relationship with Claude Code. I built an Electron app for writers, rolling my own context and project management system.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ha0dotuo887ok6513r6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ha0dotuo887ok6513r6.png" alt="EmberText Project View" width="800" height="642"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Claude Code has quirks. It’s opinionated about project structure. It sometimes goes off on tangents. But when it works, it does pretty well. EmberText is a fully functional app that I am testing, but I also need to refactor it to use some of the services I am building, so it is probably the last of these projects that will be finished.&lt;/p&gt;

&lt;h3&gt;
  
  
  MDQuery: Fed Up with Proprietary Tools
&lt;/h3&gt;

&lt;p&gt;I got tired of proprietary MCPs and tools to search systems that essentially consist of markdown files: Obsidian, Joplin, Jekyll, Logseq, and on and on. So I built a universal tool with Kiro and Qoder.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/eristoddle/mdquery" rel="noopener noreferrer"&gt;MDQuery&lt;/a&gt; provides SQL-like syntax for searching and analyzing markdown files across different note-taking systems and static site generators. Because fuck trying custom MCPs that don’t work the way you want for each and every markdown based platform.&lt;/p&gt;

&lt;p&gt;And I already had a job lined up for the tool. I had been collecting everything interesting around vibe coding and spec-driven development in my Obsidian vault under a specific tag. Developers were going in so many directions that I wanted to categorize and get an overview. We’re all blind developers describing different parts of this elephant. Maybe if we categorize what we’re all doing, I can be a little less blind. I recently found this post on &lt;a href="https://shmck.substack.com/p/claude-code-framework-wars" rel="noopener noreferrer"&gt;Claude Code framework wars&lt;/a&gt; that does just that.&lt;/p&gt;

&lt;p&gt;So I attached the MCP to Claude desktop and prompted it to analyze all of my Obsidian notes, using the &lt;a href="https://github.com/eristoddle/mdquery/blob/main/docs/claude-desktop-prompts.md" rel="noopener noreferrer"&gt;prompts the AI tools had put in the documentation&lt;/a&gt;. And it spit out an &lt;a href="///downloads/LlmCodingNotes.pdf"&gt;80 page document&lt;/a&gt;.&lt;/p&gt;
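
&lt;p&gt;For anyone who hasn’t wired an MCP into Claude Desktop before: it comes down to a JSON entry in claude_desktop_config.json. The &lt;code&gt;mcpServers&lt;/code&gt; shape is Claude Desktop’s real config format, but the command and args for mdquery here are my guess at an invocation, so check the repo’s README for the real one.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "mcpServers": {
    "mdquery": {
      "command": "python",
      "args": ["-m", "mdquery.mcp"]
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;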

&lt;h3&gt;
  
  
  AutoVibe: The Meta Project
&lt;/h3&gt;

&lt;p&gt;AutoVibe is the most recursive project I’ve ever built. I’m using Claude Code, AI Studio, and Backlog.md to build a tool that will make the vibe coding process smoother.&lt;/p&gt;

&lt;p&gt;It’s infrastructure for building infrastructure. Custom commands and agents to coordinate AI tools. The future of development might be less about writing code and more about conducting orchestras of AI agents. At least, that’s what I think it is. But it’s like a box of chocolates.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.stephanmiller.com%2Fimages%2F2025%2Fautovibe-dashboard.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.stephanmiller.com%2Fimages%2F2025%2Fautovibe-dashboard.png" alt="AutoVibe Dashboard" width="800" height="982"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  AutoVibe Template: The Stop-Gap Solution
&lt;/h3&gt;

&lt;p&gt;While AutoVibe has bugs to work out, I needed something that worked now. So I built a &lt;a href="https://github.com/eristoddle/autovibe-backlog-md-template" rel="noopener noreferrer"&gt;template&lt;/a&gt; with a set of Claude commands and agents that, when used with Backlog.md and Claude Code, makes developing projects more bullet-resistant.&lt;/p&gt;

&lt;p&gt;I used Claude Desktop to help develop it since the template is full of AI instruction files that other tools would try to execute. But then I sort of got stuck testing its usage in the next project. So now the priority is getting AutoVibe to work while keeping the simple template around.&lt;/p&gt;

&lt;h3&gt;
  
  
  ShopBoth: Testing the Template (and Learning Hard Lessons)
&lt;/h3&gt;

&lt;p&gt;I used Claude Code with my Backlog.md template to build ShopBoth, a React Native app for testing and tweaking the template. Why did I choose React Native for testing? Because I’m an idiot.&lt;/p&gt;

&lt;p&gt;I also discovered that “generic” automated QA doesn’t exist. QA needs to be specific to the platform, the framework, the use case. There ain’t no such thing as universal testing, and I learned that the hard way.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tool Whose Name Shall Not Be Spoken
&lt;/h2&gt;

&lt;p&gt;I’ve built three more apps with an AI tool I can tell you about in a couple of weeks. They add to the ecosystem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;BookForge&lt;/strong&gt; transforms markdown files into professional ebooks. It supports EmberText and GitWrite, completing the writing workflow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PromptOS&lt;/strong&gt; stores and provides prompts for EmberText, AutoVibe, and everything else. Because managing prompts across 15 projects gets complicated fast.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Site Factory&lt;/strong&gt; brings me full circle to that abandoned Gatsby project, but with actual research this time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren’t random projects. They’re pieces of a larger system for AI-assisted content creation and development.&lt;/p&gt;

&lt;p&gt;Update: I actually built one more project while I was writing this. It’s a platform to provide dynamic services for static sites.&lt;/p&gt;

&lt;h2&gt;
  
  
  How These Projects Work Together
&lt;/h2&gt;

&lt;p&gt;This isn’t just a collection of random tools. There’s method to the madness:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EmberText handles writing → GitWrite manages collaboration → BookForge generates ebooks&lt;/li&gt;
&lt;li&gt;PromptOS supports everything with prompt management&lt;/li&gt;
&lt;li&gt;Obsidian plugins feed the writing process with research and notes&lt;/li&gt;
&lt;li&gt;MDQuery searches across all the markdown files these tools create&lt;/li&gt;
&lt;li&gt;AutoVibe coordinates the development of new tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s an actual ecosystem, not just 15 disconnected projects. Each piece makes the others more useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned About AI-Assisted Development
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Each AI Tool Has Its Own Personality
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Desktop:&lt;/strong&gt; Great for planning and architecture, terrible for actually writing code. It’s the project manager that never writes any code, mainly because chat lengths are limited. As soon as you get somewhere, you have to start a new chat and lose the context. I do use it to develop git templates for AI-driven coding projects, because, unlike the coding tools, it will ignore the instruction files I have it tweak instead of trying to execute them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code:&lt;/strong&gt; I am not sure if it is still the best after using Qoder and the new tool I have. I still use it almost daily to work on projects and it’s my go-to tool, but the limit comes up quicker now and it seems like it has dropped some IQ points. Not sure about its status in my workflow right now.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kiro:&lt;/strong&gt; The reliable workhorse. Give it a clear task and it delivers solid, working code. Perfect for plugins and smaller projects. After seeing how the new tool I found works, I wonder if it breaks things down into too many tasks. To create an Obsidian plugin, it broke it down into 16 tasks I had to click to get through and it took a few hours. The new tool just decided it would build a plugin for me in 15 minutes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Jules:&lt;/strong&gt; I still use it for small things, like fixing bugs, because I still get 15 free chats a day. I actually built most of GitWrite with it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Qoder:&lt;/strong&gt; Set it loose on a complex project and come back in a few hours to find it’s built everything you asked for and more. The wikis it builds are useful, but I think they would eat up a lot of tokens in large codebases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unnamed Tool:&lt;/strong&gt; The new experiment. Still figuring out its personality. But it’s fast. It finished the two Obsidian plugins in less than an hour, both of them. It built the other full projects in one day, all of them.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Process That Emerged
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Start with an actual need, not cool technology. I tried building technology first and it never worked.&lt;/li&gt;
&lt;li&gt;Let AI handle the boilerplate and focus on the interesting problems. AI is great at CRUD operations and terrible at creative problem-solving.&lt;/li&gt;
&lt;li&gt;Build infrastructure projects to support main projects. GitWrite exists so EmberText can focus on writing, not version control. I’m also hoping less scope means less context an AI tool needs.&lt;/li&gt;
&lt;li&gt;Test new AI tools with real projects, not demos. You learn more building something you’ll actually use.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why 15 Projects Makes Sense (Sort Of)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Each project taught me something new about working with different AI tools.&lt;/li&gt;
&lt;li&gt;Building an ecosystem requires multiple pieces. You can’t do everything with one app.&lt;/li&gt;
&lt;li&gt;Some projects exist only to support other projects. That’s fine.&lt;/li&gt;
&lt;li&gt;Abandoning projects is part of the process. That abandoned Gatsby site taught me what not to build.&lt;/li&gt;
&lt;li&gt;The commit chart tells the story. Bursts of activity when testing new tools. Long gaps when focusing on one complex project.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What’s Next: The AI Development Army
&lt;/h2&gt;

&lt;p&gt;AutoVibe is getting closer to making this process smoother. The template approach gives consistent results. The unnamed AI tool projects will add new capabilities.&lt;/p&gt;

&lt;p&gt;I’m building a sustainable vibe coding workflow that lets me maintain 15+ projects without losing my mind. The future of development might be more like conducting an orchestra than playing a solo instrument.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Embrace the Chaos
&lt;/h2&gt;

&lt;p&gt;I started wanting to try Claude Desktop with MCPs. I ended up with 15 projects, a completely new development process, and insights into how AI-assisted development actually works.&lt;/p&gt;

&lt;p&gt;The learning process matters more than perfect planning. AI tools are getting good enough to enable this kind of scattered productivity. You can afford to be curious, to follow tangents, to build infrastructure for projects that don’t exist yet.&lt;/p&gt;

&lt;p&gt;Vibe coding and spec-driven development isn’t for everyone. But if you’re comfortable with chaos (Yes, even in spec-driven development), if you like building things just to see what happens, if you want to push the boundaries of what one person can build with AI assistance, give it a try.&lt;/p&gt;

&lt;p&gt;Just don’t blame me when you end up with 15 projects and a commit chart that looks like madness. That’s the price you pay. And while I was finishing this up, I started another project. I had to give my new tool something to work on before the free ride runs out.&lt;/p&gt;

</description>
      <category>vibecoding</category>
    </item>
    <item>
      <title>How I Built Two Obsidian Plugins While Kiro AI Did Most of the Work</title>
      <dc:creator>Stephan Miller</dc:creator>
      <pubDate>Fri, 22 Aug 2025 07:00:00 +0000</pubDate>
      <link>https://forem.com/eristoddle/how-i-built-two-obsidian-plugins-while-kiro-ai-did-most-of-the-work-40e4</link>
      <guid>https://forem.com/eristoddle/how-i-built-two-obsidian-plugins-while-kiro-ai-did-most-of-the-work-40e4</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyx236i76wouhdk8d7g3o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyx236i76wouhdk8d7g3o.png" alt=" " width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For &lt;a href="https://dev.to/eristoddle/creating-an-obsidian-plugin-with-claude-ai-gaj"&gt;the first Obsidian plugin I wrote with AI&lt;/a&gt;, I used Claude desktop and already had a Python script that did most of what I needed the plugin to do. I just needed AI to convert it to an Obsidian plugin. Then &lt;a href="https://dev.to/eristoddle/jules-ai-the-currently-free-coding-assistant-that-cant-follow-directions-but-gets-shit-done-33k3"&gt;I used Jules to fix the final bugs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But I have been moving away from the chaos of vibe coding so I can get results that look more like the vision I have in my head. This started when &lt;a href="https://dev.to/eristoddle/building-an-electron-app-from-scratch-with-claude-code-5c03"&gt;I rolled my project management system with Claude Code and a bunch of markdown files&lt;/a&gt;. Then I found out that &lt;a href="https://dev.to/eristoddle/i-tried-to-upgrade-my-blog-with-ai-project-management-and-everything-went-to-hell-but-the-process-2e43-temp-slug-8610687"&gt;Backlog.md could make my vibe coding projects go more smoothly&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But along the way, Kiro came out and had a spec-driven development feature. So I figured I’d try it by building even more Obsidian plugins. Why two, though? Well, the first one was more complex, and the more complex a project, the more things AI leaves behind that you have to clean up. The second one was actually simple enough that I am currently prepping it for an official release.&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;What is Kiro and Why I’m Using It&lt;/li&gt;
&lt;li&gt;
Plugin #1: Daily Note Prompts - The Spec-Driven Experience

&lt;ul&gt;
&lt;li&gt;Requirement 1&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;My Obsidian Development Process&lt;/li&gt;

&lt;li&gt;Plugin #2: Joplin Portal - When AI Tools Hit a Wall&lt;/li&gt;

&lt;li&gt;

Lessons Learned

&lt;ul&gt;
&lt;li&gt;Spec-driven development workflow&lt;/li&gt;
&lt;li&gt;When to trust AI vs when to investigate yourself&lt;/li&gt;
&lt;li&gt;Managing AI tool limitations and daily limits&lt;/li&gt;
&lt;li&gt;The value of simple projects for learning new tools&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Conclusion&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  What is Kiro and Why I’m Using It
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://kiro.dev/blog/introducing-kiro/" rel="noopener noreferrer"&gt;Kiro&lt;/a&gt; is an AI coding tool that takes a different approach than most of the tools I’ve been testing. Instead of just throwing code at you, it starts with what they call “spec-driven development.” You give Kiro an idea, and it creates three files: requirements.md, design.md, and tasks.md.&lt;/p&gt;

&lt;p&gt;Kiro breaks down your project into digestible chunks before writing a single line of code. No more “let’s see what happens” coding sessions that end with you staring at a pile of TypeScript wondering how you got there.&lt;/p&gt;

&lt;p&gt;But here’s the real reason I’m going all-in on Kiro right now: it’s completely free. No credits, no tokens. There is a daily limit that cuts you off for 24 hours, but I can deal with that.&lt;/p&gt;

&lt;p&gt;So I’m testing Kiro on every coding project I can think of before the free tier disappears. Obsidian plugins are perfect for this because I use Obsidian daily, I have a constant stream of plugin ideas, and I can see the results of my and Kiro’s efforts quickly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Plugin #1: Daily Note Prompts - The Spec-Driven Experience
&lt;/h2&gt;

&lt;p&gt;I used to write daily. I had a morning practice where I woke up, meditated, read, and then wrote. But who has all that time? Actually, I’m kicking myself because I had a three-year streak, and it only took one missed day for me to give that up. The “streak” gurus never tell you about that part.&lt;/p&gt;

&lt;p&gt;And for a while, I was doing it in Obsidian using Daily Notes. Now I wanted a plugin to nag me about it and give me a prompt to start with. So that was the start of the idea. Here’s how I fleshed it out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompt packs in JSON format can be imported and exported with these attributes (sketched in code after this list):

&lt;ul&gt;
&lt;li&gt;Type: ‘Sequential’, ‘Random’, ‘Date’&lt;/li&gt;
&lt;li&gt;Prompt: Link, String, or Markdown&lt;/li&gt;
&lt;li&gt;Date: For Date type, like devotionals&lt;/li&gt;
&lt;li&gt;Order: For Sequential type that have to be done in order&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Set Reminder/Alert with System or Obsidian notification.&lt;/li&gt;
&lt;li&gt;Launches daily note with prompt&lt;/li&gt;
&lt;li&gt;Automatically go into zen mode&lt;/li&gt;
&lt;/ul&gt;
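
&lt;p&gt;For what it’s worth, here’s roughly how that shape looks as TypeScript. The attribute names come straight from my notes above; the interfaces themselves are illustrative, not the plugin’s actual types.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Illustrative types only; the plugin Kiro generated may differ.
type PromptType = "Sequential" | "Random" | "Date";

interface Prompt {
  content: string; // a link, plain string, or markdown
  date?: string;   // for Date packs, like devotionals
  order?: number;  // for Sequential packs done in order
}

interface PromptPack {
  name: string;
  type: PromptType;
  prompts: Prompt[];
}

// A pack like this round-trips as JSON for import and export:
const example: PromptPack = {
  name: "Morning pages",
  type: "Sequential",
  prompts: [
    { content: "What did you dream about?", order: 1 },
    { content: "[[Weekly goals]]", order: 2 },
  ],
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;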

&lt;p&gt;And that is basically what I gave Kiro in the first prompt. You can actually still “vibe code” with Kiro. You just have to select that option.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpnnl8k0yl9i6z5i0y9bs.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpnnl8k0yl9i6z5i0y9bs.jpg" alt="Kiro Chat Drawer" width="800" height="1768"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And Kiro built the three spec files in order, waiting after each file for my approval or for me to request changes.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/eristoddle/obsidian-daily-note-prompts/blob/main/.kiro/specs/obsidian-daily-prompts/requirements.md" rel="noopener noreferrer"&gt;requirements file&lt;/a&gt; has entries that look like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;h3&gt;
  
  
  Requirement 1
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;User Story:&lt;/strong&gt; As an Obsidian user, I want to create and manage prompt packs with different delivery modes, so that I can organize my writing prompts according to my preferred workflow.&lt;/p&gt;
&lt;h4&gt;
  
  
  Acceptance Criteria
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;WHEN a user creates a new prompt pack THEN the system SHALL allow them to specify the type as ‘Sequential’, ‘Random’, or ‘Date’&lt;/li&gt;
&lt;li&gt;WHEN a user adds prompts to a pack THEN the system SHALL support prompts as links, strings, or markdown content&lt;/li&gt;
&lt;li&gt;WHEN a user creates a Sequential prompt pack THEN the system SHALL allow them to define the order of prompts&lt;/li&gt;
&lt;li&gt;WHEN a user creates a Date-based prompt pack THEN the system SHALL allow them to assign specific dates to prompts&lt;/li&gt;
&lt;li&gt;WHEN a user creates a Random prompt pack THEN the system SHALL randomly select prompts without repetition until all are used&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;The &lt;a href="https://github.com/eristoddle/obsidian-daily-note-prompts/blob/main/.kiro/specs/obsidian-daily-prompts/design.md" rel="noopener noreferrer"&gt;design file&lt;/a&gt; specifies things like data models, service interfaces, and other architectural details.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg999gf5ziy4pcec1y3uu.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg999gf5ziy4pcec1y3uu.jpg" alt="Kiro Design File" width="800" height="735"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The final file it creates is the &lt;a href="https://github.com/eristoddle/obsidian-daily-note-prompts/blob/main/.kiro/specs/obsidian-daily-prompts/tasks.md" rel="noopener noreferrer"&gt;tasks file&lt;/a&gt;, which looks like a basic list of nested to-dos in markdown:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyjrzxo8tzoax8g4rixt1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyjrzxo8tzoax8g4rixt1.jpg" alt="Kiro Tasks File" width="800" height="634"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But when it is loaded in the Kiro IDE, it adds a &lt;strong&gt;Start task&lt;/strong&gt; link you can click on to have Kiro start working on it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F061vtr6rxkakih7891fp.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F061vtr6rxkakih7891fp.jpg" alt="Kiro Implementation Plan" width="800" height="1125"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I am not sure whether adding a command to the trust list was buggy, or I just didn’t know how to use it. It would seem that you could just click the play triangle once and the double play triangle to trust the command in the future, but that didn’t seem to work all the time. Then I realized that the list of commands below the paragraph were also buttons, and I had more success with those, but not every time. And like I said, it could be user error, but they didn’t make it easy to figure out. And who wants to read documentation, anyway? No one’s got time for that.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Falwqpybcy21ps6a601sv.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Falwqpybcy21ps6a601sv.jpg" alt="Trusting Kiro Commands" width="800" height="501"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the end, I clicked &lt;strong&gt;Start task&lt;/strong&gt; over and over for about four hours while doing other things, finished all the tasks, and decided it was too late to even attempt testing it that night. I know how that goes. So, I tested the next day.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fow12n4pj9ims4uwqae9h.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fow12n4pj9ims4uwqae9h.jpg" alt="Obsidian Daily Prompt Settings" width="800" height="944"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And it actually went pretty well. There were two or three minor bugs. For example, the time field for notifications needed debouncing. It was basically working after about half an hour of bug fixing. But there is a lot of weirdness. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It rolled its own Zen mode, and it does not work quite right. It removed the sidebars and other things. But I couldn’t figure out a way back to normal mode. I just reloaded Obsidian.&lt;/li&gt;
&lt;li&gt;It has global settings that should be overridden by the settings of each prompt pack, but nothing happens to a prompt in a prompt pack if I don’t set the child settings. So, I think most of the global settings do nothing.&lt;/li&gt;
&lt;li&gt;It doesn’t track progress.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiymdoxyszl5axbcybwf2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiymdoxyszl5axbcybwf2.jpg" alt="Obsidian Daily Prompt Edit Prompts" width="800" height="1688"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I am testing it currently and making notes on the changes I need, and once I think I have them all, I’ll start working on it again. I am actually using it, but it is not finished.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here’s the repo:&lt;/strong&gt; &lt;a href="https://github.com/eristoddle/obsidian-daily-note-prompts" rel="noopener noreferrer"&gt;Obsidian Daily Note Prompts&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The other Obsidian plugin idea was much simpler, so I could finish that one.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Obsidian Development Process
&lt;/h2&gt;

&lt;p&gt;After building my fourth Obsidian plugin (3 using AI), I have a process that works really well:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I will use Kiro for now while it’s free, and because I could get results with both plugins before I hit the daily limit.&lt;/li&gt;
&lt;li&gt;I have a Test vault and keep all my plugin repos in the &lt;code&gt;.obsidian/plugins&lt;/code&gt; folder of the vault.&lt;/li&gt;
&lt;li&gt;I test the plugin in that vault and keep notes on the improvements and changes I want to make in a note in that vault rather than sending one-off prompts to Kiro.&lt;/li&gt;
&lt;li&gt;When I think I have found enough for another spec in Kiro, I just paste what I’ve been collecting into another Kiro spec chat. A chat interface makes me anxious, to tell the truth, because it requires interaction. This allows me to deal with responding on my time. It also means I can plan changes even when I’ve been cut off.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8z5hs30agzvolyiiq1gd.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8z5hs30agzvolyiiq1gd.jpg" alt="Kiro Features Prompt" width="800" height="342"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Plugin #2: Joplin Portal - When AI Tools Hit a Wall
&lt;/h2&gt;

&lt;p&gt;With this plugin, I simply wanted to access Joplin from Obsidian and be able to import notes from Joplin directly from the plugin. I know I can copy and paste the note or export notes from Joplin, but a plugin will make things easier.&lt;/p&gt;

&lt;p&gt;These are the notes I gave Kiro:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Name: Joplin Portal&lt;/li&gt;
&lt;li&gt;A way to access my notes in Joplin from Obsidian so I don’t have to junk up my vault&lt;/li&gt;
&lt;li&gt;Use the Joplin Web Clipper API to interact with Joplin: &lt;a href="https://joplinapp.org/help/api/references/rest_api/" rel="noopener noreferrer"&gt;Joplin Data API | Joplin&lt;/a&gt; (a rough call sketch follows this list)
&lt;/li&gt;
&lt;li&gt;Add a sidebar panel that searches Joplin notes either by full text or by tag&lt;/li&gt;
&lt;li&gt;Also import and convert Joplin notes as Obsidian notes (template, default folder, etc) probably using the same search functionality, with a checkbox to check if you want to import that result.&lt;/li&gt;
&lt;/ul&gt;
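
&lt;p&gt;The Data API is a small HTTP service the Web Clipper runs locally, on port 41184 by default. Here’s a minimal sketch of the search call the sidebar needs. The &lt;code&gt;/search&lt;/code&gt; endpoint and the &lt;code&gt;tag:&lt;/code&gt; query syntax are from the linked docs; everything around them is illustrative plumbing.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Minimal Joplin Data API search. The token comes from Joplin's
// Web Clipper options screen; 41184 is the default port.
const JOPLIN = "http://localhost:41184";
const TOKEN = process.env.JOPLIN_TOKEN ?? "";

async function searchJoplin(query: string) {
  const url = `${JOPLIN}/search?query=${encodeURIComponent(query)}&amp;amp;token=${TOKEN}`;
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Joplin API error: ${res.status}`);
  const { items } = await res.json();
  return items; // array of { id, title, ... }
}

// Full-text search, or tag search via Joplin's query syntax:
searchJoplin("obsidian").then(console.log);
searchJoplin("tag:reading").then(console.log);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;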

&lt;p&gt;This project went about the same way. Kiro created the requirements, design, and task files, and I clicked &lt;strong&gt;Start task&lt;/strong&gt; over and over for 3-4 hours (while doing other things) until I discovered that there is a limit to Kiro usage and I hit it. And from what I can tell online, it’s a 24-hour break. And there was one task left, so I actually had Jules finish that task.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo2vl52sfhg1ktis4ahcc.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo2vl52sfhg1ktis4ahcc.jpg" alt="Joplin Portal Sidebar" width="800" height="1391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And it worked relatively well. The biggest issue was that the plugin was pulling in the Joplin notes as is, and Joplin references images in the note with a Joplin ID. So, unlike the screenshot above, the images were not showing. So in my next Kiro session, I worked on that.&lt;/p&gt;

&lt;p&gt;In trying to fix this, I relearned a lesson about using AI for development: if it takes the tool more than two tries to fix something, give the AI tool more information.&lt;/p&gt;

&lt;p&gt;So after about 8 tries and 2 days, I gathered the information it needed myself. It turned out that, depending on how the note was created in Joplin, embedded images showed up in one of three formats. Then I found the function that created the preview and the function that imported the note into Obsidian. I told Kiro about all of this, and once it had that, it fixed the issue in both functions.&lt;/p&gt;
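
&lt;p&gt;Some context on why this was fiddly: Joplin refers to attachments by a 32-character resource ID rather than a file path, most commonly in markdown like &lt;code&gt;![alt](:/resourceid)&lt;/code&gt;, with HTML and plain-link variants needing the same treatment. Here’s a sketch of the kind of rewrite involved, with the actual download left as a stub; the &lt;code&gt;/resources/:id/file&lt;/code&gt; endpoint is from the Data API docs, the rest is assumed.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Joplin embeds images as ![alt](:/resourceid), id = 32 hex chars.
const RESOURCE_LINK = /!\[([^\]]*)\]\(:\/([0-9a-f]{32})\)/g;

// Stub: GET /resources/:id/file from the Data API, save the bytes
// into the vault, and return the new vault-relative path.
async function saveResource(id: string): Promise&amp;lt;string&amp;gt; {
  throw new Error(`download resource ${id} and return its vault path`);
}

async function rewriteEmbeds(body: string): Promise&amp;lt;string&amp;gt; {
  let out = body;
  for (const [full, alt, id] of body.matchAll(RESOURCE_LINK)) {
    const path = await saveResource(id);
    out = out.replace(full, `![${alt}](${path})`);
  }
  return out;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;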

&lt;p&gt;And it was pretty close to done, but I wanted to clean up the UI a little and give the important things a little more space. So I started another spec chat on that, and Kiro created a new set of three files called ui-improvements.&lt;/p&gt;

&lt;p&gt;Then, to prep for submitting to the community, I had it look up the rules for plugin submission and create some specs for that as well. Now that I know I can keep adding new specs as a project grows, I might try using Kiro with something bigger than an Obsidian plugin.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here’s the repo:&lt;/strong&gt; &lt;a href="https://github.com/eristoddle/joplin-portal" rel="noopener noreferrer"&gt;Obsidian Joplin Portal&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu7vltbfvx1n0dstilr85.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu7vltbfvx1n0dstilr85.jpg" alt="Kiro Specs, Hooks, Steering, and MCP" width="790" height="2484"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Spec-driven development workflow
&lt;/h3&gt;

&lt;p&gt;Turns out there’s something to be said for planning before you code. What a concept! The three-file system is genuinely useful. When I hit a wall or need to explain a bug to Kiro, I can reference the original specs instead of trying to reverse-engineer what sleep-deprived-me was thinking at 2 AM last Tuesday.&lt;/p&gt;

&lt;p&gt;Also, no more scope creep disguised as “quick features.” When everything is specced out upfront, it’s obvious when you’re adding random shit that doesn’t belong. That is if you pay attention instead of accepting everything as if it were only terms and conditions.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to trust AI vs when to investigate yourself
&lt;/h3&gt;

&lt;p&gt;Give any AI tool exactly two chances to fix a complex problem. After that, you’re just feeding prompts to a very expensive random number generator.&lt;/p&gt;

&lt;p&gt;The Joplin image rendering issue taught me this lesson yet again. I spent hours watching different AI tools chase their own tails, generating increasingly elaborate solutions to a problem they didn’t understand. When all I had to do was stop, investigate what was happening for less than an hour, and give Kiro the details.&lt;/p&gt;

&lt;p&gt;AI tools are great at implementing solutions. They’re terrible at debugging complex integration issues that require understanding context they don’t have access to. It remains to be seen whether I will actually remember this lesson any earlier next time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Managing AI tool limitations and daily limits
&lt;/h3&gt;

&lt;p&gt;Treat daily limits like sprint deadlines. Plan your work in chunks that can be completed within the limit, and always end each session by documenting exactly where you are. Nothing sucks more than hitting your limit mid-debugging session and forgetting what the hell you were trying to fix.&lt;/p&gt;

&lt;p&gt;Also, don’t game the system by splitting complex tasks into tiny prompts. You’ll just confuse the AI and waste your allocation on back-and-forth clarifications.&lt;/p&gt;

&lt;h3&gt;
  
  
  The value of simple projects for learning new tools
&lt;/h3&gt;

&lt;p&gt;Complex projects are terrible for learning new AI tools. You spend too much time fighting edge cases and not enough time understanding the tool’s strengths and weaknesses. The Joplin Portal plugin was simple enough that I could focus on Kiro’s workflow instead of drowning in business logic. The Daily Note Prompts plugin had a lot more going on.&lt;/p&gt;

&lt;p&gt;Start simple. Learn the tool. Then tackle the hard stuff when you’re not also learning how to communicate with an AI that may or may not understand what a TypeScript interface is. Simple projects also fail faster and more obviously, which means you spend less time wondering if the AI is broken or if your requirements were just garbage to begin with.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Kiro’s free tier is basically a coding assistant on steroids with no monthly subscription guilt. While it lasts, I’m going to get my money’s worth, even though it’s not costing me any money.&lt;/p&gt;

&lt;p&gt;The spec-driven approach works well. Having an AI that plans before it codes instead of just eyeballing everything and hoping for the best is a game changer. But… I was already building my own tool to do this before I heard of Kiro, so I know it’s not “magic.” And I also suspect that it consists mainly of system prompts.&lt;/p&gt;

&lt;p&gt;But while it’s free… So if you see a bunch of random Obsidian plugins with my name on them over the next few months, you’ll know why. I’m not building them because the world desperately needs another note-taking plugin. I’m building them because there’s a free AI coding assistant that won’t be free forever, and I plan to learn everything I can while the learning is cheap.&lt;/p&gt;

&lt;p&gt;And while I get side-tracked and my own AI software is not ready yet, my next post should be about using AI to build that software. The process I have works really well, and I am building a public GitHub template repository to recreate it.&lt;/p&gt;

</description>
      <category>vibecoding</category>
      <category>obsidian</category>
      <category>javascript</category>
    </item>
    <item>
      <title>Jules AI - The (Currently) Free Coding Assistant That Can't Follow Directions But Gets Shit Done</title>
      <dc:creator>Stephan Miller</dc:creator>
      <pubDate>Tue, 10 Jun 2025 13:00:00 +0000</pubDate>
      <link>https://forem.com/eristoddle/jules-ai-the-currently-free-coding-assistant-that-cant-follow-directions-but-gets-shit-done-33k3</link>
      <guid>https://forem.com/eristoddle/jules-ai-the-currently-free-coding-assistant-that-cant-follow-directions-but-gets-shit-done-33k3</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl4hh44z0chvfa8bmzqnc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl4hh44z0chvfa8bmzqnc.png" alt=" " width="800" height="428"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Jules is currently free, so I figured it was worth a try. Of course, I had to find out what I could break.&lt;/p&gt;

&lt;p&gt;Each user gets these default limits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;5 concurrent tasks&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;60 total tasks per day&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5 codecasts per day&lt;/strong&gt; (still not sure what this means)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sixty tasks a day for free? That’s really generous and it meant I didn’t have to worry about reading documentation or instructions the first time.&lt;/p&gt;

&lt;h2&gt;
  
  
  I Jumped in Feet First
&lt;/h2&gt;

&lt;p&gt;My Obsidian plugin worked well enough for me to use it, but there were issues. Issues that I’d been ignoring because the damn thing worked and I had other shit to do. But I finally added issues to my GitHub repo.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fevline3hrpysdmps6uen.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fevline3hrpysdmps6uen.png" alt="obsidian-plugin-github-issues" width="800" height="292"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Four issues. Nothing too crazy. Should be straightforward for an AI that can supposedly code.&lt;/p&gt;

&lt;p&gt;I gave Jules access to the repo and told it: “I want you to work on the issues in this repository and create a feature branch for each fix. Start with the first one.”&lt;/p&gt;

&lt;p&gt;Simple enough, right?&lt;/p&gt;

&lt;h2&gt;
  
  
  Jules Decides to Ignore My Instructions
&lt;/h2&gt;

&lt;p&gt;Jules fetched the repo, cloned it into a VM, and read the README file. But then it discovered the “Future Enhancements” section I’d added to the README and decided that’s what it wanted to work on instead.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp0fu7sjoprmthq42z967.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp0fu7sjoprmthq42z967.png" alt="obsidian-plugin-future-enhancements" width="800" height="543"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It completely ignored my instruction to work on the issues, the same issues that are listed right above the “Future Enhancements” section, and decided it would work on the “Interactive book selection dialog” instead.&lt;/p&gt;

&lt;p&gt;Jules then asked me to approve the plan, but I didn’t see any way to reject it. It was going to auto-approve in about a minute anyway. I thought maybe clicking “Approve” would launch a modal where I could say “no, you illiterate AI, work on the issues I told you to work on.”&lt;/p&gt;

&lt;p&gt;It did not.&lt;/p&gt;

&lt;p&gt;So it started working on the wrong thing. I messaged “stop” and it replied: “I have stopped the current task. Is there something specific you would like me to do instead, or a different issue you’d like me to focus on from the README?”&lt;/p&gt;

&lt;p&gt;It paused for a few seconds and then continued doing exactly what it was doing before.&lt;/p&gt;

&lt;p&gt;Currently, as far as I can tell, the only way to reject a plan is to let it start, pause it manually, and then delete the entire task. User experience design at its finest.&lt;/p&gt;

&lt;p&gt;Jules finished the feature and gave me the option to click a button to create a feature branch. So now I had a new feature to test before I could fix the actual issues I wanted fixed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnqnup91vj1pbim7q6hqw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnqnup91vj1pbim7q6hqw.png" alt="jules-obsidian-plugin-branch" width="800" height="287"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing the Feature that Jules Built While Ignoring My Prompt
&lt;/h2&gt;

&lt;p&gt;So I pulled the branch down to the Obsidian vault I use for testing, the one with all the plugins I’m developing loaded up. There was one error in the code, in unit tests I never asked Jules to add:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tests/main.test.ts:160:35 - error TS2554: Expected 0 arguments, but got 1.
sortAnnotationsByCFI: jest.fn((ann: Annotation[]) =&amp;gt; ann), // Simple pass-through

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I just deleted the broken test.&lt;/p&gt;

&lt;p&gt;It took me a while to figure out that the selection modal didn’t launch via the sidebar button like I expected. I had to hit Command+P and find the command to launch it. But you know what? It actually worked.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F91ifg2dalj4t6p4zm0og.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F91ifg2dalj4t6p4zm0og.png" alt="ebook-selection-modal" width="800" height="842"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since the feature worked, I merged the change and created a new release. Then I did some reading to figure out how to prevent Jules from going off on another tangent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Back to Fixing Bugs (This Time With Feeling)
&lt;/h2&gt;

&lt;p&gt;I thought maybe I should be more descriptive, but there wasn’t much to be descriptive about. There are issues. I added them to the repo. They’re also in the README that I know Jules can access. Most of the &lt;a href="https://github.com/google-labs-code/jules-awesome-list" rel="noopener noreferrer"&gt;example prompts&lt;/a&gt; are very short, shorter than what I’d already tried.&lt;/p&gt;

&lt;p&gt;So I tried again: “This github repo has 4 issues listed here: &lt;a href="https://github.com/eristoddle/apple-books-annotation-import/issues" rel="noopener noreferrer"&gt;https://github.com/eristoddle/apple-books-annotation-import/issues&lt;/a&gt;. Give me a plan for fixing the first issue. If you have no access to these, let me know. Do not do anything else.”&lt;/p&gt;

&lt;p&gt;This time it worked.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhpci5s2f4utp2mt4abhn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhpci5s2f4utp2mt4abhn.png" alt="jule-web-interface" width="800" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ykxjvnrqe99ae1jiq56.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ykxjvnrqe99ae1jiq56.png" alt="jules-cover-fix-plan" width="800" height="287"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Both the accidental feature and the bug fix took very little time. I was actually surprised by the speed. But Jules created broken unit tests again. I figured maybe it was because I didn’t have my npm and Node versions in the package.json, so I added those and deleted the broken tests again.&lt;/p&gt;

&lt;p&gt;It actually fixed the bug and at least doubled the number of book covers I was getting.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Great Color Mapping Mystery
&lt;/h2&gt;

&lt;p&gt;The second issue was about finding annotation colors.&lt;/p&gt;

&lt;p&gt;Me: “Fix this feature in the repo: &lt;a href="https://github.com/eristoddle/apple-books-annotation-import/issues/5" rel="noopener noreferrer"&gt;https://github.com/eristoddle/apple-books-annotation-import/issues/5&lt;/a&gt;. Do not do anything else.”&lt;/p&gt;

&lt;p&gt;I think “Do not do anything else” will be part of my prompt from now on.&lt;/p&gt;

&lt;p&gt;Jules said: “The code changes have been reviewed for syntactical correctness and logical soundness. Direct testing with an Apple Books database is not possible, so final validation will be up to the user.”&lt;/p&gt;

&lt;p&gt;Translation: “I changed some code and it compiles. Good luck!”&lt;/p&gt;

&lt;p&gt;After testing, I told Jules: “Now it seems like there are no annotation color indicators at all in the md files. Before they were all just purple. Now they are just not there at all.”&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4az4upl978wms13qsati.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4az4upl978wms13qsati.png" alt="jules-annotation-color-message-two" width="800" height="498"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The second attempt produced different colors, but they didn’t match the highlight colors in the Books app.&lt;/p&gt;

&lt;p&gt;I mapped the colors Jules gave me to the actual colors and told it about the differences: “The color images are back and there are a variety of them, but they do not match the colors in the Books app. Here is how they differ (Books app color → markdown color): underline → both yellow and underline, purple → no icon at all, pink → red, yellow → purple, blue → blue, green → green.”&lt;/p&gt;

&lt;p&gt;I even included the log showing the distinct annotation style values from my test database. So this bug took three interactions, but Jules finally got the color mapping right.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqfjye3fwyv6n12bxlj7o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqfjye3fwyv6n12bxlj7o.png" alt="obsidian-annotation-colors" width="800" height="1381"&gt;&lt;/a&gt;&lt;/p&gt;
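
&lt;p&gt;For the curious, the final fix boils down to a small lookup table from the annotation style integers in the Books database to display colors. Here’s a minimal TypeScript sketch of the shape of it; the specific integer-to-color pairs are my own assumptions from the style values I logged, so verify them against your database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Hypothetical sketch: map Apple Books annotation style integers to the
// color icon used in the markdown output. These pairs come from my own
// test database logs; treat them as assumptions, not documented values.
const ANNOTATION_STYLE_COLORS: Record&amp;lt;number, string&amp;gt; = {
  0: 'underline',
  1: 'green',
  2: 'blue',
  3: 'yellow',
  4: 'pink',
  5: 'purple',
};

function colorForStyle(style: number): string {
  // Fall back to 'yellow' for any style value not seen before.
  return ANNOTATION_STYLE_COLORS[style] ?? 'yellow';
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
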

&lt;h2&gt;
  
  
  Taking It to the Big Leagues
&lt;/h2&gt;

&lt;p&gt;My main Apple Books account has 4000 books and over 60 books with highlights. When I first worked on this plugin, the database structure was different, so I wanted to make sure everything worked in the wild before calling it done.&lt;/p&gt;

&lt;p&gt;The plugin handled the larger database just fine. All the covers loaded correctly, and the color annotations matched what I saw in the Books app.&lt;/p&gt;

&lt;h2&gt;
  
  
  Everything I Fixed In This Change
&lt;/h2&gt;

&lt;p&gt;It took about two hours total, not counting the issue Jules couldn’t fix. I let it code while I took notes for this article, worked on Obsidian cleanup, and checked out Reddit for a while.&lt;/p&gt;

&lt;p&gt;Here’s what got implemented:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;✅ New interactive ebook selection dialog:&lt;/strong&gt; By accident when I was asking Jules to fix issues, but I’m counting it as a win.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;✅ Fixed finding ebook covers:&lt;/strong&gt; This was the original first issue. Jules doubled the number of covers the plugin could find.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;✅ Fixed annotation color icons:&lt;/strong&gt; Took three tries and some detailed feedback, but now the colors match what’s in the Books app.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;✅ Fixed missing annotation dates:&lt;/strong&gt; This needed a second attempt. I had to tell Jules: “I have include dates set to true and include citations set to false. There are no dates in the markdown files though.” But it got it right the second time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;❌ Did not fix gibberish in epub chapter names:&lt;/strong&gt; This one turned into a nightmare. The TypeScript build error carried over from the tests that never worked. I tried to get Jules to fix it once and it failed. The second time it was thinking forever, so I tried Copilot in VS Code, and that fixed the build.&lt;/p&gt;

&lt;p&gt;But then there was a runtime error I’d run into before when Claude Code tried to use a Node.js ebook parsing package. Something about not being able to access the file system in Electron. I went round and round with Jules on this one and finally gave up.&lt;/p&gt;

&lt;p&gt;One thing I realized is that I could tell Jules to build the app to test the build and fix issues, but I had to phrase it as a command. When I asked “Did you build the app to test it?” it said that wasn’t possible. But when I just told it to build the project, it did it.&lt;/p&gt;

&lt;p&gt;So I let it churn through fixing the build errors for 30 minutes. When I tested the result, the chapters still weren’t fixed. But I didn’t expect this one to be easy without doing some research first.&lt;/p&gt;

&lt;p&gt;I used 5 of my available 60 free tasks for the day.&lt;/p&gt;

&lt;h2&gt;
  
  
  Jules: Fast, Confused, but Generally Useful
&lt;/h2&gt;

&lt;p&gt;Jules is fast. That surprised me. At least until there’s back and forth on changes; then the conversation slows down.&lt;/p&gt;

&lt;p&gt;It was done with the next fix before I finished testing the previous one. I added one feature I didn’t plan on and fixed three issues in a little over an hour.&lt;/p&gt;

&lt;p&gt;But Jules went off on a weird tangent when I gave it what I thought were specific instructions. There seems to be no real interaction other than the first message and accepting the plan. I was worried about fixing issues in a branch it created if something didn’t work.&lt;/p&gt;

&lt;p&gt;Using a task like it was a chat and trying to fix more than one issue per task got confusing. Then I figured out I needed a different workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1 chat per branch, but can do multiple related things&lt;/li&gt;
&lt;li&gt;1 chat can be as big as 1 feature&lt;/li&gt;
&lt;li&gt;Code review each branch&lt;/li&gt;
&lt;li&gt;If there are issues, go back to the chat and have Jules fix them&lt;/li&gt;
&lt;li&gt;If there are build errors, tell Jules specifically to build the app again, test, and fix any errors.&lt;/li&gt;
&lt;li&gt;Pull and test again&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I was used to my Claude Code process where everything happened in the branch I was on locally. The “Publish Branch” button in Jules seemed scary at first, but the second time I used it, it just pushed modifications to the existing branch. I was expecting merge conflicts or some other drama.&lt;/p&gt;

&lt;p&gt;Most things I asked Jules to do worked well, except for the chapter names issue. I also didn’t ask it to create unit tests, but it kept creating broken ones anyway. In each branch it created, I ran the tests once, and if they failed, I deleted them. They mainly failed because of TypeScript type issues.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Future of Lazy Coding
&lt;/h2&gt;

&lt;p&gt;Sixty tasks a day for free is generous. I’m planning to move everything I have to GitHub that isn’t already there and create issues for any feature ideas or bugs I encounter.&lt;/p&gt;

&lt;p&gt;I could envision a process where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A user puts in a support request&lt;/li&gt;
&lt;li&gt;An issue gets created on the repo&lt;/li&gt;
&lt;li&gt;I tell Jules to fix it&lt;/li&gt;
&lt;li&gt;Jules fixes it, creates a custom build, and the user tests it (don’t know about this, but I can dream)&lt;/li&gt;
&lt;li&gt;If the fix works, I merge it and create a new release&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’m not sure exactly how this process would work in practice, but the potential is there.&lt;/p&gt;

&lt;h2&gt;
  
  
  Verdict: Worth the Price of Free
&lt;/h2&gt;

&lt;p&gt;Jules isn’t perfect. It ignored my initial instructions, created broken unit tests, and couldn’t fix the most complex issue I threw at it. But it’s fast, it’s free for now, and it actually solved most of the problems I needed solved.&lt;/p&gt;

&lt;p&gt;In two hours, I got a new feature and fixed three out of four issues that had been sitting in my backlog. The fourth issue would have taken me some research into ePub structure even if I were doing it manually.&lt;/p&gt;

&lt;p&gt;I’m definitely planning to shift more of my “vibe coding” projects to Jules. At 60 free tasks per day, I can afford to let it work on the boring stuff while I focus on the interesting problems. In fact, to actually use 60 tasks a day, I’d have to try running concurrent tasks, and Jules allows 5 of those.&lt;/p&gt;

&lt;p&gt;And if it goes off on another tangent and builds something I didn’t ask for? Well, sometimes the best features are the ones you never knew you needed.&lt;/p&gt;

</description>
      <category>llmcoding</category>
      <category>vibecoding</category>
      <category>programming</category>
      <category>javascript</category>
    </item>
    <item>
      <title>Creating an Obsidian Plugin with Claude AI</title>
      <dc:creator>Stephan Miller</dc:creator>
      <pubDate>Mon, 02 Jun 2025 13:00:00 +0000</pubDate>
      <link>https://forem.com/eristoddle/creating-an-obsidian-plugin-with-claude-ai-gaj</link>
      <guid>https://forem.com/eristoddle/creating-an-obsidian-plugin-with-claude-ai-gaj</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3d0eqmeljwetrhkifnmv.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3d0eqmeljwetrhkifnmv.jpg" alt="Image description" width="800" height="423"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With this post, I took a break from the &lt;a href="https://dev.to/eristoddle/building-an-electron-app-from-scratch-with-claude-code-5c03"&gt;Electron project I am building with Claude Code&lt;/a&gt; to see how fast I could get a smaller project done. Since I am building a relatively simple Obsidian plugin, I went back to using Claude Desktop with the &lt;a href="https://github.com/modelcontextprotocol/servers/tree/main/src/sequentialthinking" rel="noopener noreferrer"&gt;sequential thinking&lt;/a&gt; and &lt;a href="https://github.com/wonderwhy-er/DesktopCommanderMCP" rel="noopener noreferrer"&gt;desktop commander&lt;/a&gt; MCP tools.&lt;/p&gt;

&lt;p&gt;And I learned a few more things about using AI to write code. Read on for the complete story.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Story Behind the Project
&lt;/h2&gt;

&lt;p&gt;I’m obsessed with having all my book highlights and notes in one place, and Obsidian seemed like a great place for them. First, I paid for a year’s Readwise subscription, which really didn’t cost too much and did the job well. But after the year, I realized I only used it to import my highlights from Kindle and Apple Books, and there were already free Obsidian plugins to import Kindle highlights.&lt;/p&gt;

&lt;p&gt;Then I discovered that Apple Books stored all these highlights and notes in an open, though hard to find and hard to understand, SQLite database. So I wrote a &lt;a href="https://dev.to/eristoddle/exporting-mac-osx-book-highlights-into-an-obsidian-vault-or-markdown-files-40lg"&gt;Python script to import Apple Books annotations&lt;/a&gt;, using examples from sources like this &lt;a href="https://github.com/davidfor/calibre-annotations/blob/master/readers/_iBooks.py" rel="noopener noreferrer"&gt;Calibre plugin repo&lt;/a&gt;, and used the Obsidian Python Scripter plugin to run it from Obsidian. I wrote most of that first script myself and only had AI help when I ran into issues.&lt;/p&gt;

&lt;h2&gt;
  
  
  Problems with the Original Python Script
&lt;/h2&gt;

&lt;p&gt;The first issue with the Python script I wrote is that the Python Scripter plugin is no longer available. This was not much of a problem though. I just hard-coded the file path of my vault’s book notes folder in the script and it still worked for me.&lt;/p&gt;

&lt;p&gt;Only it wasn’t really working the way I thought it was. There were some issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The annotations were not sorted by the location in the book.&lt;/li&gt;
&lt;li&gt;The annotations weren’t grouped by chapter (there is still an issue in the current plugin: I have chapters now, but they are not human readable).&lt;/li&gt;
&lt;li&gt;There were no configuration options. I just edited the script when I wanted to change something.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Phase 1: Perfecting the Python Script with Claude
&lt;/h2&gt;

&lt;p&gt;Before jumping right into creating a plugin, I fixed the Python script’s sorting issue first. My reasoning was that if I got stuck in the plugin development process, I would at least have this script and it would work. And once it did, I could have Claude look at it as a working example. It turns out this was a good move for these exact reasons.&lt;/p&gt;

&lt;p&gt;I also thought I could create this plugin with only a Claude Project and no MCP support, but I was wrong about not using MCP. It might have been possible, but MCP tools made things easier. Here was my first message:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I am using this Python file to import highlights and notes from the Mac iBooks app. I don’t think I am rendering them in order from the front of the book to the back in the markdown file. Let me know if I am and if not how to fix it&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  The Sorting Challenge
&lt;/h3&gt;

&lt;p&gt;Claude initially gave me three sorting options, but none actually sorted by book location. To test the sort, I put highlights in a book with notes that told their order by location and by when I entered them. So I sent this message, along with a dump of the SQLite table that had the annotations:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The first method didn’t seem to work. The second two seemed to sort by date entered or modified instead of by location in the book. Here is the dump of that database table.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The table dump was key, because Claude analyzed the CFI strings that I’d been struggling with and created a sophisticated parser that could extract positional information from cryptic strings like &lt;code&gt;epubcfi(/6/24[c3.xhtml]!/4/188/2/1,:0,:1)&lt;/code&gt;. And I was done with that until the end, when I had to revisit it multiple times in the plugin version.&lt;/p&gt;
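
&lt;p&gt;To give a feel for what that parser does, here’s a rough TypeScript sketch of the idea (the script was actually Python, the names here are hypothetical, and the real parser handled more edge cases): pull the numeric steps out of the CFI path and compare them element by element, like version numbers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Sketch only: turn "epubcfi(/6/24[c3.xhtml]!/4/188/2/1,:0,:1)" into a
// sortable list of numbers like [6, 24, 4, 188, 2, 1].
function cfiSortKey(cfi: string): number[] {
  const body = cfi.replace(/^epubcfi\(/, '').replace(/\)$/, '');
  // Keep only the path; drop the character-offset part after the comma.
  const path = body.split(',')[0];
  return [...path.matchAll(/\/(\d+)/g)].map((m) =&amp;gt; parseInt(m[1], 10));
}

// Compare two annotations by CFI, element by element.
function compareByCFI(a: string, b: string): number {
  const ka = cfiSortKey(a);
  const kb = cfiSortKey(b);
  for (let i = 0; i &amp;lt; Math.max(ka.length, kb.length); i++) {
    const diff = (ka[i] ?? 0) - (kb[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
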

&lt;h3&gt;
  
  
  Adding Chapter Detection
&lt;/h3&gt;

&lt;p&gt;With sorting fixed, I pushed further:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Now that we have that fixed, is there a way to put chapter headings where they belong in the resulting markdown?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Because I was using a small set of 5 books with highlights, this seemed to work well. Initially, chapters in some books came out like “c3.xhtml” and various other things. But it magically fixed that.&lt;/p&gt;

&lt;p&gt;But in the end, it was not magic. And just to let you know, I was using Claude 4, and Claude 4 will use workarounds and not tell you about it. When I tested the script on an account that had 60 books with highlights, none of the chapters in the results markdown were human readable.&lt;/p&gt;

&lt;p&gt;I looked into what Claude wrote and it was basically a hack. It was a series of ifs customized to that first set of 5 books, so only they came out right. These are still not human readable in the plugin. I figure I have to use the SQLite results in unison with the ePub file data to make them so, but that’s what new features are for.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo728yja1b1tgi26dhgar.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo728yja1b1tgi26dhgar.png" alt="new-obsidian-annotations-md-template" width="800" height="1327"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Enhanced Features
&lt;/h3&gt;

&lt;p&gt;I wanted to make sure I added everything I could to the Python script before moving on to the plugin, so I added:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Citation-ready format&lt;/li&gt;
&lt;li&gt;More metadata extraction&lt;/li&gt;
&lt;li&gt;Better error handling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can find the code for that &lt;a href="https://gist.github.com/eristoddle/5a8e7dd0597d09d00aa5de066788c303" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 2: Converting to an Obsidian Plugin
&lt;/h2&gt;

&lt;p&gt;It took me about an hour to update the Python script. I figured I was on a roll. I had working code, and all Claude had to do was convert it to JavaScript and make it an Obsidian plugin. Famous last words. There is always something and many times it is something stupid.&lt;/p&gt;

&lt;p&gt;I had updated the Python script in a Claude chat inside of a Claude Project, but hadn’t yet added any Project Resources. Now that I was going to work on the plugin, I uploaded these there:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The final Python script:&lt;/strong&gt; So it could reference this instead of reinventing the wheel.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The blog post I wrote about the original Python version:&lt;/strong&gt; I figured it would explain what it was doing and what I was trying to do.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Obsidian sample plugin repository:&lt;/strong&gt; I am not sure I needed this. Maybe at the beginning, but I eventually realized (duh) that project resources become part of the context, which means your chats get cut off quicker, so I removed it. Claude never had problems with the plugin API.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And I sent this message:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I have attached a Python script to import apple books annotations. I have also attached the Obsidian plugin sample for reference. I have also attached an article on how an older version of the Python script integrates with Obsidian. Help me create an Obsidian plugin that does the same thing, step by step.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  The Development Process
&lt;/h3&gt;

&lt;p&gt;Claude recognized I had the sequential thinking MCP, used it right away, and came up with a plan.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmp01dvp6aoft0hfj1nbn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmp01dvp6aoft0hfj1nbn.png" alt="claude-generate-obsidian-plugin" width="800" height="888"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But I forgot to tell Claude where the project was located, so after that, it spit out all the files in the chat. I quickly corrected that issue by telling it to create all the files in my project itself.&lt;/p&gt;

&lt;p&gt;The initial conversion went surprisingly smoothly:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Project structure creation&lt;/strong&gt; - all necessary TypeScript files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database access logic&lt;/strong&gt; - converting Python SQLite to JavaScript&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Markdown generation&lt;/strong&gt; - translating the formatting logic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Settings interface&lt;/strong&gt; - creating a proper Obsidian settings panel, which I had been worried about up to this point.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftrfsvm2cqtxensrldr32.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftrfsvm2cqtxensrldr32.png" alt="obsidian-annotation-import-settings" width="800" height="957"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Technical Challenges and Solutions
&lt;/h3&gt;

&lt;p&gt;Well, it was functional. But let’s just say I was not even halfway done yet. Also, and I am so tired of this, Claude corrected my name everywhere from “Stephan” to “Stephen”.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7pc1pk1h1lep81hcu2qz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7pc1pk1h1lep81hcu2qz.png" alt="claude-getting-my-name-wrong" width="800" height="285"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Just like Google always does.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2jxsexvhopwy4l1euxbh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2jxsexvhopwy4l1euxbh.png" alt="fuck-you-google" width="800" height="532"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;No, Google, my name is “Stephan.” But back on track. Here are the issues I ran into after I had a “working” Obsidian plugin.&lt;/p&gt;

&lt;h4&gt;
  
  
  SQLite Library Issues
&lt;/h4&gt;

&lt;p&gt;A big hurdle was database access. Claude initially tried &lt;code&gt;better-sqlite3&lt;/code&gt; but ran into Electron compatibility issues. We switched to &lt;code&gt;sql.js&lt;/code&gt;, a pure JavaScript SQLite implementation, but then faced WebAssembly loading problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The solution:&lt;/strong&gt; Configure &lt;code&gt;sql.js&lt;/code&gt; with proper WASM file handling in the Obsidian environment.&lt;/p&gt;
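
&lt;p&gt;In case it saves someone the same fight, here’s a minimal sketch of that configuration, assuming the &lt;code&gt;sql-wasm.wasm&lt;/code&gt; file ships alongside the plugin and you already have the path to the Books database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Minimal sketch, not the plugin's exact code: point sql.js at a local
// copy of sql-wasm.wasm so it doesn't try to fetch one over the network
// from inside Electron.
import initSqlJs from 'sql.js';
import { readFileSync } from 'fs';

async function openBooksDb(dbPath: string, wasmDir: string) {
  const SQL = await initSqlJs({
    locateFile: (file) =&amp;gt; `${wasmDir}/${file}`,
  });
  // sql.js works on an in-memory copy of the database file.
  const fileBuffer = readFileSync(dbPath);
  return new SQL.Database(new Uint8Array(fileBuffer));
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
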

&lt;h4&gt;
  
  
  Query Formatting Problems
&lt;/h4&gt;

&lt;p&gt;Even after fixing the SQLite library, we had issues with SQL query formatting when passed to the command line. Claude described the approach as “bulletproof” right before it broke, which became a running theme.&lt;/p&gt;

&lt;h4&gt;
  
  
  Chapter Parsing Regression
&lt;/h4&gt;

&lt;p&gt;One book showed chapters as “Ahr5106 us trade bbp text 2” instead of readable names. This required Claude to revisit the CFI parsing logic and ensure consistency with the Python version. This is still a challenge and I am going to do research before I try to fix this.&lt;/p&gt;

&lt;h3&gt;
  
  
  The EPUB Metadata Challenge
&lt;/h3&gt;

&lt;p&gt;It was now two hours since I started revising the original Python script. The Python script used &lt;code&gt;ebooklib&lt;/code&gt; to extract book covers and enhanced metadata. Finding a JavaScript equivalent proved challenging. And I was a little gun-shy by now, so I asked:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Is there anything in JavaScript that will do the same thing the python script did with &lt;code&gt;ebooklib&lt;/code&gt;, like get the cover image and other metadata? If you know of something, do not commit code until I say it is working. It works now and I don’t want to commit broken code.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Claude suggested three options, and I chose &lt;code&gt;epub2&lt;/code&gt; for its popularity and feature set. However, it took several rounds of debugging across multiple chat sessions to get ePub parsing working correctly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Real-World Testing Challenges
&lt;/h3&gt;

&lt;p&gt;Testing on my actual Apple Books account (with 4,711 books) shook some new bugs loose:&lt;/p&gt;

&lt;h4&gt;
  
  
  Database Schema Variations
&lt;/h4&gt;

&lt;p&gt;The plugin tried to query columns that didn’t exist on my primary account, causing SQLite errors. The Python script handled this gracefully, but the plugin needed updates.&lt;/p&gt;
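
&lt;p&gt;The general fix is to ask SQLite which columns actually exist before building the query. A hedged sketch (the table name is illustrative, and the structural type only covers the bit of sql.js used here):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Minimal structural type for the piece of sql.js this sketch needs.
type SqlJsDb = {
  exec(sql: string): { columns: string[]; values: unknown[][] }[];
};

// Discover the columns a table really has, so the SELECT can be built
// only from columns present in this account's schema. Illustrative only.
function existingColumns(db: SqlJsDb, table: string): Set&amp;lt;string&amp;gt; {
  const res = db.exec(`PRAGMA table_info(${table})`);
  // Each row looks like [cid, name, type, notnull, dflt_value, pk].
  if (!res.length) return new Set();
  return new Set(res[0].values.map((row) =&amp;gt; String(row[1])));
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
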

&lt;h4&gt;
  
  
  Memory and Buffer Limits
&lt;/h4&gt;

&lt;p&gt;With thousands of books, we hit “maxBuffer length exceeded” errors. Claude implemented chunked processing to handle large datasets.&lt;/p&gt;
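
&lt;p&gt;I don’t know exactly how Claude chunked it, but the usual shape of the fix is to page through results with LIMIT/OFFSET instead of pulling everything in one query. A rough sketch, reusing the &lt;code&gt;SqlJsDb&lt;/code&gt; type from the previous sketch and with illustrative table and column names:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Sketch of the chunking idea: page through annotations so no single
// result set gets huge. The table and column names are illustrative,
// not the real Apple Books schema.
function* annotationChunks(db: SqlJsDb, chunkSize = 500) {
  let offset = 0;
  while (true) {
    const res = db.exec(
      `SELECT * FROM annotations ORDER BY id LIMIT ${chunkSize} OFFSET ${offset}`
    );
    if (!res.length || res[0].values.length === 0) return;
    yield res[0].values;
    offset += chunkSize;
  }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
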

&lt;h4&gt;
  
  
  Annotation Grouping Issues
&lt;/h4&gt;

&lt;p&gt;The plugin kept breaking up annotations that should have been together and duplicating others. This took six rounds of debugging, with Claude repeatedly missing the fact that the Python version worked perfectly.&lt;/p&gt;

&lt;p&gt;I finally resorted to extended thinking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Think as hard as you can about the fact that the Python script is working, and the plugin is not and the only thing happening is SQLite queries and handling the results, so this should be fucking simple.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Development Timeline
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hour 1-2:&lt;/strong&gt; Python script update, initial plugin creation, and basic functionality&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hour 3:&lt;/strong&gt; Fighting with ePub metadata extraction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hour 4-6:&lt;/strong&gt; Real-world testing, debugging on large dataset, and configuration tweaking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Total development time: About 6 hours across multiple sessions, with Claude handling the bulk of the coding.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72papzemlnrrmm8b574x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72papzemlnrrmm8b574x.png" alt="claude-create-obsidian-plugin-results" width="800" height="2064"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned This Time
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Good
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rapid prototyping:&lt;/strong&gt; Claude excels at converting working logic between languages most of the time, though its failure to realize it could reuse the same SQLite queries in both JavaScript and Python was a pain.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Comprehensive solutions:&lt;/strong&gt; Often suggests improvements you haven’t considered&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pattern recognition:&lt;/strong&gt; Great at handling complex parsing tasks like CFI strings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation:&lt;/strong&gt; Automatically generates thorough README files and comments&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Challenging
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context switching:&lt;/strong&gt; Moving between chat sessions sometimes loses important context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Overconfidence:&lt;/strong&gt; Claims solutions are “bulletproof” right before they break&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Debugging persistence:&lt;/strong&gt; Sometimes fixates on wrong solutions instead of reverting to working code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Name corrections:&lt;/strong&gt; Consistently “corrected” my name spelling throughout the project&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Tips for Successful Claude Desktop Coding
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Use projects&lt;/strong&gt; for better context persistence&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Be specific&lt;/strong&gt; about what’s working vs. broken&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintain working versions&lt;/strong&gt; before attempting fixes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test incrementally&lt;/strong&gt; rather than making multiple changes at once&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provide concrete examples&lt;/strong&gt; when debugging&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Results
&lt;/h2&gt;

&lt;p&gt;I am happy with the results. There is still some more work to do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Not sure what is happening with the covers. The Python version had much more luck finding the covers, but then it used a more extensive library to do it.&lt;/li&gt;
&lt;li&gt;The chapter names are still gibberish in general. I think this is because it is just using the value in the CFI location string. I am guessing I have to look this up somehow in the ePub file.&lt;/li&gt;
&lt;li&gt;There is a configuration value for adding the annotation date to each entry, but I haven’t seen a date in the output yet. Not sure what is going on there.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But for now I can use it, and when I fix it, I’ll just have it overwrite all the old files. Then, once I am happy with it, I’ll store a hash of the file in properties or something like that and have overwriting work more intelligently.&lt;/p&gt;
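
&lt;p&gt;That hash idea is simple enough to sketch now (helper names hypothetical): store a short hash of the generated markdown in the note’s properties, and on the next import only overwrite a file whose current content still matches its stored hash, meaning I never hand-edited it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { createHash } from 'crypto';

// Sketch of smarter overwriting: hash the markdown the plugin wrote,
// keep that hash in the note's frontmatter, and skip any note whose
// content no longer matches its stored hash (it was edited by hand).
function contentHash(markdown: string): string {
  return createHash('sha256').update(markdown).digest('hex').slice(0, 16);
}

function safeToOverwrite(currentContent: string, storedHash: string): boolean {
  return contentHash(currentContent) === storedHash;
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
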

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9vh0i4p0shp2n3jhndre.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9vh0i4p0shp2n3jhndre.png" alt="obsidian-book-note-example" width="800" height="1210"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Install the Apple Books Highlight and Note Importer
&lt;/h2&gt;

&lt;p&gt;I do plan on releasing this as a community plugin. I just want to use it a while to see if there are any more bugs I want to fix or features I want to add.&lt;/p&gt;

&lt;p&gt;But for now the simplest way to install and test this plugin is with &lt;a href="https://github.com/TfTHacker/obsidian42-brat" rel="noopener noreferrer"&gt;BRAT (Beta Reviewer’s Auto-update Tool)&lt;/a&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Install BRAT&lt;/strong&gt; from the Community Plugins store in Obsidian&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open BRAT settings&lt;/strong&gt; (Settings → BRAT)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Click “Add Beta plugin”&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enter this repository URL&lt;/strong&gt; : &lt;code&gt;https://github.com/eristoddle/obsidian-apple-books-import&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Click “Add Plugin”&lt;/strong&gt; - Choose the latest version and BRAT will automatically install and enable it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-updates&lt;/strong&gt; : BRAT will automatically update the plugin when new versions are released if you choose the &lt;code&gt;latest&lt;/code&gt; version from the dropdown.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What’s Next
&lt;/h2&gt;

&lt;p&gt;I took a break from a more complex project for a day to see if I could finish a smaller one. And even though there is still more work to do here, I think it was a success and an excellent test. I will continue to explore new ways of making the “vibe coding” process go smoother. In the couple of days that it took me to write this, I found even more tools to help.&lt;/p&gt;

&lt;p&gt;I have no end of software ideas I have been collecting but never got to, because I never had the time. So I will continue to work on my main vibe-coding project while taking breaks to test new ways of doing this on smaller projects. My next post should be about the next set of things I learned &lt;a href="https://dev.to/eristoddle/building-an-electron-app-from-scratch-with-claude-code-5c03"&gt;developing an Electron app with Claude Code&lt;/a&gt;. I already have notes on it, but wanted to try this first.&lt;/p&gt;

</description>
      <category>vibecoding</category>
      <category>obsidian</category>
      <category>javascript</category>
      <category>python</category>
    </item>
    <item>
      <title>Building an Electron App from Scratch with Claude Code</title>
      <dc:creator>Stephan Miller</dc:creator>
      <pubDate>Tue, 20 May 2025 12:00:00 +0000</pubDate>
      <link>https://forem.com/eristoddle/building-an-electron-app-from-scratch-with-claude-code-5c03</link>
      <guid>https://forem.com/eristoddle/building-an-electron-app-from-scratch-with-claude-code-5c03</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fey98j5g9kkhxw3u8rn9v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fey98j5g9kkhxw3u8rn9v.png" alt="Image description" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I have been curious about “vibe coding” ever since I heard about the process, even though I hate the name. I started by &lt;a href="https://dev.to/eristoddle/claude-mcps-vibe-coding-without-specialized-ides-part-1-1hmd"&gt;using Claude Desktop with MCP tools&lt;/a&gt; to build a code project from scratch and it felt like magic. That was until I discovered Claude Code and my workflow got so much easier and less chaotic right away.&lt;/p&gt;

&lt;p&gt;This post documents my adventure so far, using Claude Code to build an Electron writing app. I made a few mistakes and am still making them, but corrected them and think I am on a better path now. I doubt it’s the right path, but I’ll keep tweaking it until it works for me.&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Vibe Coding with Claude Desktop and MCP&lt;/li&gt;
&lt;li&gt;Features of Claude Code that Made Me Switch&lt;/li&gt;
&lt;li&gt;
Installing and Using Claude Code

&lt;ul&gt;
&lt;li&gt;Requirements&lt;/li&gt;
&lt;li&gt;Installing and Initializing&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Building a Project from Scratch with Claude Code&lt;/li&gt;

&lt;li&gt;The Pain of Refactoring with Claude Code&lt;/li&gt;

&lt;li&gt;

My Current &lt;code&gt;CLAUDE.md&lt;/code&gt; file and Coding Process

&lt;ul&gt;
&lt;li&gt;How to Use This File&lt;/li&gt;
&lt;li&gt;Project Overview&lt;/li&gt;
&lt;li&gt;Key Requirements&lt;/li&gt;
&lt;li&gt;Important References&lt;/li&gt;
&lt;li&gt;Code Style Guidelines&lt;/li&gt;
&lt;li&gt;Project SDLC&lt;/li&gt;
&lt;li&gt;Build/Lint/Test Commands&lt;/li&gt;
&lt;li&gt;Development Status&lt;/li&gt;
&lt;li&gt;Next Development Steps&lt;/li&gt;
&lt;li&gt;QA Checklist&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Lessons I Learned Using Claude Code&lt;/li&gt;

&lt;li&gt;Features of Claude Code I Haven’t Tried Yet&lt;/li&gt;

&lt;li&gt;Claude Code Tips From Others I Plan on Trying&lt;/li&gt;

&lt;li&gt;

What Claude Code Built: Screenshots of My app

&lt;ul&gt;
&lt;li&gt;Dashboard&lt;/li&gt;
&lt;li&gt;Settings&lt;/li&gt;
&lt;li&gt;Writing Interface&lt;/li&gt;
&lt;li&gt;Character Relationship Graph&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

My Journey with Claude Code Continues

&lt;ul&gt;
&lt;li&gt;What I’ve Learned&lt;/li&gt;
&lt;li&gt;What’s Next&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Vibe Coding with Claude Desktop and MCP
&lt;/h2&gt;

&lt;p&gt;When I was &lt;a href="https://dev.to/eristoddle/claude-mcps-vibe-coding-without-specialized-ides-part-1-1hmd"&gt;using Claude Desktop to write code&lt;/a&gt;, I always felt like I was using workarounds to plan and organize. I started out with a standalone chat, but eventually got cut off and had to start a new chat to continue working on the app. So I switched to using a Claude Project.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/eristoddle/vibe-coding-with-claude-desktop-and-mcp-part-2-switching-to-scrapy-elb"&gt;Working on the app in a Claude Project&lt;/a&gt; made things a little easier by allowing me to upload 50 reference documents, but that was a mess, because the project was always changing and I had to delete and re-upload changes to documentation all the time. I could have also tried connected to the project’s Github repo or simply putting the documentation in a folder in the local project.&lt;/p&gt;

&lt;p&gt;But around that time, I ran into &lt;a href="https://docs.anthropic.com/en/docs/claude-code/overview" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;, read up on it, and figured it would work better for what I needed, so I started a new project with it. I might go back to the first project with a better plan, because I was making progress.&lt;/p&gt;

&lt;h2&gt;
  
  
  Features of Claude Code that Made Me Switch
&lt;/h2&gt;

&lt;p&gt;I am used to a &lt;strong&gt;command-line interface&lt;/strong&gt;. I use them all the time. It feels like I am doing work. In chat, I feel like I am getting nagged by a sales bot on an e-commerce site or arguing with customer service. I get that it makes no sense, but the command-line interface was a big feature for me.&lt;/p&gt;

&lt;p&gt;Another significant feature is the &lt;strong&gt;&lt;code&gt;CLAUDE.md&lt;/code&gt; file&lt;/strong&gt;, which holds your project memory for Claude Code. You can add things to it, like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build/Lint/Test Commands&lt;/li&gt;
&lt;li&gt;Code Style Guidelines&lt;/li&gt;
&lt;li&gt;The SDLC process you want to use&lt;/li&gt;
&lt;li&gt;Reference other documentation on the project&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means I can keep its memory in the project itself, instead of in uploaded Claude Project files. When I first started working with Claude Code, I started setting up MCP connections like I did with Desktop. But I then realized I was wasting my time and adding overhead, because &lt;strong&gt;the functionality I was getting by adding MCPs to Claude Desktop was already built into Code&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I could see adding a database connection MCP in the future if I work on a project that has one, but right now I am not using any with Claude Code. Claude Code has a lot more features than the three I listed here and I will get to them later, but these were all the reasons I needed to make the switch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing and Using Claude Code
&lt;/h2&gt;

&lt;p&gt;I have to admit that I basically ran the command to install Claude Code, ran the init command in a new project folder, and started playing with it. Writing this, I figured I should dig deeper into its documentation to make sure I missed nothing. And I had missed a few things that might have made this entire process smoother.&lt;/p&gt;

&lt;h3&gt;
  
  
  Requirements
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Operating Systems&lt;/strong&gt; : macOS 10.15+, Ubuntu 20.04+/Debian 10+, or Windows via WSL&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hardware&lt;/strong&gt; : 4GB RAM minimum&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Software&lt;/strong&gt; : 

&lt;ul&gt;
&lt;li&gt;Node.js 18+&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://git-scm.com/downloads" rel="noopener noreferrer"&gt;git&lt;/a&gt; 2.23+ (optional)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cli.github.com/" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; or &lt;a href="https://gitlab.com/gitlab-org/cli" rel="noopener noreferrer"&gt;GitLab&lt;/a&gt; CLI for PR workflows (optional)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/BurntSushi/ripgrep?tab=readme-ov-file#installation" rel="noopener noreferrer"&gt;ripgrep&lt;/a&gt; (rg) for enhanced file search (optional)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;One thing I missed there was installing &lt;code&gt;ripgrep&lt;/code&gt; for advanced search. Maybe that will make things quicker? I did not install it until I started writing this, so maybe that is why things seemed to take longer than they needed to.&lt;/p&gt;

&lt;h3&gt;
  
  
  Installing and Initializing
&lt;/h3&gt;

&lt;p&gt;To get started, run this command to install Claude Code globally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm install -g @anthropic-ai/claude-code

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then navigate to your project’s folder and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;claude

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After Claude Code started, I ran the following command to generate the &lt;code&gt;CLAUDE.md&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/init

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Building a Project from Scratch with Claude Code
&lt;/h2&gt;

&lt;p&gt;I wasn’t sure I could do this because all the examples online were working with existing codebases and I wanted to start one from scratch. So I asked Claude, and it said I could.&lt;/p&gt;

&lt;p&gt;And I decided I would build an Electron app. A writing app. Yeah, I know. Like we need another one of those. But it was something I had always wanted to do and have been collecting a list of features over the years, which only made the odds I would ever get to it worse.&lt;/p&gt;

&lt;p&gt;And, of course, you can’t make a writing app these days without including AI, so I am using AI to write an AI writing app. And with all the middlemen in the AI world, I figured I would make it so you just enter your LLM API keys and get started. The app would be completely self-contained in an Electron desktop app. For now, at least.&lt;/p&gt;

&lt;p&gt;But if I were going to start over, I would start with more of a plan. I basically pointed Claude Code to the features of other similar apps and told it I wanted to build an Electron app with similar features.&lt;/p&gt;

&lt;p&gt;As I chatted with Claude Code and worked on the app for the first couple of hours, I just let it update &lt;code&gt;CLAUDE.md&lt;/code&gt; with the project details.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pain of Refactoring with Claude Code
&lt;/h2&gt;

&lt;p&gt;I was learning from my mistakes as I went. The one I learned quickly was to not only plan features, but to plan the architecture the features will work in. Think ahead. By the time I had an app where some features were working enough to be useful, I decided I should have used a plugin architecture.&lt;/p&gt;

&lt;p&gt;I realized this architecture would work better for what I was doing when I was adding a context menu for the editor. When you right click in an editor, there are many things you can have it do, but when you add a new action, you are basically just adding another item to that menu that runs a new function when it is clicked.&lt;/p&gt;

&lt;p&gt;Now I also think this type of architecture could be useful when using Claude Code to create an app like this. Have it build the core system and add a well-defined interface for plugins to interact with the core. Then build all the features as plugins. I think that this would keep the context lighter because Claude could use the plugin API docs to build the features instead of searching and parsing everything all the time.&lt;/p&gt;
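
&lt;p&gt;To make that concrete, here’s a rough TypeScript sketch of the kind of interface I mean. Every name here is hypothetical; the point is that the core exposes one narrow registration surface and each feature lives behind it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Hypothetical plugin API sketch: the core owns the editor and menus,
// and features only touch them through this narrow interface.
interface EditorAction {
  id: string;
  label: string;
  run(selection: string): void | Promise&amp;lt;void&amp;gt;;
}

interface CoreAPI {
  registerEditorAction(action: EditorAction): void;
}

interface AppPlugin {
  id: string;
  onLoad(core: CoreAPI): void;
}

// A feature then becomes a small, self-contained module:
const rewriteWithAI: AppPlugin = {
  id: 'rewrite-with-ai',
  onLoad(core) {
    core.registerEditorAction({
      id: 'rewrite-selection',
      label: 'Rewrite with AI',
      run: async (selection) =&amp;gt; {
        // Call the configured LLM with the selected text here.
      },
    });
  },
};

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
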

&lt;p&gt;But moving toward an architecture like that later in the game is hard. My first attempt to globally change things in the project was successful, but it took quite a while. And I still have a way to go. So, currently I am looking through the code base to see what all has to change for the complete refactor to happen. That way, when I tell it to refactor, I can give it some guidance and know what to expect when it does it right.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Current &lt;code&gt;CLAUDE.md&lt;/code&gt; file and Coding Process
&lt;/h2&gt;

&lt;p&gt;For the first few hours, I just told Claude Code to build new features and keep track of where it was at in the &lt;code&gt;CLAUDE.md&lt;/code&gt; file. But after a while, I knew I had to rein it in. Here are the sections I have in my &lt;code&gt;CLAUDE.md&lt;/code&gt; file currently, which has changed often and will continue to:&lt;/p&gt;

&lt;h3&gt;
  
  
  How to Use This File
&lt;/h3&gt;

&lt;p&gt;When I decided to refactor the &lt;code&gt;CLAUDE.md&lt;/code&gt;, I had some ideas, but also asked Claude Code what would work. It suggested this section. Here’s what it has in it:&lt;/p&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;This file contains essential project information&lt;/li&gt;
&lt;li&gt;Detailed documentation can be found in referenced files&lt;/li&gt;
&lt;li&gt;Update both this file and referenced files when making changes&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Project Overview
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;EmberText is an AI-powered desktop writing application built with Electron, React, and TypeScript. It combines features from TypingMind, NovelCrafter, SudoWrite, and Scrivener.&lt;/p&gt;
&lt;h3&gt;
  
  
  Key Requirements
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Projects must be stored as markdown files in a folder structure (similar to Obsidian)&lt;/li&gt;
&lt;li&gt;Each chapter should be saved as an individual markdown file&lt;/li&gt;
&lt;li&gt;Project metadata and structure should be stored in a JSON file within the project directory&lt;/li&gt;
&lt;li&gt;The app should be able to open and edit these files directly, enabling compatibility with other markdown editors&lt;/li&gt;
&lt;li&gt;Help a writer complete a project with AI or other tools from idea to ebook publishing&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Important References
&lt;/h3&gt;

&lt;p&gt;I had everything in my &lt;code&gt;CLAUDE.md&lt;/code&gt; file to begin with, but it quickly got huge. So I thought about how to trim it down. I also asked Claude Code what should be in the file versus what it could access on an as-needed basis. There is the concept of &lt;a href="https://docs.anthropic.com/en/docs/claude-code/memory#claude-md-imports" rel="noopener noreferrer"&gt;imports&lt;/a&gt;, but I didn’t want to use that because I would essentially be doing the same thing, just splitting this massive amount of information across multiple files that still got read every time I ran Claude Code.&lt;/p&gt;

&lt;p&gt;So I created a project knowledge base folder, put everything that it wouldn’t need every time there, and added an “Important References” section in &lt;code&gt;CLAUDE.md&lt;/code&gt;. Here’s what I have in this section:&lt;/p&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Code Project Structure: knowledge_base/project/structure.md&lt;/li&gt;
&lt;li&gt;Writing Project Structure: knowledge_base/project/directory-structure.md&lt;/li&gt;
&lt;li&gt;Key Features: knowledge_base/project/key-features.md&lt;/li&gt;
&lt;li&gt;Potential Feature Ideas: knowledge_base/feature-ideas&lt;/li&gt;
&lt;li&gt;Features Ready for Development or Already Developed: knowledge_base/features&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Code Style Guidelines
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Formatting&lt;/strong&gt; : Use 2-space indentation, single quotes, no semicolons&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Imports&lt;/strong&gt; : Group imports (node builtins, external, internal)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Types&lt;/strong&gt; : Use TypeScript with explicit return types on functions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Naming&lt;/strong&gt; : camelCase for variables/functions, PascalCase for classes/components&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Components&lt;/strong&gt; : One component per file, match filename to component name&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error Handling&lt;/strong&gt; : Use try/catch blocks with specific error types&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Functions&lt;/strong&gt; : Prefer pure functions, avoid side effects&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Testing&lt;/strong&gt; : Write unit tests for all new functionality&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Project SDLC
&lt;/h3&gt;

&lt;p&gt;As I was working on the project, I decided to “formalize” the development process, so that Claude Code knew what steps I wanted it to take. Here is what I have in this section:&lt;/p&gt;

&lt;blockquote&gt;
&lt;ol&gt;
&lt;li&gt;New features will be picked from “Key Features” in knowledge_base/project/key-features.md when “Next development steps” is empty. Once those are done, they will be sourced from knowledge_base/feature-ideas.&lt;/li&gt;
&lt;li&gt;I will ask you to think about the features and you will create notes about the feature in a markdown file named after it in knowledge_base/features. This file will be used as reference when we update a feature or fix bugs. The file will be kept updated with the feature changes.&lt;/li&gt;
&lt;li&gt;You will then reference the file next to the feature in “Next development steps” like this: “Add workshop chat: knowledge_base/features/AddWorkShopChat.md”&lt;/li&gt;
&lt;li&gt;I will ask you to do the next task in “Next development steps”. You will work on it and tell me when it is done, asking any necessary questions along the way.&lt;/li&gt;
&lt;li&gt;When that is done, you will move the feature from “Next development steps” to “Current progress” and make sure to move the reference to the note in knowledge_base/features to it.&lt;/li&gt;
&lt;li&gt;You will also update “Key Features” in knowledge_base/project/key-features.md if anything changed there.&lt;/li&gt;
&lt;li&gt;Update the project structure in knowledge_base/project/structure.md and writing project structure in knowledge_base/project/directory-structure.md when necessary.&lt;/li&gt;
&lt;li&gt;You will add testing the feature to “QA Checklist”&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Build/Lint/Test Commands
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Install: &lt;code&gt;npm install&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Start: &lt;code&gt;npm start&lt;/code&gt; (builds main and renderer processes and starts Electron)&lt;/li&gt;
&lt;li&gt;Dev: &lt;code&gt;npm run dev&lt;/code&gt; (development mode with live reload)&lt;/li&gt;
&lt;li&gt;Build Main: &lt;code&gt;npm run build:main&lt;/code&gt; (TypeScript compilation for main process)&lt;/li&gt;
&lt;li&gt;Build Renderer: &lt;code&gt;npm run build:renderer&lt;/code&gt; (Webpack bundling for renderer)&lt;/li&gt;
&lt;li&gt;Test: &lt;code&gt;npm test&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Lint: &lt;code&gt;npm run lint&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Format: &lt;code&gt;npm run format&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Development Status
&lt;/h3&gt;

&lt;p&gt;This is a list of features it has already implemented. Just a simple description of the feature and the path to the feature plan file. More details on the feature plan file below.&lt;/p&gt;

&lt;h3&gt;
  
  
  Next Development Steps
&lt;/h3&gt;

&lt;p&gt;This is a list of features I told it to put here. I usually have 3-4 features in this list. When I tell it to add a feature, I have it “think hard” about how to implement it, create a file in my project knowledge base folder that details how the feature will be implemented, and add it to this list like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Implement scene scaffolding system: knowledge_base/features/SceneScaffolding.md&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gives me a chance to look it over before it implements it. I am also hoping it follows my instructions and only references those files when dealing with that specific feature to keep unnecessary information out of the context. The files it creates are pretty extensive and have the following headings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Overview&lt;/li&gt;
&lt;li&gt;Key Requirements&lt;/li&gt;
&lt;li&gt;User Interface Components&lt;/li&gt;
&lt;li&gt;Data Structures&lt;/li&gt;
&lt;li&gt;Components to Create&lt;/li&gt;
&lt;li&gt;AI Integration&lt;/li&gt;
&lt;li&gt;User Experience Flow&lt;/li&gt;
&lt;li&gt;Integration with Existing Features&lt;/li&gt;
&lt;li&gt;Implementation Plan&lt;/li&gt;
&lt;li&gt;Dependency Notes&lt;/li&gt;
&lt;li&gt;Testing Requirements&lt;/li&gt;
&lt;li&gt;Future Enhancements&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  QA Checklist
&lt;/h3&gt;

&lt;p&gt;Claude Code will add items to this list when it has finished a feature. It’s actually a reminder for me to test new features, so it probably doesn’t need to be in the file. For now, though, it keeps me from forgetting to test things, and it doesn’t take up that much space.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons I Learned Using Claude Code
&lt;/h2&gt;

&lt;p&gt;I learned most of these lessons after making many mistakes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Plan well:&lt;/strong&gt; This is a lesson I keep learning over and over. But I still jumped right into this project without much of a plan. All I had was Electron and a list of features. I should have thought more about architecture at the beginning instead of refactoring a few hours in.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stop and look at the project holistically often:&lt;/strong&gt; Especially if you don’t plan well. I knew I wanted AI to help write a book at every step, from idea to ebook, but I sort of just let it do what it wanted. And by the time I realized the AI functionality could be a modal used in multiple places, it had already put AI forms in three or four different places.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tell Claude Code to “think harder” when you are planning a feature:&lt;/strong&gt; This is the third lesson that involves “planning.” Notice a pattern? When you tell Claude to “think”, it triggers &lt;a href="https://docs.anthropic.com/en/docs/claude-code/tutorials#use-extended-thinking" rel="noopener noreferrer"&gt;Claude’s extended thinking&lt;/a&gt;, and when you tell it to “think harder”, “think a lot”, or “think more”, it triggers deeper thinking (see the example prompt after this list).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Do your own git commits:&lt;/strong&gt; I didn’t think Claude Code would commit code without being explicitly told to, but now and then it did. And each time, it was when I wanted to be done for the night and, of course, the code was broken. So I spent more time having it fix the code so that I could commit working code. I could have reverted, but it had just added a new feature that I knew was at least partially working.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;QA and review the UI often:&lt;/strong&gt; Claude will forget things. It will think hard about a feature, create a plan, work on it, tell you it is done, and still skip 20% of what was in the plan. It was really hard for me to stop adding shiny new things to the app to do mundane testing, but I eventually had to backtrack through a lot of things and fix bugs.&lt;/li&gt;
&lt;/ul&gt;
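
&lt;p&gt;To make the “think harder” lesson concrete, a planning prompt in this workflow looks something like the following. The feature name is just an example; the point is the trigger phrase plus the instruction to write the plan to a knowledge base file.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;think harder about how to implement the character relationship graph.
Write your plan to knowledge_base/features/CharacterRelationshipGraph.md
and reference it in "Next development steps".
&lt;/code&gt;&lt;/pre&gt;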

&lt;h2&gt;
  
  
  Features of Claude Code I Haven’t Tried Yet
&lt;/h2&gt;

&lt;p&gt;I jumped right into this project without reading 95% of &lt;a href="https://docs.anthropic.com/en/docs/claude-code/overview" rel="noopener noreferrer"&gt;the docs&lt;/a&gt;. But I figured I should at least peruse them before I wrote this. Here are some things I ran into that aren’t covered in this article but that you might want to check out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Free Claude Code with Claude Max:&lt;/strong&gt; I may or may not try this in the future. If you pay for the $100/month Claude Max subscription, Claude Code comes along with it. In my estimation, working with Claude Code costs me around $5 an hour, and with my limited time, I have only spent about $50/month so far with the API. Maybe with &lt;code&gt;ripgrep&lt;/code&gt; installed, the process will go faster and cost me more per hour.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://docs.anthropic.com/en/docs/claude-code/tutorials#use-claude-code-as-an-mcp-server" rel="noopener noreferrer"&gt;Use Claude Code as an MCP server&lt;/a&gt;:&lt;/strong&gt; I thought this was cool, but then wasn’t sure what I would use it for.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://docs.anthropic.com/en/docs/claude-code/tutorials#create-custom-slash-commands" rel="noopener noreferrer"&gt;Create custom slash commands&lt;/a&gt;:&lt;/strong&gt; You can create a &lt;code&gt;.claude/commands&lt;/code&gt; folder in your project to hold commands you use over and over. You can also design these commands to accept arguments (see the sketch after this list).&lt;/li&gt;
&lt;/ul&gt;
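
&lt;p&gt;As a quick illustration of the slash command idea: you drop a markdown file into &lt;code&gt;.claude/commands&lt;/code&gt;, its filename becomes the command, and &lt;code&gt;$ARGUMENTS&lt;/code&gt; is replaced with whatever you type after the command. A hypothetical &lt;code&gt;.claude/commands/plan-feature.md&lt;/code&gt; wrapping my feature-planning step might contain:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;think harder about how to implement $ARGUMENTS.
Create a plan file for it in knowledge_base/features and reference
that file in "Next development steps".
&lt;/code&gt;&lt;/pre&gt;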

&lt;h2&gt;
  
  
  Claude Code Tips From Others I Plan on Trying
&lt;/h2&gt;

&lt;p&gt;I work on projects like this and write articles about them in the evenings when I don’t have freelance writing work, and in the last two weeks I haven’t been able to touch this project. But I still looked for tips whenever I could and took notes on them. Here are some things I plan on trying in the future:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Manage compacting more effectively:&lt;/strong&gt; I noticed Claude Code mentioning something about compacting context and showing a percentage. With further research, I’ve learned that Claude Code will compress the conversation history to stay within the context window’s limits. I found this &lt;a href="https://www.reddit.com/r/ClaudeAI/comments/1ko5pxk/claude_code_is_a_beast_tips_from_a_week_of/" rel="noopener noreferrer"&gt;post on Reddit&lt;/a&gt; recommending you always compact manually, for example, right before you start on a new feature, because auto-compacting can really screw you if it happens at the wrong time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Leverage other LLMs and &lt;a href="https://repomix.com/" rel="noopener noreferrer"&gt;repomix&lt;/a&gt; or &lt;a href="https://github.com/mufeedvh/code2prompt" rel="noopener noreferrer"&gt;code2prompt&lt;/a&gt;:&lt;/strong&gt; The refactoring process was a pain. Fixing bugs was a pain. &lt;a href="https://www.reddit.com/r/ClaudeAI/comments/1kkatqk/comment/mrtsfn2/?utm_source=share&amp;amp;utm_medium=web3x&amp;amp;utm_name=web3xcss&amp;amp;utm_term=1&amp;amp;utm_content=share_button" rel="noopener noreferrer"&gt;Redditor squareboxrox mentioned&lt;/a&gt; using these tools to upload your whole project to another LLM when you run into the inevitable bug-fix rabbit hole, instead of saying “still not fixed” over and over. Not every bug is like this, maybe 20%, but I figured I’d give this method a try (see the sketch after this list).&lt;/li&gt;
&lt;/ul&gt;
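
&lt;p&gt;A rough sketch of how I expect to use both tips, one inside a Claude Code session and one from the shell. Treat the repomix invocation as an assumption and check its docs for output options before relying on it.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Inside a Claude Code session: compact manually at a natural
# breakpoint instead of letting auto-compact fire mid-feature.
/compact

# From the shell: pack the whole repo into a single file you can
# hand to another LLM when a bug fix goes down a rabbit hole.
npx repomix
&lt;/code&gt;&lt;/pre&gt;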

&lt;h2&gt;
  
  
  What Claude Code Built: Screenshots of My App
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Disclaimer:&lt;/strong&gt; These screenshots showcase the functional aspects of EmberText rather than polished design. As this project focused on testing Claude Code’s capabilities for rapidly building functional features, I prioritized getting core functionality working over UI refinement. Consider these a “developer preview” of what’s possible with AI-assisted coding.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Dashboard
&lt;/h3&gt;

&lt;p&gt;The home screen showing multiple writing projects with their structure and metadata. Claude Code implemented the card-based layout and project metadata tracking.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqatg649thtn2uvi871dw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqatg649thtn2uvi871dw.png" alt="embertext-workbench" width="800" height="631"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Settings
&lt;/h3&gt;

&lt;p&gt;Settings panel for connecting to various AI models. This shows how Claude Code implemented form validation and API connection management.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3l5b8680mz5gism3vnj3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3l5b8680mz5gism3vnj3.png" alt="embertext-settings" width="800" height="690"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Writing Interface
&lt;/h3&gt;

&lt;p&gt;The main editor with project structure sidebar and formatting tools. Note the line-numbered editor and structure navigation that Claude Code implemented based on my requirements.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ha0dotuo887ok6513r6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ha0dotuo887ok6513r6.png" alt="embertext-project" width="800" height="642"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Character Relationship Graph
&lt;/h3&gt;

&lt;p&gt;An interactive visualization showing connections between characters. This feature shows Claude Code’s ability to integrate complex visualizations using React.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffbpolr5n8x8qjgmzysej.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffbpolr5n8x8qjgmzysej.png" alt="embertext-relationships" width="800" height="746"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I’m impressed by how quickly Claude Code could assemble a functional writing application with complex features like relationship graphs and structured document editing. While the UI would benefit from refinement, the core functionality is there and working as intended after just 16 hours of development time.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Journey with Claude Code Continues
&lt;/h2&gt;

&lt;p&gt;After spending about 16 hours and $80 in API costs, I’ve built a writing application with some functionality. My app currently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Connects to the Claude and OpenAI APIs&lt;/li&gt;
&lt;li&gt;Generates text and dialogue&lt;/li&gt;
&lt;li&gt;Creates outline and plot structures&lt;/li&gt;
&lt;li&gt;Has a plot timeline&lt;/li&gt;
&lt;li&gt;Has a character relationship graph&lt;/li&gt;
&lt;li&gt;Adds location, character, and item context to API calls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The refactoring process I described earlier was painful enough that I have temporarily paused development to reorganize my project memory files and carefully plan upcoming features. It has also reinforced the biggest lesson I learned: &lt;strong&gt;planning saves significant time and frustration&lt;/strong&gt; with AI-assisted development.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I’ve Learned
&lt;/h3&gt;

&lt;p&gt;The most valuable lesson from this experiment is that Claude Code does well when given clear direction and architecture upfront. While it can adapt on the fly, major architectural shifts are still painful, much like traditional development, just faster.&lt;/p&gt;

&lt;p&gt;My approach now combines the benefits of AI coding with disciplined software design: I plan thoroughly, create detailed feature specifications, and only then engage Claude Code to implement them.&lt;/p&gt;

&lt;h3&gt;
  
  
  What’s Next
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Complete my writing app&lt;/strong&gt;: Once I’ve implemented the remaining AI helpers, I’ll test the application with a real book project. Since I built this tool primarily for myself, this test will be the most accurate measure of success.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explore smaller projects&lt;/strong&gt;: The development speed has inspired me to tackle several smaller ideas in parallel. With proper planning, I could potentially complete 2-3 modest apps in the same time it would typically take to build one.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you’re interested in trying Claude Code yourself, I recommend starting with something small and well-defined. The initial time investment in proper planning and documentation pays dividends in development speed and reduces frustrating rework.&lt;/p&gt;

&lt;p&gt;I’ll be documenting more of this journey as I continue exploring AI-assisted development.&lt;/p&gt;

</description>
      <category>llmcoding</category>
      <category>vibecoding</category>
      <category>programming</category>
      <category>javascript</category>
    </item>
  </channel>
</rss>
