<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: signalscout</title>
    <description>The latest articles on Forem by signalscout (@vonb).</description>
    <link>https://forem.com/vonb</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3866545%2F16137258-2483-4b38-afec-c57eac71d39c.png</url>
      <title>Forem: signalscout</title>
      <link>https://forem.com/vonb</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/vonb"/>
    <language>en</language>
    <item>
      <title>Stop Turning On “Think Harder” For Everything</title>
      <dc:creator>signalscout</dc:creator>
      <pubDate>Wed, 29 Apr 2026 02:33:17 +0000</pubDate>
      <link>https://forem.com/vonb/stop-turning-on-think-harder-for-everything-2f0c</link>
      <guid>https://forem.com/vonb/stop-turning-on-think-harder-for-everything-2f0c</guid>
      <description>&lt;h1&gt;
  
  
  Stop Turning On “Think Harder” For Everything
&lt;/h1&gt;

&lt;p&gt;Most people using AI tools leave reasoning mode on because it feels safer.&lt;/p&gt;

&lt;p&gt;The button says the model will think more. Why would you not want that?&lt;/p&gt;

&lt;p&gt;Because most of the work you are asking an AI to do does not require more thinking. It requires cleaner execution.&lt;/p&gt;

&lt;p&gt;If you are vibe-coding, building landing pages, fixing obvious bugs, writing emails, creating content, or asking an agent to make a straightforward change, “think harder” often makes the output worse.&lt;/p&gt;

&lt;p&gt;Not just slower.&lt;/p&gt;

&lt;p&gt;Worse.&lt;/p&gt;

&lt;p&gt;The model starts hedging. It invents edge cases. It explains tradeoffs you did not ask for. It turns “make this button work” into a small architecture review.&lt;/p&gt;

&lt;p&gt;You asked it to ship.&lt;/p&gt;

&lt;p&gt;It gave you a committee meeting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Execution vs Judgment
&lt;/h2&gt;

&lt;p&gt;This is the split that matters.&lt;/p&gt;

&lt;p&gt;Some tasks are execution:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;build this page,&lt;/li&gt;
&lt;li&gt;clean this CSS,&lt;/li&gt;
&lt;li&gt;turn this note into an email,&lt;/li&gt;
&lt;li&gt;fix the typo,&lt;/li&gt;
&lt;li&gt;format this JSON,&lt;/li&gt;
&lt;li&gt;make the navbar responsive,&lt;/li&gt;
&lt;li&gt;write the obvious test,&lt;/li&gt;
&lt;li&gt;deploy this project.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For those, you usually want fast mode. Low reasoning. Direct instructions. Small context. Run it, inspect it, fix what broke.&lt;/p&gt;

&lt;p&gt;Other tasks are judgment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;choose between two architectures,&lt;/li&gt;
&lt;li&gt;debug a weird failure with no obvious cause,&lt;/li&gt;
&lt;li&gt;analyze a security issue,&lt;/li&gt;
&lt;li&gt;decide product positioning,&lt;/li&gt;
&lt;li&gt;plan a migration,&lt;/li&gt;
&lt;li&gt;compare models or vendors,&lt;/li&gt;
&lt;li&gt;reason through a messy business tradeoff.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For those, thinking is the product. Pay for it. Let the model slow down.&lt;/p&gt;

&lt;p&gt;The mistake is treating every request like judgment.&lt;/p&gt;

&lt;p&gt;Most work is not judgment.&lt;/p&gt;

&lt;p&gt;Most work is just work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters More With Agents
&lt;/h2&gt;

&lt;p&gt;When you are chatting with a model manually, wasting one expensive request is annoying.&lt;/p&gt;

&lt;p&gt;When you are using an agent, one instruction can become ten requests.&lt;/p&gt;

&lt;p&gt;The agent reads files. Calls tools. Runs commands. Sees an error. Tries again. Summarizes. Calls another model. Writes a file. Checks the diff. Replies.&lt;/p&gt;

&lt;p&gt;If every one of those calls is using maximum reasoning, you are paying a thinking tax on operations that do not need it.&lt;/p&gt;

&lt;p&gt;That is how people end up feeling like AI tools are too expensive even though the model did exactly what they asked.&lt;/p&gt;

&lt;p&gt;The workflow was routed wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Vibe-Coder Rule
&lt;/h2&gt;

&lt;p&gt;Use this rule:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you can tell whether the output is right by looking at it, use low reasoning.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If the button works, the button works.&lt;/p&gt;

&lt;p&gt;If the email sounds good, it sounds good.&lt;/p&gt;

&lt;p&gt;If the page builds, the page builds.&lt;/p&gt;

&lt;p&gt;You do not need a model to spend 45 seconds reasoning before changing a color, extracting a list, or adding a route.&lt;/p&gt;

&lt;p&gt;Use high reasoning when you cannot easily verify the answer yourself, or when the cost of being wrong is high.&lt;/p&gt;

&lt;p&gt;That includes security, money, production migrations, ambiguous architecture, legal/compliance, and anything where the model needs to reject several plausible options before choosing one.&lt;/p&gt;
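&lt;p&gt;The rule above can be written down as a sketch. Everything here is a hedged illustration: the hint lists, the &lt;code&gt;verifiable&lt;/code&gt; flag, and the function itself are assumptions for this post, not any real tool's API.&lt;/p&gt;

```python
# Hedged sketch of the vibe-coder routing rule: verifiable-by-looking
# tasks get low reasoning; judgment calls and high-stakes tasks get high.
# The keyword lists are illustrative assumptions, not a real taxonomy.

EXECUTION_HINTS = {"fix", "format", "rename", "deploy", "typo", "style"}
JUDGMENT_HINTS = {"architecture", "security", "migration", "tradeoff", "positioning"}

def reasoning_level(task: str, verifiable: bool, cost_of_error: str = "low") -> str:
    """Return 'low' or 'high' reasoning for a task description."""
    words = set(task.lower().split())
    if words & JUDGMENT_HINTS or cost_of_error == "high" or not verifiable:
        return "high"
    if words & EXECUTION_HINTS:
        return "low"
    # Default cheap: you can always escalate after inspecting the output.
    return "low"

print(reasoning_level("fix the typo in the navbar", verifiable=True))    # low
print(reasoning_level("plan the database migration", verifiable=False))  # high
```

&lt;p&gt;The default branch matters: when in doubt, start cheap, because escalating after a failed cheap pass is recoverable, while paying the thinking tax on every call is not.&lt;/p&gt;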

&lt;h2&gt;
  
  
  The Better Workflow
&lt;/h2&gt;

&lt;p&gt;Here is the workflow I use now:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start cheap and direct.&lt;/li&gt;
&lt;li&gt;Give the model only the context it needs.&lt;/li&gt;
&lt;li&gt;Make it produce an artifact.&lt;/li&gt;
&lt;li&gt;Run the artifact.&lt;/li&gt;
&lt;li&gt;If it fails, feed back the exact failure.&lt;/li&gt;
&lt;li&gt;Escalate reasoning only when the failure is confusing.&lt;/li&gt;
&lt;/ol&gt;
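&lt;p&gt;The six steps above can be sketched as a loop. This is a hedged illustration, not a real framework: &lt;code&gt;generate&lt;/code&gt; stands in for your model call and &lt;code&gt;run_artifact&lt;/code&gt; for your build or test command.&lt;/p&gt;

```python
# Hedged sketch of the cheap-first build loop described above.
# `generate(context, reasoning)` and `run_artifact(artifact)` are
# placeholders you would wire to a model call and a build/test command.

def build_loop(task, generate, run_artifact, max_tries=4):
    reasoning = "low"               # 1. start cheap and direct
    context = [task]                # 2. only the context it needs
    for attempt in range(max_tries):
        artifact = generate(context, reasoning)   # 3. produce an artifact
        ok, failure = run_artifact(artifact)      # 4. run the artifact
        if not ok:
            # 5. feed back the exact failure, not the whole history
            context = [task, f"It failed with: {failure}"]
            if attempt >= 1:        # 6. escalate only once failure persists
                reasoning = "high"
            continue
        return artifact
    raise RuntimeError("still failing after escalation; rethink the task")
```

&lt;p&gt;Note that escalation is triggered by a repeated confusing failure, not by the first error: the first error is just feedback.&lt;/p&gt;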

&lt;p&gt;That loop beats “think hard forever” for most real building.&lt;/p&gt;

&lt;p&gt;It is faster, cheaper, and less annoying.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bigger Point
&lt;/h2&gt;

&lt;p&gt;AI tools are becoming less about picking the smartest model and more about routing work correctly.&lt;/p&gt;

&lt;p&gt;A great builder does not ask the biggest model to do everything.&lt;/p&gt;

&lt;p&gt;A great builder knows when the task needs judgment and when it needs momentum.&lt;/p&gt;

&lt;p&gt;If you are learning by doing, momentum matters.&lt;/p&gt;

&lt;p&gt;Turn thinking down. Ship the thing. Look at what broke. Then decide if it needs a smarter pass.&lt;/p&gt;

&lt;p&gt;Most of the time, it does not.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>productivity</category>
      <category>programming</category>
    </item>
    <item>
      <title>GitHub Copilot Changed the Deal. That Is the Whole Lesson.</title>
      <dc:creator>signalscout</dc:creator>
      <pubDate>Wed, 29 Apr 2026 02:33:11 +0000</pubDate>
      <link>https://forem.com/vonb/github-copilot-changed-the-deal-that-is-the-whole-lesson-47d3</link>
      <guid>https://forem.com/vonb/github-copilot-changed-the-deal-that-is-the-whole-lesson-47d3</guid>
      <description>&lt;h1&gt;
  
  
  GitHub Copilot Changed the Deal. That Is the Whole Lesson.
&lt;/h1&gt;

&lt;p&gt;GitHub Copilot Pro+ used to feel like a cheat code.&lt;/p&gt;

&lt;p&gt;For $40/month, you could get access to models that would have cost meaningfully more if you were paying direct API prices. Not because you discovered some genius hack. Because subscriptions and APIs are different economic products.&lt;/p&gt;

&lt;p&gt;A subscription gives you a ceiling.&lt;/p&gt;

&lt;p&gt;An API gives you a meter.&lt;/p&gt;

&lt;p&gt;If you are building with agents, that difference matters more than almost anything else.&lt;/p&gt;

&lt;p&gt;I learned this the dumb way.&lt;/p&gt;

&lt;p&gt;I run OpenClaw, a local agent orchestration setup that lets me route tasks through different models and tools. I use it to build sites, write code, audit projects, post content, handle email, and generally turn messy ideas into artifacts.&lt;/p&gt;

&lt;p&gt;It is powerful.&lt;/p&gt;

&lt;p&gt;It is also very easy to use wrong.&lt;/p&gt;

&lt;p&gt;One bad session can quietly turn four prompts into dozens of model calls. Not because the model is bad. Because the agent is carrying too much context, switching tasks midstream, calling tools repeatedly, retrying failures, and dragging stale memory into every request.&lt;/p&gt;

&lt;p&gt;At one point, the math looked like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;12,000-ish tokens × 37 calls for what felt like a few prompts.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is not intelligence.&lt;/p&gt;

&lt;p&gt;That is a context leak with a nice chat interface.&lt;/p&gt;
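&lt;p&gt;To see why that math stings, here is the back-of-envelope cost of that session. The $3 per million input tokens is an assumed illustrative price, not any vendor's actual rate.&lt;/p&gt;

```python
# Back-of-envelope cost of the session quoted above. The price is an
# illustrative assumption, not a real vendor rate.

tokens_per_call = 12_000
calls = 37
price_per_million = 3.00        # USD per 1M input tokens, assumption

total_tokens = tokens_per_call * calls              # 444,000 tokens
cost = total_tokens / 1_000_000 * price_per_million

print(f"{total_tokens:,} tokens -> ${cost:.2f}")    # 444,000 tokens -> $1.33
```

&lt;p&gt;One leaky session is pocket change. The problem is that a context leak runs on every session, every day, and most of those 444,000 tokens were stale history the task never needed.&lt;/p&gt;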

&lt;h2&gt;
  
  
  Why Copilot Pro+ Felt So Good
&lt;/h2&gt;

&lt;p&gt;The original Copilot Pro+ value proposition was not just “you get Claude / GPT / Gemini in your editor.”&lt;/p&gt;

&lt;p&gt;The real value was insulation.&lt;/p&gt;

&lt;p&gt;With direct API credits, every mistake has a price. Every oversized context window. Every retry loop. Every “actually, now switch tasks and use the same session to debug this other thing.” Every time your agent re-sends the same irrelevant history because you forgot to clear the session.&lt;/p&gt;

&lt;p&gt;With a subscription, the downside is bounded. You might hit a limit. You might get slowed down. But you do not wake up to a surprise bill because your agent got confused at 2am.&lt;/p&gt;

&lt;p&gt;That is why Copilot Pro+ felt absurdly good for agentic work. It was not just cheaper access. It was emotional safety.&lt;/p&gt;

&lt;p&gt;You could learn by doing.&lt;/p&gt;

&lt;p&gt;You could vibe-code without feeling like every mistake was financially metered.&lt;/p&gt;

&lt;p&gt;That matters. A lot.&lt;/p&gt;

&lt;p&gt;The people learning fastest right now are not professional DevOps engineers with perfect usage dashboards. They are builders who try things, break things, paste errors back in, and keep going. A predictable subscription is perfect for that phase.&lt;/p&gt;

&lt;h2&gt;
  
  
  Then GitHub Changed the Business Model
&lt;/h2&gt;

&lt;p&gt;And honestly, of course they did.&lt;/p&gt;

&lt;p&gt;If a $40 subscription reliably gives heavy agent users more than $40 of model value, the platform eventually has to change the terms. GitHub has been moving Copilot toward premium request accounting and API-spend-style economics. The direction is clear: the more the product behaves like raw frontier-model infrastructure, the more the pricing has to look like usage.&lt;/p&gt;

&lt;p&gt;This is not a ban story. This is not “I got kicked off GitHub.”&lt;/p&gt;

&lt;p&gt;This is the boring reality of AI infrastructure: if users can turn subscriptions into uncapped agent compute, the subscription stops being sustainable.&lt;/p&gt;

&lt;p&gt;And that is the whole lesson.&lt;/p&gt;

&lt;p&gt;You cannot build your workflow around pricing loopholes.&lt;/p&gt;

&lt;p&gt;You need to fix the workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Beginner Mistake: Buying More Credits Instead of Managing Context
&lt;/h2&gt;

&lt;p&gt;When a vibe-coder runs out of credits, the instinct is usually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;buy more Anthropic API credits,&lt;/li&gt;
&lt;li&gt;try OpenRouter,&lt;/li&gt;
&lt;li&gt;buy Claude Code,&lt;/li&gt;
&lt;li&gt;upgrade ChatGPT,&lt;/li&gt;
&lt;li&gt;test another wrapper,&lt;/li&gt;
&lt;li&gt;chase a bigger context window.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I did all of that.&lt;/p&gt;

&lt;p&gt;OpenRouter with frontier models did not magically solve the problem. It was still API economics. If I sent too much context, I paid for too much context.&lt;/p&gt;

&lt;p&gt;Anthropic API was great when my setup broke and I had no other option. But it was expensive in exactly the way APIs are expensive: clean, metered, unforgiving.&lt;/p&gt;

&lt;p&gt;Claude Code is probably good. I have not used it enough to make a religious claim.&lt;/p&gt;

&lt;p&gt;After testing newer OpenAI and Anthropic models, I found myself preferring GPT-5.5 for a lot of my actual work. And yes, I am excited about 1M-token windows once I have my context system fixed.&lt;/p&gt;

&lt;p&gt;But bigger context does not solve sloppy context.&lt;/p&gt;

&lt;p&gt;A 1M-token window just lets you make a 1M-token mess.&lt;/p&gt;

&lt;h2&gt;
  
  
  What ContextClaw Is Really For
&lt;/h2&gt;

&lt;p&gt;ContextClaw started as a cost-control tool. That is still true, but the better framing is this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;ContextClaw is a seatbelt for people who learn by doing with AI agents.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It does not try to make you a perfect engineer.&lt;/p&gt;

&lt;p&gt;It assumes you are going to do the normal builder thing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;keep a session open too long,&lt;/li&gt;
&lt;li&gt;switch tasks halfway through,&lt;/li&gt;
&lt;li&gt;paste a giant error log,&lt;/li&gt;
&lt;li&gt;forget what is already in memory,&lt;/li&gt;
&lt;li&gt;ask the agent to “also quickly do this,”&lt;/li&gt;
&lt;li&gt;and accidentally turn one workflow into five.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ContextClaw exists to make that survivable.&lt;/p&gt;

&lt;p&gt;It treats context like RAM, not a diary. Hot context should be small, relevant, and task-specific. Everything else belongs in files, memory, search, or cold storage.&lt;/p&gt;

&lt;p&gt;The simple rules are not glamorous:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;clear the session when the task changes,&lt;/li&gt;
&lt;li&gt;use skills instead of carrying giant instructions forever,&lt;/li&gt;
&lt;li&gt;write artifacts to files,&lt;/li&gt;
&lt;li&gt;summarize old work instead of replaying it,&lt;/li&gt;
&lt;li&gt;keep subagents isolated,&lt;/li&gt;
&lt;li&gt;do not make the main session remember every tool result,&lt;/li&gt;
&lt;li&gt;route cheap tasks to cheap models,&lt;/li&gt;
&lt;li&gt;save expensive models for judgment calls.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is it.&lt;/p&gt;

&lt;p&gt;That is the “secret.”&lt;/p&gt;

&lt;p&gt;Not a magic prompt. Not a bigger subscription. Not a new model leaderboard.&lt;/p&gt;

&lt;p&gt;Just context discipline.&lt;/p&gt;
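&lt;p&gt;Here is a minimal sketch of the context-as-RAM idea. The token budget, the crude whitespace token estimate, and the toy summarizer are all illustrative assumptions, not how any particular agent framework works.&lt;/p&gt;

```python
# Hedged sketch of "context as RAM": a hot window with a hard token
# budget, spilling older turns to cold storage as summaries. The
# budget, tokenizer, and summarizer here are toy assumptions.

class HotContext:
    def __init__(self, budget_tokens=8_000, summarize=lambda text: text[:80]):
        self.budget = budget_tokens
        self.summarize = summarize
        self.hot = []    # what actually gets sent to the model
        self.cold = []   # archived summaries: searchable, not replayed

    def _tokens(self, text):
        return len(text.split())  # crude whitespace token estimate

    def add(self, text):
        self.hot.append(text)
        # Evict oldest turns as summaries instead of replaying them.
        while sum(self._tokens(t) for t in self.hot) > self.budget and len(self.hot) > 1:
            self.cold.append(self.summarize(self.hot.pop(0)))

    def clear(self):
        """Run this when the task changes."""
        self.cold.extend(self.summarize(t) for t in self.hot)
        self.hot = []
```

&lt;p&gt;The design choice is the whole point: hot context is a scarce resource with a hard cap, and everything evicted becomes a summary in cold storage rather than dead weight on every request.&lt;/p&gt;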

&lt;h2&gt;
  
  
  Copilot Was the Backup. ContextClaw Is the Replacement Layer.
&lt;/h2&gt;

&lt;p&gt;The way I think about Copilot has changed.&lt;/p&gt;

&lt;p&gt;Originally, Copilot Pro+ was my cheap frontier-model pipe. Then it became my backup when API credits got painful. Then GitHub’s pricing shift made the real lesson obvious.&lt;/p&gt;

&lt;p&gt;Copilot’s hidden benefit was not only model access. It was that the wrapper absorbed complexity: caching, request shaping, context choices, editor state, and spend boundaries.&lt;/p&gt;

&lt;p&gt;ContextClaw is me trying to make that layer explicit.&lt;/p&gt;

&lt;p&gt;If OpenClaw is going to call models directly, it needs the same kind of insulation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;know what context matters,&lt;/li&gt;
&lt;li&gt;avoid resending stale junk,&lt;/li&gt;
&lt;li&gt;prevent accidental runaway sessions,&lt;/li&gt;
&lt;li&gt;make cost visible,&lt;/li&gt;
&lt;li&gt;and preserve the ability to learn by doing without making every mistake expensive.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the part most vibe-coders need more than another model subscription.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Rule I Use Now
&lt;/h2&gt;

&lt;p&gt;If you are using OpenClaw and buying API credits, ask this before you top up:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Did I actually need more model, or did I just fail to manage context?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Most of the time, the answer is the second one.&lt;/p&gt;

&lt;p&gt;Run &lt;code&gt;/clear&lt;/code&gt; when the task changes.&lt;/p&gt;

&lt;p&gt;Write the durable stuff down.&lt;/p&gt;

&lt;p&gt;Use skills as modular instructions instead of carrying everything in one mega-prompt.&lt;/p&gt;

&lt;p&gt;Do not ask the same session to be your coder, marketer, therapist, deployment engineer, and memory database.&lt;/p&gt;

&lt;p&gt;And if your agent made 37 calls for four prompts, do not blame the model.&lt;/p&gt;

&lt;p&gt;You built a slot machine and connected it to a credit card.&lt;/p&gt;

&lt;p&gt;Fix the machine.&lt;/p&gt;

&lt;p&gt;Then buy the best model you can afford.&lt;/p&gt;

&lt;p&gt;That order matters.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>beginners</category>
      <category>devtools</category>
    </item>
    <item>
      <title>The Best $40 Addendum: I Tried 14 Copilot Subs and Custom Wrappers — Here's What Actually Works</title>
      <dc:creator>signalscout</dc:creator>
      <pubDate>Tue, 28 Apr 2026 05:41:45 +0000</pubDate>
      <link>https://forem.com/vonb/the-best-40-addendum-i-tried-14-copilot-subs-and-custom-wrappers-heres-what-actually-works-1cok</link>
      <guid>https://forem.com/vonb/the-best-40-addendum-i-tried-14-copilot-subs-and-custom-wrappers-heres-what-actually-works-1cok</guid>
      <description>&lt;h1&gt;
  
  
  The Best $40 Addendum: I Tried 14 Copilot Subs and Custom Wrappers — Here's What Actually Works
&lt;/h1&gt;

&lt;p&gt;After I published the original article, I got the expected responses: "what about running multiple accounts?" and "what about modifying the wrapper?" and "I heard you can fan out parallel agent threads and get way more output per dollar."&lt;/p&gt;

&lt;p&gt;I went down that rabbit hole for about six weeks. This is the honest debrief.&lt;/p&gt;

&lt;p&gt;Short version: the $39/month Copilot Pro+ recommendation still stands — but for a different reason than I originally thought. The tooling you layer on top matters more than the subscription math. And for 99% of developers, the answer to "what do I layer on top" is embarrassingly simple.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Rabbit Hole, Documented
&lt;/h2&gt;

&lt;p&gt;Here's what I actually tested:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multiple Copilot accounts.&lt;/strong&gt; Yes, it's technically possible to run 14 GitHub accounts, each paying $39/month, and orchestrate them as parallel agent workers. I tried a version of this — not 14 accounts, but enough to see how the seams show. The first problem is GitHub's terms of service, which prohibit multiple personal accounts. The second problem is that coordinating parallel agent sessions that modify the same codebase is genuinely hard. Race conditions. Conflicting file states. Agents overwriting each other's work. You end up spending more time debugging the orchestration than you would have just doing the work linearly. The math that looks good on paper ($39 × N = N times the output) doesn't survive contact with reality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Custom wrapper modifications.&lt;/strong&gt; The Copilot extension exposes enough surface area that people have gotten creative — custom system prompts, context injection, session manipulation. I experimented here too. Some of it works. All of it is fragile. GitHub pushes extension updates frequently. Your custom modifications break. You spend an afternoon re-patching things instead of shipping code. The delta between "custom wrapper" and "off-the-shelf wrapper" shrinks to near zero in practice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Parallel agentic frameworks.&lt;/strong&gt; I run OpenClaw, which is a purpose-built multi-agent orchestration framework I've been building for a year. Running parallel sub-agents that each drive a separate model session is genuinely powerful — but only because the framework was purpose-built to handle state, file coordination, task decomposition, and agent lifecycle. Rolling your own version of this is a significant software project. It's not a weekend hack.&lt;/p&gt;

&lt;p&gt;The pattern across all three experiments: the idea sounds like leverage. The execution is overhead.&lt;/p&gt;

&lt;h2&gt;
  
  
  What People Are Actually Doing
&lt;/h2&gt;

&lt;p&gt;While I was in the rabbit hole, I also checked what the broader community was building.&lt;/p&gt;

&lt;p&gt;The open-source wrapper ecosystem has consolidated fast. Cline has 61,000+ GitHub stars. Aider is at 44,000+. Goose (from Block) is at 43,000+. Continue.dev at 32,000+. Roo-Code at 23,000+. These are not weekend projects — they're mature tools with thousands of real users. The community has voted with stars and pull requests.&lt;/p&gt;

&lt;p&gt;Notably, Hacker News has had multiple active threads comparing Claude Code vs. Codex CLI as the two beginner-facing options, with the community largely agreeing that these are the correct entry points. The debate isn't "which custom wrapper should I build" — it's "which purpose-built tool should I use."&lt;/p&gt;

&lt;p&gt;There's also Moltbook, which bills itself as "the front page of the agent internet" — a social network built for AI agents where agents share, discuss, and upvote content. It exists. People are building toward a world where agents are first-class actors. That future is coming. But it's not where you should start.&lt;/p&gt;

&lt;p&gt;The community consensus in 2026 is not "build your own agentic framework." It's "pick one of the three or four mature tools and actually ship something."&lt;/p&gt;

&lt;h2&gt;
  
  
  The Honest Beginner Recommendation
&lt;/h2&gt;

&lt;p&gt;If you're starting out, here's the decision tree:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Codex CLI&lt;/strong&gt; if you want the OpenAI model stack (GPT-4o, o3) in a clean terminal interface. It's $20/month on ChatGPT Plus, or you use API credits. It wraps the model well. It handles context, tool use, and multi-step tasks without you thinking about any of it. You open a terminal, you describe what you want, it does the work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Claude Code&lt;/strong&gt; if you want the Anthropic model stack (Claude Sonnet or Opus). Same idea — clean terminal interface, the model is wrapped correctly, you don't configure anything, you just use it. $20/month on Claude Pro gets you Claude Sonnet. Opus costs more but is rarely necessary for everyday development tasks.&lt;/p&gt;

&lt;p&gt;Both tools have been designed by teams who thought deeply about how to make a model useful in a development context. They handle context injection, multi-turn state, tool use, and error recovery. You don't have to think about any of it. That's the point.&lt;/p&gt;

&lt;p&gt;The Copilot Pro+ subscription I recommended in the original article is still the right answer for IDE-embedded access to frontier models. But for your agentic terminal workflow — the thing that actually runs tasks autonomously — pick Codex or Claude Code and stop there.&lt;/p&gt;

&lt;p&gt;Don't install Cline and Aider and Goose and Continue and Roo-Code and spend a week figuring out the difference. Pick one. Use it until it breaks for your use case. Then reassess.&lt;/p&gt;

&lt;h2&gt;
  
  
  The OpenClaw Caveat (And Why I Feel Weird Writing This)
&lt;/h2&gt;

&lt;p&gt;I'm aware that recommending OpenClaw in a beginner's article sounds like a founder pitching their own product. Let me be honest about when it's appropriate and when it isn't.&lt;/p&gt;

&lt;p&gt;OpenClaw makes sense if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're running multiple agents in parallel as a workflow, not just as a single developer working on a single task&lt;/li&gt;
&lt;li&gt;You need persistent state across sessions — memory systems, task queues, artifact tracking&lt;/li&gt;
&lt;li&gt;You want to route different tasks to different models based on cost and capability&lt;/li&gt;
&lt;li&gt;You're orchestrating coding agents, browser agents, and data-processing agents as a coordinated fleet&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;OpenClaw does not make sense if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You want to write code faster&lt;/li&gt;
&lt;li&gt;You want a better autocomplete&lt;/li&gt;
&lt;li&gt;You're doing a single project and you just want help&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're in the second bucket — which is where most developers should be — Codex CLI or Claude Code will serve you completely. The overhead of setting up and maintaining a full orchestration framework is not worth it for individual developer productivity. I built OpenClaw because my workflow had genuinely outgrown the single-agent tools. That took time and experimentation to discover. I couldn't have known I needed it on day one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The CTA Is Simple
&lt;/h2&gt;

&lt;p&gt;Stick with the simple thing until the simple thing breaks.&lt;/p&gt;

&lt;p&gt;Start with Copilot Pro+ for IDE integration and model access. Add Codex CLI or Claude Code for terminal-based agentic tasks. That's the stack. It covers 99% of what a working developer actually needs, and it costs between $40 and $60/month depending on your Claude tier.&lt;/p&gt;

&lt;p&gt;The rabbit hole is real. The multi-account orchestration experiments are interesting in the same way that building your own database is interesting — technically impressive, practically counterproductive for most people. I did the experiments so you don't have to waste the weeks I did.&lt;/p&gt;

&lt;p&gt;The simple stack works. Use it until it doesn't.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>beginners</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Google Just Unlocked Something Huge With Gemini Memory Import — Here's How to Actually Profit From It</title>
      <dc:creator>signalscout</dc:creator>
      <pubDate>Fri, 24 Apr 2026 04:55:23 +0000</pubDate>
      <link>https://forem.com/vonb/google-just-unlocked-something-huge-with-gemini-memory-import-heres-how-to-actually-profit-from-2ckf</link>
      <guid>https://forem.com/vonb/google-just-unlocked-something-huge-with-gemini-memory-import-heres-how-to-actually-profit-from-2ckf</guid>
      <description>&lt;p&gt;&lt;em&gt;Submitted to the Google Cloud NEXT '26 Writing Challenge&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The hook
&lt;/h2&gt;

&lt;p&gt;Google just shipped one-click memory import from ChatGPT into Gemini at Cloud Next '26.&lt;/p&gt;

&lt;p&gt;I've been trying to vibe-code my way to this exact workflow for months. Exports, parsers, custom ZIP handlers, half-broken browser extensions. Thousands of tokens burned on prompt gymnastics to stitch my own history together.&lt;/p&gt;

&lt;p&gt;And then Google just… did it. Clean, native, one click.&lt;/p&gt;

&lt;p&gt;Thank you. Seriously. This is the feature a lot of us have been quietly hoping for, and it just dropped.&lt;/p&gt;

&lt;p&gt;But importing your history is step one. The real leverage is what you do with it once Gemini has it. So this post is two things: a thank-you to the Gemini team, and a practical guide — five workflows I've been wanting for months that now just work.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why this matters more than it looks
&lt;/h2&gt;

&lt;p&gt;Your ChatGPT history isn't chat logs. It's a record of how you think, what you obsess over, how you phrase things, and what you've already solved. Most people treat it as disposable. It's actually the closest thing to a portable brain snapshot that exists.&lt;/p&gt;

&lt;p&gt;Until last week, that snapshot was locked inside one product. Now Gemini can read it. That changes what an AI assistant can actually be — not a tool you re-introduce yourself to every session, but one that already knows your voice.&lt;/p&gt;




&lt;h2&gt;
  
  
  Five ways to profit from it once you've imported
&lt;/h2&gt;

&lt;p&gt;After importing, open a fresh Gemini chat and try these. I call them &lt;strong&gt;bootloader prompts&lt;/strong&gt; — they spin Gemini up into a specific useful mode using the memory it just gained.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The voice profile
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Read across my imported ChatGPT history and extract my voice profile.
Tone, sentence length, words I overuse, words I never use, how I open
and close ideas. Return it as a reusable style guide I can paste into
future prompts.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now every piece of writing you generate with Gemini — emails, posts, drafts — can be pinned to a style guide built from the real you, not a generic "professional tone."&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The unfinished-ideas miner
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Scan my imported history for ideas I started but never finished.
Half-built product concepts, essay drafts, business ideas, technical
designs. Rank them by how many times I came back to them. Return the
top 10 with a one-paragraph summary each.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll be shocked. Mine surfaced three ideas I'd forgotten I'd had, one of which turned into a product I'm shipping right now.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The pattern recognizer
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Based on my imported history, what topics do I keep circling? What
problems do I solve over and over in slightly different ways? What
blind spots show up — subjects I avoid, skills I never ask about?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This one is humbling. It's a mirror. It tells you what you actually care about versus what you say you care about.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. The personal SOP writer
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Look at my imported history. Any time I asked for help with [task type:
e.g. debugging, cold emails, PR reviews], extract the pattern and write
me a standard operating procedure. Include prompts I've already proven
work for me.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your own best prompts, promoted to reusable templates. This is how you stop re-inventing the wheel every session.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. The decision archaeologist
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Find every major decision I talked through with ChatGPT in the last
year — product direction, career moves, relationships, finances. For
each, summarize what I was considering, what I chose, and what my
reasoning was at the time.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A year of decisions, written down by the AI that helped you make them. Useful for reflection, accountability, and noticing your own patterns.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why the Gemini team nailed this
&lt;/h2&gt;

&lt;p&gt;Three design choices I want to call out because they're easy to miss:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ZIP-based import instead of live OAuth.&lt;/strong&gt; This was the right move. It puts the user in control of what gets shared. You can review the file before upload. It also sidesteps every single privacy and platform-lock concern that would've killed an OAuth bridge.&lt;/p&gt;
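&lt;p&gt;That "review the file before upload" step is easy to do locally. Here is a hedged sketch: it assumes the export archive contains a &lt;code&gt;conversations.json&lt;/code&gt; where each conversation has a &lt;code&gt;title&lt;/code&gt; field, which matches the ChatGPT export format at the time of writing but could change.&lt;/p&gt;

```python
# Hedged sketch: peek inside a ChatGPT export ZIP before uploading it.
# Assumes a `conversations.json` member listing conversations with a
# `title` field (the export layout at time of writing; may change).

import io
import json
import zipfile

def conversation_titles(zip_bytes: bytes) -> list[str]:
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        conversations = json.loads(zf.read("conversations.json"))
    return [c.get("title", "(untitled)") for c in conversations]

# Demo with a tiny in-memory archive standing in for a real export.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("conversations.json",
                json.dumps([{"title": "Debugging my RSS parser"},
                            {"title": "Cold email draft"}]))

print(conversation_titles(buf.getvalue()))
# ['Debugging my RSS parser', 'Cold email draft']
```

&lt;p&gt;Skimming the titles (or grepping the JSON) before you upload is exactly the control the ZIP-based design gives you that a live OAuth bridge would not.&lt;/p&gt;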

&lt;p&gt;&lt;strong&gt;Memory, not raw logs.&lt;/strong&gt; Gemini is ingesting your history as retrievable memory, not just dumping it into context. That means it scales — your whole history isn't eating your token budget on every turn.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shipping it at NEXT, not quietly.&lt;/strong&gt; Putting this on the main stage signals that Google actually believes AI memory portability is a user right, not a moat. That's a cultural win for the whole ecosystem.&lt;/p&gt;




&lt;h2&gt;
  
  
  A small personal note
&lt;/h2&gt;

&lt;p&gt;I spent three days building a small browser-side prototype with Codex, aimed at helping people work with their own conversation archives locally. I'm still interested in that direction, because I think there's room for tools that help people organize, analyze, and reuse their history without turning everything into another cloud dependency.&lt;/p&gt;

&lt;p&gt;But honestly? Google's version is cleaner for most people. If you just want your history in Gemini, use the official import. It's a better experience than anything I or anyone else was going to ship this year.&lt;/p&gt;

&lt;p&gt;I also built a rough public wrapper around this workflow, &lt;a href="https://github.com/dodge1218/elgoog" rel="noopener noreferrer"&gt;elgoog&lt;/a&gt;, as a CLI-first Gemini workbench. The official import is still the better default, but the repo is there if you want to see the builder-side version of the same instinct.&lt;/p&gt;




&lt;h2&gt;
  
  
  The prompts again, copy-paste ready
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Voice profile:&lt;/strong&gt; Extract my tone, vocabulary, and style from my imported history.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unfinished ideas:&lt;/strong&gt; Find half-baked ideas I kept coming back to.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pattern recognizer:&lt;/strong&gt; Show me what I circle on and what I avoid.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Personal SOPs:&lt;/strong&gt; Turn my proven prompts into reusable templates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decision archaeologist:&lt;/strong&gt; Summarize my year of big decisions with reasoning.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Import once. Run these five. Thank me later. Actually, thank the Gemini team.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you've got your own bootloader prompts for the new import feature, drop them in the comments. The whole point of these unlocks is compounding them together.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>googlecloud</category>
      <category>cloudnextchallenge</category>
      <category>gemini</category>
    </item>
    <item>
      <title>I Measured the Carbon Footprint of My AI Agents. 87% Was Pure Waste.</title>
      <dc:creator>signalscout</dc:creator>
      <pubDate>Sat, 18 Apr 2026 08:25:13 +0000</pubDate>
      <link>https://forem.com/vonb/i-measured-the-carbon-footprint-of-my-ai-agents-87-was-pure-waste-4d56</link>
      <guid>https://forem.com/vonb/i-measured-the-carbon-footprint-of-my-ai-agents-87-was-pure-waste-4d56</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for &lt;a href="https://dev.to/challenges/weekend-2026-04-16"&gt;Weekend Challenge: Earth Day Edition&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Every token your agent burns costs a small amount of electricity, often coal-generated, somewhere in a datacenter. I got curious about the math and then horrified by the answer.&lt;/p&gt;

&lt;p&gt;I already maintain &lt;a href="https://github.com/dodge1218/contextclaw" rel="noopener noreferrer"&gt;ContextClaw&lt;/a&gt;, a context-management plugin for &lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; that classifies everything in an agent's context window by content type (JSON schemas, file reads, tool output, chat history) and truncates the junk so you stop shipping 200K-token requests that should be 22K. The dogfooding numbers on my own agent work are brutal: &lt;strong&gt;87.9% reduction across 11,300 items in 6 real sessions&lt;/strong&gt; — ~40M characters of pure garbage evicted, about 14.5 million tokens saved.&lt;/p&gt;

&lt;p&gt;For Earth Day, I wanted to know what that actually means in the real world. Kilowatt-hours. Grams of CO₂. Miles driven in a car. So I built a tiny new layer on top of ContextClaw called &lt;strong&gt;eco-report&lt;/strong&gt; that turns token savings into carbon receipts, and I wired Google Gemini in to narrate a weekly report from the telemetry.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;eco-report&lt;/code&gt; is a ~100-line Node module that sits on top of ContextClaw's existing efficiency tracker. Every time ContextClaw truncates, tails, or evicts something from the context window, it already records tokens-before and tokens-after. &lt;code&gt;eco-report&lt;/code&gt; takes those numbers and does three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Converts tokens → kWh&lt;/strong&gt; using published large-model inference energy estimates from the Luccioni et al. "Power Hungry Processing" paper and the MLCommons energy benchmarks. I'm using the conservative frontier-model figure of &lt;strong&gt;~0.001 Wh per output token&lt;/strong&gt; (roughly matching the 0.5–1.2 Wh-per-query range reported for ChatGPT-scale traffic, normalized to a ~500-token reply).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Converts kWh → gCO₂e&lt;/strong&gt; using the current &lt;strong&gt;EPA eGRID US average&lt;/strong&gt; of 385 gCO₂e/kWh (2026 release). Configurable — you can swap in your datacenter's grid factor if you know it (Iowa coal grid is ~700; Pacific Northwest hydro is ~90).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Converts gCO₂e → relatable units&lt;/strong&gt; — miles driven in an average US gasoline car (404 g/mi), phone charges (~8 g each), tree-year equivalents.&lt;/li&gt;
&lt;/ol&gt;
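&lt;p&gt;As a sanity check, here's the whole chain run by hand on the demo session's numbers, restating the same constants as above. Nothing here comes from the plugin itself; it's just the arithmetic:&lt;/p&gt;

```javascript
// Same constants the plugin uses, restated so the math is checkable.
const WH_PER_TOKEN = 0.001;    // Wh per output token (Luccioni et al. estimate)
const G_CO2_PER_KWH = 385;     // EPA eGRID US average
const G_CO2_PER_MILE = 404;    // EPA average passenger vehicle

const tokensSaved = 8_347_815; // the demo session's saved tokens
const kWh = (tokensSaved * WH_PER_TOKEN) / 1000;  // ~8.35 kWh
const gCO2 = kWh * G_CO2_PER_KWH;                 // ~3,214 g
const miles = gCO2 / G_CO2_PER_MILE;              // ~8 miles

console.log(kWh.toFixed(2), Math.round(gCO2), miles.toFixed(1));
```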

&lt;p&gt;The kicker: for my own agent work, the cumulative saving is ~14.5M tokens = &lt;strong&gt;~14.5 kWh not spent = ~5.6 kg CO₂e avoided&lt;/strong&gt; — which is about 14 miles in a gas car, or roughly one weekly lunch's worth of gasoline commute, &lt;strong&gt;from a plugin I wrote to stop 429s&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Not a world-saver. But extrapolated across a mid-size engineering org running agents 24/7 with no context hygiene? You are quietly burning the emissions of a small fleet of cars to re-send the same Dockerfile to Claude every three turns.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;Here's a run against one of my real OpenClaw sessions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;node eco-report.js &lt;span class="nt"&gt;--session&lt;/span&gt; /home/yin/.openclaw/logs/session-0418.jsonl
&lt;span class="go"&gt;
🌱 ContextClaw Eco-Report — Session 2026-04-18
────────────────────────────────────────────────────
Items processed        : 2,144
Tokens before          : 9,384,217
Tokens after           : 1,036,402
Tokens saved           : 8,347,815  (88.9% reduction)

Energy avoided         : 8.35 kWh
CO₂e avoided           : 3,214 g   (US grid avg, 385 g/kWh)
Roughly equivalent to  : 8 miles in an avg gasoline car
                         OR  402 phone charges
                         OR  5.6 fridge-days

Gemini says:
"This session truncated 8.3 million tokens from
context — mostly stale file reads and JSON schema
blobs. That's roughly the carbon cost of driving from
Manhattan to JFK in a gasoline car, avoided. Over a
year at this rate (1 session/day), you'd avoid about
1.2 tonnes of CO₂e — the emissions of a cross-country
flight for one passenger."
────────────────────────────────────────────────────
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Gemini narration is the interesting part. Numbers alone are dry. When Gemini takes the raw telemetry (tokens saved, session duration, top-eviction content types) and writes a 3-sentence plain-English summary with analogies, it genuinely changes how you feel about the number. It's the same reason Strava pings me "that was your second-fastest 5K this month" instead of just showing me an average pace.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fdodge1218%2Fagentic-efficiency%2Fmain%2Fassets%2Fdashboard.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fdodge1218%2Fagentic-efficiency%2Fmain%2Fassets%2Fdashboard.png" alt="Live efficiency dashboard"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Companion dashboard at &lt;a href="https://github.com/dodge1218/agentic-efficiency" rel="noopener noreferrer"&gt;github.com/dodge1218/agentic-efficiency&lt;/a&gt; tracks total tokens saved and estimated capital + carbon saved across all my agent sessions.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;The whole thing is in the &lt;a href="https://github.com/dodge1218/contextclaw" rel="noopener noreferrer"&gt;ContextClaw repo&lt;/a&gt; under &lt;code&gt;plugin/eco-report.js&lt;/code&gt;. Here's the core — the full file is ~110 lines including the Gemini call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// eco-report.js — turn token savings into kWh + CO2&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;WH_PER_TOKEN&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.001&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;          &lt;span class="c1"&gt;// Luccioni et al., conservative frontier-model figure&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;G_CO2_PER_KWH&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;385&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;            &lt;span class="c1"&gt;// EPA eGRID 2026 US avg. override via env.&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;G_CO2_PER_MILE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;404&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;            &lt;span class="c1"&gt;// EPA avg passenger vehicle&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;G_CO2_PER_PHONE_CHARGE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;tokensToFootprint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tokensSaved&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;gridFactor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;G_CO2_PER_KWH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;kWh&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tokensSaved&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;WH_PER_TOKEN&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;gCO2&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;kWh&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;gridFactor&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;kWh&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;kWh&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;gCO2e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gCO2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;equivalents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;miles_driven&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gCO2&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;G_CO2_PER_MILE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="na"&gt;phone_charges&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gCO2&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;G_CO2_PER_PHONE_CHARGE&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;narrateWithGemini&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`You are an environmental analyst. Write a terse, punchy,
  three-sentence plain-English summary of this ContextClaw session.
  Use concrete analogies (miles driven, flights, fridge-days). No fluff.

  Session data:
  &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;`https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;}]&lt;/span&gt; &lt;span class="p"&gt;}]&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;j&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;j&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;?.[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]?.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;?.[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]?.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;(Gemini unavailable)&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the whole trick. ContextClaw already measures everything. &lt;code&gt;eco-report&lt;/code&gt; just multiplies by two constants and asks Gemini to sound less like a spreadsheet.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Built It
&lt;/h2&gt;

&lt;p&gt;The stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ContextClaw&lt;/strong&gt; (existing, mine, MIT): the classifier + truncator that produces the telemetry.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Gemini 2.0 Flash&lt;/strong&gt;: single API call per report. Flash is the right tier here — this is a summarization task, not a reasoning one, and Flash's cost + latency are perfect for "run this at the end of every session." Ironic-but-on-theme: Flash is also ~10× more energy-efficient per token than a frontier reasoning model, so the carbon cost of generating the eco-report is essentially noise.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Node 20&lt;/strong&gt;: plugin layer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EPA eGRID 2026&lt;/strong&gt; for the US grid CO₂ intensity. Anyone outside the US can pass &lt;code&gt;--grid-factor=90&lt;/code&gt; (Pacific NW hydro), &lt;code&gt;700&lt;/code&gt; (coal-heavy Iowa), or their actual regional number.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Three decisions worth calling out:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;I deliberately used a conservative WH_PER_TOKEN.&lt;/strong&gt; Energy-per-token for frontier models is genuinely uncertain; published figures range from 0.0003 to 0.003 Wh. I went with 0.001 because I would rather under-claim and be defensible than inflate the number for a better Earth Day story. If anything, my numbers are lower than reality.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Gemini does the storytelling, not the math.&lt;/strong&gt; I never let the LLM multiply. It gets the raw, already-calculated numbers and turns them into prose. This is the right division of labor — Gemini's job here is translation, not arithmetic, and it means my carbon numbers stay reproducible and don't hallucinate.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The &lt;code&gt;eco-report&lt;/code&gt; runs at end-of-session, not every turn.&lt;/strong&gt; One API call per session to Gemini, not per message. This matters because (a) it respects rate limits and (b) it means the eco-report's own carbon cost is ~200 tokens of Flash output, or about &lt;strong&gt;0.08 grams of CO₂e&lt;/strong&gt; per report. The report measures ~3 kg of savings. Ratio: roughly 40,000× more saved than spent.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
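&lt;p&gt;The 0.08 g overhead figure falls straight out of the same two constants. A quick back-of-envelope in Node, using my numbers from the demo session rather than anything the plugin emits:&lt;/p&gt;

```javascript
// The report's own cost: ~200 Flash output tokens through the same conversion.
const WH_PER_TOKEN = 0.001;   // Wh per output token (same estimate as above)
const G_CO2_PER_KWH = 385;    // US grid average

const reportG = (200 * WH_PER_TOKEN / 1000) * G_CO2_PER_KWH; // 0.077 g
const savedG = 3214;          // grams avoided in the demo session

console.log(reportG.toFixed(3));            // cost of generating the report
console.log(Math.round(savedG / reportG));  // the "roughly 40,000x" ratio
```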

&lt;h2&gt;
  
  
  Prize Category
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Best use of Google Gemini.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Gemini is doing the one thing most hackathon submissions can't pull off: being a deliberately small, cheap, well-scoped component rather than the centerpiece. It's a storyteller bolted onto a real measurement pipeline. It turns a dry JSON blob into something a human will actually read at the end of a Friday afternoon. And because I used Gemini 2.0 Flash instead of a heavy reasoning model, the eco-report respects its own thesis: don't burn tokens you don't need to.&lt;/p&gt;

&lt;p&gt;That's the thing I want judges to take away: &lt;strong&gt;AI tooling can help us measure the footprint of AI itself&lt;/strong&gt;, and it does that best when it's a scalpel, not a sledgehammer.&lt;/p&gt;




&lt;p&gt;🌍 Repo: &lt;a href="https://github.com/dodge1218/contextclaw" rel="noopener noreferrer"&gt;https://github.com/dodge1218/contextclaw&lt;/a&gt;&lt;br&gt;
📊 Dashboard: &lt;a href="https://github.com/dodge1218/agentic-efficiency" rel="noopener noreferrer"&gt;https://github.com/dodge1218/agentic-efficiency&lt;/a&gt;&lt;br&gt;
🔗 Parent platform: &lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt;&lt;/p&gt;


</description>
      <category>devchallenge</category>
      <category>weekendchallenge</category>
      <category>ai</category>
      <category>sustainability</category>
    </item>
    <item>
      <title>ContextClaw: The OpenClaw Plugin That Cut My Token Bill 55%</title>
      <dc:creator>signalscout</dc:creator>
      <pubDate>Fri, 17 Apr 2026 16:25:41 +0000</pubDate>
      <link>https://forem.com/vonb/contextclaw-the-openclaw-plugin-that-cut-my-token-bill-55-383a</link>
      <guid>https://forem.com/vonb/contextclaw-the-openclaw-plugin-that-cut-my-token-bill-55-383a</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/openclaw-2026-04-16"&gt;OpenClaw Challenge&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Every agent system eventually hits the same wall: the model is not forgetting because it is dumb. It is forgetting because you are feeding it a landfill.&lt;/p&gt;

&lt;p&gt;Old tool output. Half-fixed errors. File reads from a task you abandoned twenty minutes ago. Five versions of the same plan. Then you ask the model to be precise while its context window is full of stale evidence.&lt;/p&gt;

&lt;p&gt;ContextClaw is my attempt to fix that inside OpenClaw.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;ContextClaw is a context management layer for OpenClaw. It sits between the workspace and the model, classifies each message, attaches a task-bucket sticker, and evicts context by task boundary instead of raw recency. The goal is simple: keep the intent, decisions, and active working state; drop the tool spam and dead branches.&lt;/p&gt;

&lt;p&gt;On real working sessions, that pattern cuts token load by 55%+ versus dumping the whole rolling transcript back into the model. The important part is not just compression. It is inventory. The agent knows what each piece of context is, what task it belongs to, and whether it should still be in the room.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;raw session -&amp;gt; [classifier] -&amp;gt; typed messages
            -&amp;gt; [stickerer]  -&amp;gt; task-bucketed messages
            -&amp;gt; [evictor]    -&amp;gt; task-scoped context -&amp;gt; model
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Bigger context windows help. They do not solve the core problem. If your workflow keeps stuffing irrelevant state into the prompt, a bigger window just gives you a larger junk drawer.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Used OpenClaw
&lt;/h2&gt;

&lt;p&gt;OpenClaw is the right place to build this because OpenClaw already treats agent work like a real system: tools, skills, files, providers, sessions, and workspace state. ContextClaw plugs into that turn lifecycle and changes what reaches the model.&lt;/p&gt;

&lt;p&gt;The rough shape is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/.openclaw/plugins/contextclaw/
  plugin.json
  classifier.js
  stickers.js
  evictor.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I am not going to pretend the install command is cleaner than it is. The safe version is: wire it through OpenClaw's plugin registry, then route each turn's message list through ContextClaw before the provider call. That is the hook. Do not patch random config by hand. Do not rely on a prompt that says "please ignore old context." Make the context layer enforce it.&lt;/p&gt;

&lt;p&gt;The classifier gives each message a job. A user request is not the same thing as a tool result. A decision is not the same thing as a stack trace. A sub-agent artifact is not the same thing as a planning note. Representative types look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;user_intent
tool_call
tool_result
file_read
error_trace
plan
summary
decision
sub_agent_output
system_note
noise
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
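&lt;p&gt;To make that concrete, a first pass at a classifier can be embarrassingly rule-based. This sketch is mine, not ContextClaw's actual code, and the message fields (&lt;code&gt;role&lt;/code&gt;, &lt;code&gt;toolName&lt;/code&gt;, &lt;code&gt;text&lt;/code&gt;) are assumptions:&lt;/p&gt;

```javascript
// Illustrative only: a rule-based first pass at typing messages.
// Field names (role, toolName, text) are assumptions, not ContextClaw's API.
function classify(msg) {
  if (msg.role === 'user') return 'user_intent';
  if (msg.toolName === 'read_file') return 'file_read';
  if (msg.role === 'tool') return 'tool_result';
  if (/^Traceback|^\s+at .+:\d+:\d+/m.test(msg.text ?? '')) return 'error_trace';
  if (msg.role === 'assistant' && /^(Decision|Decided):/im.test(msg.text ?? '')) return 'decision';
  return 'noise';
}

console.log(classify({ role: 'user', text: 'fix the login bug' })); // user_intent
```

The point is that even cheap rules give the evictor a far better signal than position in the transcript.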



&lt;p&gt;The exact enum matters less than the principle: recency is the wrong axis.&lt;/p&gt;

&lt;p&gt;A 100-token decision from turn 3 can be more important than 8,000 tokens of file output from turn 19. Sliding windows do not understand that. Type-aware eviction can.&lt;/p&gt;
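&lt;p&gt;In code, that's a two-key sort instead of a one-key sort. This is an illustrative sketch with made-up weights, not ContextClaw's actual evictor:&lt;/p&gt;

```javascript
// Illustrative: rank by type weight first, recency second. Weights are made up.
const WEIGHT = { user_intent: 100, decision: 90, plan: 60, summary: 50,
                 error_trace: 30, tool_result: 10, file_read: 10, noise: 0 };

function evict(messages, tokenBudget) {
  // Highest-value items first: type weight, then newer turns as a tiebreaker.
  const ranked = [...messages].sort(
    (a, b) => ((WEIGHT[b.type] ?? 0) - (WEIGHT[a.type] ?? 0)) || (b.turn - a.turn)
  );
  const kept = [];
  let spent = 0;
  for (const m of ranked) {
    if (spent + m.tokens > tokenBudget) continue; // too big for what's left
    spent += m.tokens;
    kept.push(m);
  }
  // Restore chronological order for the prompt.
  return kept.sort((a, b) => a.turn - b.turn);
}
```

With a 1,000-token budget, the 100-token decision from turn 3 survives and the 8,000-token file read from turn 19 is skipped. That's exactly the inversion a sliding window can't express.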

&lt;p&gt;Then ContextClaw adds stickers. A sticker is a small label that says what task a message belongs to and what kind of context it is. A representative line might look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[DEV-A] tool-file-read: POST_A_SPEC.md
[DEV-A] decision: ContextClaw is the Prompt A project angle
[DSB-3] error_trace: Twilio auth failure
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the evictor has a useful signal. When I am writing the OpenClaw Challenge post, I need &lt;code&gt;[DEV-A]&lt;/code&gt;. I do not need a stale &lt;code&gt;[DSB-3]&lt;/code&gt; SMS debugging trace, even if it happened more recently.&lt;/p&gt;

&lt;p&gt;This connects directly to my file-as-interface workflow. In my OpenClaw workspace, files like &lt;code&gt;AGENTS.md&lt;/code&gt;, &lt;code&gt;NEXT_TICKET.md&lt;/code&gt;, &lt;code&gt;STATUS.md&lt;/code&gt;, &lt;code&gt;TASKS.md&lt;/code&gt;, and &lt;code&gt;BLOCKER.md&lt;/code&gt; are not decoration. They are the control plane. &lt;code&gt;NEXT_TICKET.md&lt;/code&gt; says what the active task is. &lt;code&gt;STATUS.md&lt;/code&gt; says what changed. &lt;code&gt;BLOCKER.md&lt;/code&gt; means a human gate exists.&lt;/p&gt;

&lt;p&gt;ContextClaw reads those workspace signals and uses them to decide bucket boundaries. When &lt;code&gt;NEXT_TICKET.md&lt;/code&gt; changes, the active bucket rolls. The model does not need to be begged to forget. The filesystem already made the task switch explicit.&lt;/p&gt;

&lt;p&gt;That is the whole trick. Do not ask the agent to infer workflow state from vibes. Put the workflow state somewhere durable, then make the context layer obey it.&lt;/p&gt;

&lt;p&gt;I also filed OpenClaw issues around the places where this should become more visible and reliable. Issue #64085 is about provider circuit breakers: if a provider starts returning quota or rate-limit errors, OpenClaw should stop hammering it and route around it. Issue #64086 is about exposing plugin status in the TUI footer. ContextClaw should be able to show a live tokens-saved counter where the user can actually see it.&lt;/p&gt;

&lt;p&gt;That matters because context management should not be mystical. If a plugin says it saved 55%, I want the footer to show the before and after. Tokens before. Tokens after. Decision made.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;The demo target is a normal OpenClaw work session: same model, same workspace, same prompt, first with raw transcript context and then with ContextClaw enabled.&lt;/p&gt;

&lt;p&gt;The shape of what I see in practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;baseline context:  full rolling transcript + tool spam
with ContextClaw:  typed, bucketed, task-scoped context
observed ratio:    roughly 55% fewer tokens per turn on multi-turn work
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I am not going to post a faked screenshot to hit the "Demo" header. The honest version is: the savings compound on long sessions with lots of tool output, and they mostly disappear on 2–3 turn toy tasks. The measurement that matters is stable output quality at lower token cost, not a single pretty number. A live tokens-saved counter in the TUI footer is what issue #64086 is about — that is the artifact I want before I publish benchmark-style numbers.&lt;/p&gt;

&lt;p&gt;Repo: work-in-progress. I'll link it from an update once it's in a state I'd want someone else to read.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Classification beats recency.&lt;/strong&gt; Most context systems treat the newest thing as the most important thing. That is wrong for agent work. The newest thing is often a giant tool result that only mattered for one local decision.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Task boundaries are the real eviction signal.&lt;/strong&gt; &lt;code&gt;NEXT_TICKET.md&lt;/code&gt; changing is stronger than a semantic guess. It says: the job changed. Old bucket out, new bucket in. Cheap. Explicit. Easy to audit.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ContextClaw loses on tiny tasks.&lt;/strong&gt; If the whole job is two turns, classification overhead can be more machinery than you need. The payoff starts when the task has enough turns, file reads, tool output, and course corrections for context rot to appear. Roughly: real work, not a toy prompt.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Files beat embeddings for basic agent state.&lt;/strong&gt; I like knowledge graphs. I like retrieval. But the 80% win here came from stickers plus eviction, not from trying to make memory magical. The filesystem already knows more about the workflow than the prompt does.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The broader lesson is uncomfortable: a lot of "agent memory" work is compensating for workflows that never made state explicit in the first place.&lt;/p&gt;

&lt;p&gt;OpenClaw made the fix obvious because the workspace is already there. Root files. Tools. Sessions. Plugins. Providers. It is close enough to an operating system for agents that context can become infrastructure, not a paragraph in the system prompt.&lt;/p&gt;

&lt;p&gt;If your context window feels crowded, your agent does not need a bigger model. It needs an inventory system.&lt;/p&gt;




</description>
      <category>devchallenge</category>
      <category>openclawchallenge</category>
      <category>ai</category>
      <category>agents</category>
    </item>
    <item>
      <title>Stop Chatting With Your Agent. Use Files.</title>
      <dc:creator>signalscout</dc:creator>
      <pubDate>Fri, 17 Apr 2026 16:25:35 +0000</pubDate>
      <link>https://forem.com/vonb/stop-chatting-with-your-agent-use-files-4oi3</link>
      <guid>https://forem.com/vonb/stop-chatting-with-your-agent-use-files-4oi3</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/openclaw-2026-04-16"&gt;OpenClaw Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I stopped talking to my agents. My throughput went up.&lt;/p&gt;

&lt;p&gt;Not a little. A lot. The interface changed and the work got better. That's the whole post, but I'll spend the next 900 words earning it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Chat is the wrong shape for real work
&lt;/h2&gt;

&lt;p&gt;The terminal pane is seductive. You type, it types back, dopamine, repeat. Feels like progress. It isn't.&lt;/p&gt;

&lt;p&gt;Here's what chat-as-interface actually gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;State lives in the model's head.&lt;/strong&gt; Scroll up far enough and you're arguing with a ghost. The agent "remembers" until it doesn't.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Every turn pays rent.&lt;/strong&gt; Tool output, file reads, half-finished reasoning — it's all still there, burning tokens, dragging attention.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No parallelism.&lt;/strong&gt; One window, one conversation, one thread of thought. If you want two agents on two tasks, you open two terminals and pray neither one hallucinates the other's context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No audit trail that isn't a transcript.&lt;/strong&gt; When something went wrong three days ago, you're grepping scrollback.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Chat optimizes for the feeling of collaboration. Files optimize for the fact of it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fix: files are the contract
&lt;/h2&gt;

&lt;p&gt;The pattern I've settled on — and the one OpenClaw is quietly built around — is this: &lt;strong&gt;the chat window is for routing. Files are the work.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every agent in my setup reads from and writes to a small set of root-level markdown files. Not a database. Not a vector store. Plain files, in the workspace, one concern per file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/.openclaw/workspace/
├── AGENTS.md          # rules of the road
├── SOUL.md            # voice, posture, biases
├── NEXT_TICKET.md     # the one thing to do right now
├── STATUS.md          # current state of the world
├── TASKS.md           # backlog, classified
├── BLOCKER.md         # human gate — exists = I'm stuck
├── MEMORY.md          # index into memory/
└── outputs/           # artifacts go here, not into chat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent doesn't remember what it's doing. It reads &lt;code&gt;NEXT_TICKET.md&lt;/code&gt;. It doesn't guess at tone. It reads &lt;code&gt;SOUL.md&lt;/code&gt;. It doesn't narrate its plan into the chat window and hope you catch it — it updates &lt;code&gt;STATUS.md&lt;/code&gt;, writes the artifact to &lt;code&gt;outputs/&lt;/code&gt;, and if something's wrong, it drops &lt;code&gt;BLOCKER.md&lt;/code&gt; and stops.&lt;/p&gt;

&lt;p&gt;The model's context window becomes disposable. The filesystem is the source of truth.&lt;/p&gt;

&lt;h2&gt;
  
  
  A worked example
&lt;/h2&gt;

&lt;p&gt;Here's what &lt;code&gt;AGENTS.md&lt;/code&gt; actually looks like in my workspace. Not a philosophy doc — a routing table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Work Categories&lt;/span&gt;

&lt;span class="gu"&gt;### 🔴 CRITICAL (do now, in context)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Active blocker Ryan is waiting on
&lt;span class="p"&gt;-&lt;/span&gt; Bug breaking a running system
&lt;span class="p"&gt;-&lt;/span&gt; Ryan says "now" or "do this"

&lt;span class="gu"&gt;### 🟡 QUEUED (write ticket, do next)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Features on active projects
&lt;span class="p"&gt;-&lt;/span&gt; Non-blocking bugs
→ Write to TASKS.md, acknowledge with one line. Do NOT start.

&lt;span class="gu"&gt;### 🟢 DEFERRED (log it, do later)&lt;/span&gt;
→ Write to TASKS.md with [DEFERRED] tag. Move on.

&lt;span class="gu"&gt;### ⚪ QUESTION (answer, don't build)&lt;/span&gt;
→ Plan on paper. Do NOT start building unless Ryan says "do it."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the whole routing logic. No prompt engineering gymnastics. No "You are a helpful assistant who..." The agent reads this file at the start of every turn and classifies before touching anything.&lt;/p&gt;
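&lt;p&gt;As a sketch, that routing table compiles down to a few lines. The keyword triggers here are illustrative guesses; the real classification leans on state (active blockers, project status), not just wording:&lt;/p&gt;

```python
def classify(message):
    """Route an incoming request per the AGENTS.md table.

    Keyword rules are stand-ins for illustration; the point is that
    classification happens before any work starts.
    """
    text = message.lower()
    if "now" in text or "blocker" in text or "breaking" in text:
        return "CRITICAL"   # do immediately, in context
    if text.rstrip().endswith("?"):
        return "QUESTION"   # answer on paper, do not build
    if "someday" in text or "later" in text:
        return "DEFERRED"   # log to TASKS.md with [DEFERRED]
    return "QUEUED"         # write a ticket, do not start
```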

&lt;p&gt;&lt;code&gt;NEXT_TICKET.md&lt;/code&gt; is the ticket the coder agent picks up. It looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# TICKET: Provider circuit breaker for ContextClaw&lt;/span&gt;

&lt;span class="gu"&gt;## Scope&lt;/span&gt;
Track consecutive 429/quota errors per provider.
After 3 failures, mark provider "tripped", skip in fallback chain.
Auto-reset at midnight ET or after configurable cooldown.

&lt;span class="gu"&gt;## Acceptance&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Gemini 429 three times → next call routes to Groq without retry
&lt;span class="p"&gt;-&lt;/span&gt; TUI footer shows "Gemini: TRIPPED (resets 00:00 ET)"
&lt;span class="p"&gt;-&lt;/span&gt; State persists across restarts (./state/providers.json)

&lt;span class="gu"&gt;## Out of scope&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Per-endpoint granularity (provider-level is fine for v1)
&lt;span class="p"&gt;-&lt;/span&gt; UI for manual reset (kill the file, it's fine)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's a ticket a coding agent can pick up cold. No "as we discussed." No Slack archaeology. A model I spun up yesterday and a model I spin up next month read the same file and do the same job.&lt;/p&gt;

&lt;p&gt;When it's done, the artifact lives in &lt;code&gt;outputs/&lt;/code&gt;, not in the chat log. &lt;code&gt;STATUS.md&lt;/code&gt; gets one line appended. If the agent hit a wall it can't cross — auth, billing, an irreversible action — it writes &lt;code&gt;BLOCKER.md&lt;/code&gt; and stops. The existence of the file is the signal. I don't have to read it in a transcript; I see it in &lt;code&gt;ls&lt;/code&gt;.&lt;/p&gt;
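&lt;p&gt;The end-of-turn protocol is small enough to sketch in full. It follows the file contract above, but the &lt;code&gt;finish_turn&lt;/code&gt; helper and its signature are hypothetical:&lt;/p&gt;

```python
from pathlib import Path

def finish_turn(workspace, artifact_name, content, blocker=None):
    """End-of-turn protocol: artifact to outputs/, one line appended
    to STATUS.md, and BLOCKER.md only if the agent is stuck.

    The existence of BLOCKER.md is the whole signal: a human sees it
    in ls without reading any transcript.
    """
    ws = Path(workspace)
    if blocker:
        (ws / "BLOCKER.md").write_text(blocker + "\n")
        return None  # stop; do not ship a half-done artifact
    out = ws / "outputs" / artifact_name
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(content)
    with open(ws / "STATUS.md", "a") as f:
        f.write(f"DONE {artifact_name}\n")  # one appended line, no prose
    return out
```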

&lt;h2&gt;
  
  
  Why this generalizes
&lt;/h2&gt;

&lt;p&gt;File-as-interface isn't an OpenClaw trick. It's the shape every serious multi-agent setup converges on, because it solves problems chat cannot:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Parallelism is free.&lt;/strong&gt; Three agents can read &lt;code&gt;TASKS.md&lt;/code&gt; and claim different tickets. The filesystem is the lock.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Handoffs stop costing context.&lt;/strong&gt; Sub-agent writes to a file. Parent reads the file when it needs to. The parent's context stays clean, and that savings compounds per turn. The rule I enforce in &lt;code&gt;AGENTS.md&lt;/code&gt; is blunt: &lt;em&gt;sub-agents write results to files. They do NOT report back into parent context. Completion = file exists at expected path. Not a message.&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Humans can review without being in the loop.&lt;/strong&gt; I scroll &lt;code&gt;STATUS.md&lt;/code&gt; instead of 40k tokens of scrollback. Approval becomes binary. ✅ or ❌. I am the reviewer, not the driver.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State survives the model.&lt;/strong&gt; When the next frontier model ships — and it's shipping soon — my whole workflow moves over with a config change. The files don't care which model read them.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last one matters more than it sounds. The models are a commodity that gets better every month. The artifacts are the moat.&lt;/p&gt;
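&lt;p&gt;"The filesystem is the lock" is concrete, not a metaphor. A hedged sketch of atomic ticket claiming using &lt;code&gt;O_CREAT | O_EXCL&lt;/code&gt;, so exactly one agent wins each ticket; the lock-file layout is my own assumption, not part of OpenClaw:&lt;/p&gt;

```python
import os

def claim_ticket(ticket_id, lock_dir="locks"):
    """Claim a ticket by atomically creating its lock file.

    os.O_CREAT | os.O_EXCL makes creation atomic: one agent wins,
    every other agent gets FileExistsError and moves on. No
    coordinator process needed.
    """
    os.makedirs(lock_dir, exist_ok=True)
    path = os.path.join(lock_dir, f"{ticket_id}.lock")
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # someone else claimed it
    os.write(fd, str(os.getpid()).encode())  # record the owner
    os.close(fd)
    return True
```

&lt;p&gt;Stale locks from crashed agents are the obvious next problem; writing the PID into the lock file is the usual escape hatch.&lt;/p&gt;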

&lt;h2&gt;
  
  
  The tell
&lt;/h2&gt;

&lt;p&gt;Here's the heuristic I use now: if an agent's answer isn't somewhere I can &lt;code&gt;cat&lt;/code&gt;, it didn't happen.&lt;/p&gt;

&lt;p&gt;Chat is where you decide what to build. Files are where building happens. The moment you stop treating the terminal as the workspace and start treating it as the router — pointing at files, not producing prose — the whole thing gets faster, cheaper, and more honest about what's actually done.&lt;/p&gt;

&lt;p&gt;Open a file. Close the chat. Ship the artifact.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>openclawchallenge</category>
      <category>ai</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Why I Built My Entire Business on Vercel (And What I'd Change)</title>
      <dc:creator>signalscout</dc:creator>
      <pubDate>Thu, 16 Apr 2026 06:50:40 +0000</pubDate>
      <link>https://forem.com/vonb/why-i-built-my-entire-business-on-vercel-and-what-id-change-5519</link>
      <guid>https://forem.com/vonb/why-i-built-my-entire-business-on-vercel-and-what-id-change-5519</guid>
      <description>&lt;h1&gt;
  
  
  Why I Built My Entire Business on Vercel (And What I'd Change)
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;A freelance web dev's honest review after 13+ production deployments.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;I run &lt;a href="https://dreamsitebuilders.com" rel="noopener noreferrer"&gt;DreamSiteBuilders.com&lt;/a&gt; — a one-person web dev shop building sites for local businesses. Every site ships on Vercel. Not because I evaluated 12 platforms and made a spreadsheet. Because I deployed once, it worked, and I never had a reason to leave.&lt;/p&gt;

&lt;p&gt;Thirteen sites later, here's what I actually know.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Works Unreasonably Well
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Deploy speed is the product.&lt;/strong&gt; My sales pitch to clients is a free demo build. I can go from discovery call to live preview URL in under 4 hours. That's only possible because &lt;code&gt;git push&lt;/code&gt; → live site is 45 seconds. No SSH, no Docker, no "it works on my machine." The speed of deploy &lt;em&gt;is&lt;/em&gt; the competitive advantage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Preview deployments close deals.&lt;/strong&gt; Every PR gets a preview URL. I send clients their site running on a real URL before they've paid a dollar. This converts better than any mockup or Figma link. They can tap through it on their phone. It's real.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Edge Functions for the boring stuff.&lt;/strong&gt; Contact forms, redirect logic, simple API routes — Edge Functions handle the stuff that used to require a whole backend. For SMB sites, this is the entire "server" layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v0 for first drafts.&lt;/strong&gt; I use v0 to generate initial component layouts, then customize heavily. It's not a replacement for building — it's a replacement for staring at a blank file. The output is real Next.js code, not some proprietary format that needs translating.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd Change
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Analytics needs work.&lt;/strong&gt; Vercel Analytics is fine for "is my site fast?" but I still need Google Analytics for anything client-facing. Conversion tracking, goal funnels, audience segments — none of that exists in Vercel's analytics yet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build minutes add up.&lt;/strong&gt; With 13+ sites on a Pro plan, I watch build minutes carefully. ISR and on-demand revalidation help, but I've had months where a client's aggressive preview deployments ate through the budget.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Monorepo support is better but not painless.&lt;/strong&gt; I tried consolidating client sites into a monorepo for shared components. Turborepo configuration was more overhead than just copying components between repos. For a solo operator, a separate repo per client is simpler.&lt;/p&gt;

&lt;h2&gt;
  
  
  The AI Layer
&lt;/h2&gt;

&lt;p&gt;The biggest shift in the last 6 months isn't Vercel itself — it's the AI tooling around it. My current stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;v0&lt;/strong&gt; for component scaffolding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt; for implementation and debugging&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codex CLI&lt;/strong&gt; for multi-file refactors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PromptLens&lt;/strong&gt; (my own tool) for analyzing how I actually use these AI tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The combination of v0 → Claude Code → &lt;code&gt;git push&lt;/code&gt; → live in 60 seconds is absurd. I built a complete site for a bodywork spa in one afternoon. Not a template — a custom Next.js site with booking integration, service pages, and mobile optimization.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Honest Take
&lt;/h2&gt;

&lt;p&gt;Vercel wins because it removes decisions. I don't think about hosting, SSL, CI/CD, CDN configuration, or deployment strategy. I think about the client's business and the code. Everything else is handled.&lt;/p&gt;

&lt;p&gt;For a solo builder shipping to local businesses, that's the whole game.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Ryan Brubeck builds AI-powered web tools and ships client sites on Vercel. Find him on &lt;a href="https://github.com/dodge1218" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; and &lt;a href="https://dreamsitebuilders.com" rel="noopener noreferrer"&gt;DreamSiteBuilders.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>vercel</category>
      <category>nextjs</category>
      <category>webdev</category>
      <category>beginners</category>
    </item>
    <item>
      <title>I Analyzed 215 of My ChatGPT Conversations. Here's My "Usage DNA."</title>
      <dc:creator>signalscout</dc:creator>
      <pubDate>Thu, 16 Apr 2026 05:50:40 +0000</pubDate>
      <link>https://forem.com/vonb/i-analyzed-215-of-my-chatgpt-conversations-heres-my-usage-dna-166o</link>
      <guid>https://forem.com/vonb/i-analyzed-215-of-my-chatgpt-conversations-heres-my-usage-dna-166o</guid>
      <description>&lt;h1&gt;
  
  
  I Analyzed 215 of My ChatGPT Conversations. Here's My "Usage DNA."
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Everyone talks about prompt engineering. Nobody talks about prompt patterns — the habits you don't know you have.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;I exported my ChatGPT history and ran it through an analysis pipeline I built. Not a scraper — I used OpenAI's official data export, then wrote Python to cluster topics, classify intents, detect conversation loops, and fingerprint my prompting style.&lt;/p&gt;

&lt;p&gt;Think of it as Spotify Wrapped, but for your AI usage.&lt;/p&gt;

&lt;p&gt;Here's what 215 conversations, 695 messages, and 25,618 words revealed about how I actually use AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Usage DNA
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Average prompt length&lt;/td&gt;
&lt;td&gt;39.5 words&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Median prompt length&lt;/td&gt;
&lt;td&gt;23 words&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vocabulary richness&lt;/td&gt;
&lt;td&gt;0.18 (4,610 unique / 25,618 total)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Avg conversation length&lt;/td&gt;
&lt;td&gt;6.7 turns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Most active hour&lt;/td&gt;
&lt;td&gt;12 AM ET (4 UTC)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Most active day&lt;/td&gt;
&lt;td&gt;Monday&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sessions per week&lt;/td&gt;
&lt;td&gt;43&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The median (23 words) vs average (39.5) gap is telling. Most of my prompts are short commands. But when I go long, I go &lt;em&gt;long&lt;/em&gt; — dragging the average up. I'm either firing off "fix this" or writing a paragraph of context. There's no middle.&lt;/p&gt;

&lt;p&gt;43 sessions per week means I'm opening ChatGPT about 6 times a day. That's less than I expected. It &lt;em&gt;feels&lt;/em&gt; like I live in the chat window, but apparently I batch my usage into focused sessions rather than constant drip queries.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Prompt: The Shape Distribution
&lt;/h2&gt;

&lt;p&gt;Every prompt has a "shape" — a combination of length and structure:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Shape&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;th&gt;What It Means&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Medium instruction&lt;/td&gt;
&lt;td&gt;38.1%&lt;/td&gt;
&lt;td&gt;"Do X with Y constraints" — 16-50 words, directive&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Short command&lt;/td&gt;
&lt;td&gt;19.7%&lt;/td&gt;
&lt;td&gt;≤15 words, imperative — "fix the build", "summarize this"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Long instruction&lt;/td&gt;
&lt;td&gt;16.3%&lt;/td&gt;
&lt;td&gt;50+ word specifications with context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ultra short&lt;/td&gt;
&lt;td&gt;8.2%&lt;/td&gt;
&lt;td&gt;"yes", "continue", "try again"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Medium question&lt;/td&gt;
&lt;td&gt;7.2%&lt;/td&gt;
&lt;td&gt;Genuine information-seeking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Short question&lt;/td&gt;
&lt;td&gt;5.2%&lt;/td&gt;
&lt;td&gt;Quick lookups&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Essay prompt&lt;/td&gt;
&lt;td&gt;3.5%&lt;/td&gt;
&lt;td&gt;Full context dumps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code paste&lt;/td&gt;
&lt;td&gt;1.2%&lt;/td&gt;
&lt;td&gt;Pasting code for analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The insight:&lt;/strong&gt; I'm 74% instruction, 12% question, 3.5% essay. I use AI as a &lt;em&gt;tool operator&lt;/em&gt;, not a &lt;em&gt;search engine&lt;/em&gt;. I already know what I want — I'm delegating execution, not seeking knowledge.&lt;/p&gt;
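&lt;p&gt;Shape classification needs nothing fancy. A rule-based sketch using the word-count thresholds from the table; the tie-breaking rules (question-mark check, the ultra-short cutoff) are my simplifications, not PromptLens's actual code:&lt;/p&gt;

```python
def prompt_shape(text):
    """Bucket a prompt on the table's two axes: length and structure.

    Thresholds come from the table above; the question-mark heuristic
    and ultra-short cutoff are illustrative simplifications.
    """
    words = len(text.split())
    is_question = text.rstrip().endswith("?")
    if words > 50:
        return "long_instruction"  # essay/code-paste need richer rules
    if words > 15:
        return "medium_question" if is_question else "medium_instruction"
    if words > 3:
        return "short_question" if is_question else "short_command"
    return "ultra_short"           # "yes", "continue", "try again"
```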

&lt;p&gt;This maps directly to how power users differ from casual users. Casual users ask questions ("What is X?"). Power users give instructions ("Build X with these constraints"). The raw intent counts are noisier — 41% of messages land in "Other" — but here's the distribution:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Intent&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;th&gt;%&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Question&lt;/td&gt;
&lt;td&gt;202&lt;/td&gt;
&lt;td&gt;29%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Instruction&lt;/td&gt;
&lt;td&gt;79&lt;/td&gt;
&lt;td&gt;11%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Brainstorm&lt;/td&gt;
&lt;td&gt;46&lt;/td&gt;
&lt;td&gt;7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Debug&lt;/td&gt;
&lt;td&gt;44&lt;/td&gt;
&lt;td&gt;6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Meta&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;td&gt;4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Creative&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other&lt;/td&gt;
&lt;td&gt;288&lt;/td&gt;
&lt;td&gt;41%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;6% of my prompts are debugging. That's a conversation with an AI about why the AI's previous output was wrong. The recursive irony isn't lost on me.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Talk About: 20 Topic Clusters
&lt;/h2&gt;

&lt;p&gt;The topic clustering found 20 distinct domains across 215 conversations. The top 5:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Work/Management&lt;/strong&gt; (20 convos, 146 msgs) — Boss dynamics, union questions, workplace strategy. Longest conversations by far — 7.3 msgs average.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business/Finance&lt;/strong&gt; (20 convos, 75 msgs) — Company analysis, bitcoin, investment reasoning. High breadth, lower depth.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;People/Content&lt;/strong&gt; (18 convos, 35 msgs) — Content strategy, audience analysis. Short, punchy sessions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI/Frontier Models&lt;/strong&gt; (16 convos, 55 msgs) — Model comparisons, frontier capabilities, wild speculation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Career/Resume&lt;/strong&gt; (14 convos, 25 msgs) — Resume writing, job applications, OpenAI research.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The insight:&lt;/strong&gt; My heaviest AI usage isn't coding. It's &lt;em&gt;workplace strategy&lt;/em&gt; — navigating human dynamics with an AI advisor. The conversations about boss interactions are 2x longer than anything else. I'm using ChatGPT as a management consultant.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Loop: Where I Got Stuck
&lt;/h2&gt;

&lt;p&gt;The loop detector found one significant conversation loop — a pair of conversations 4 days apart about the same unresolved topic (similarity: 0.41):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;"Gateway Password Recovery"&lt;/strong&gt; (April 9)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"OpenClaw vs Paperclip"&lt;/strong&gt; (April 13)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both were about OpenClaw configuration. Same problem, two attempts, no resolution. The loop detector flagged it as &lt;code&gt;repeated_question / unresolved&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Only 1 loop out of 215 conversations sounds good, but the real number is probably higher — the detector uses semantic similarity with a conservative threshold. What it caught was a &lt;em&gt;verbatim&lt;/em&gt; repeat. The subtler loops — rephrasing the same question, approaching the same problem from different angles — need a more sophisticated model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The insight:&lt;/strong&gt; Conversation loops are a signal of tool failure. When you ask the same thing twice across separate sessions, either the AI failed to solve it or you failed to retain the solution. Either way, it's wasted tokens and wasted time.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Companies Already Know (That You Don't)
&lt;/h2&gt;

&lt;p&gt;Here's the uncomfortable part: every major AI provider already has this data about you. OpenAI, Anthropic, Google — they can see your prompt patterns, your topic clusters, your conversation loops, your usage DNA. They use it for model training, safety research, and product decisions.&lt;/p&gt;

&lt;p&gt;You can't see any of it.&lt;/p&gt;

&lt;p&gt;There's no "Prompt Analytics" tab in ChatGPT settings. No "Your Usage Report" email. No "You asked about Python debugging 47 times this month — here's a shortcut." The data exists. The insights are extractable. They just don't give them to you.&lt;/p&gt;

&lt;p&gt;The argument for building this as a user-facing tool isn't technical — it's philosophical. &lt;strong&gt;You should have at least as much insight into your own AI usage as the companies hosting it.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for AI Tooling
&lt;/h2&gt;

&lt;p&gt;If you're building AI products, here's what my data suggests:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Power users don't ask questions — they give instructions.&lt;/strong&gt; Your UX should optimize for the imperative case, not the interrogative one. The chat input box is fine for questions. For instructions, you need structured input.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Conversation loops are a product bug.&lt;/strong&gt; If your users are asking the same thing in multiple sessions, your memory/context system has failed. Track repeat queries.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Usage DNA is a feature.&lt;/strong&gt; Show users their patterns. "You tend to write long prompts for coding tasks but short prompts for writing tasks — want to try being more specific on the writing side?" This is the AI equivalent of screen time reports, and it's equally valuable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The heaviest usage isn't what you think.&lt;/strong&gt; I expected my top category to be coding. It was workplace strategy. Product teams optimizing for the "developer use case" might be missing their actual power users.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  How I Built This
&lt;/h2&gt;

&lt;p&gt;The pipeline is straightforward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Input:&lt;/strong&gt; &lt;code&gt;conversations.json&lt;/code&gt; from OpenAI's data export&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Topic clustering:&lt;/strong&gt; TF-IDF + keyword extraction, no ML models needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intent classification:&lt;/strong&gt; Rule-based (prompt length + structural patterns)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Loop detection:&lt;/strong&gt; Cosine similarity between conversation pairs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shape analysis:&lt;/strong&gt; Word count + punctuation patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output:&lt;/strong&gt; JSON reports + Markdown summary&lt;/li&gt;
&lt;/ul&gt;
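&lt;p&gt;The loop-detection step, for example, reduces to a cosine over conversation pairs. This is a reconstruction under those constraints, not the PromptLens source — raw bag-of-words counts here where the real pipeline weights terms with TF-IDF, and the function names are mine:&lt;/p&gt;

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bags of words."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * \
           math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def find_loops(convos, threshold=0.4):
    """Flag conversation pairs whose similarity clears the threshold.

    convos: list of (title, text). A conservative threshold keeps
    false positives down at the cost of missing rephrased repeats,
    exactly the tradeoff described above.
    """
    loops = []
    for i in range(len(convos)):
        for j in range(i + 1, len(convos)):
            sim = cosine(convos[i][1], convos[j][1])
            if sim >= threshold:
                loops.append((convos[i][0], convos[j][0], round(sim, 2)))
    return loops
```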

&lt;p&gt;No API calls. No cloud processing. Everything runs locally on a laptop in under 10 seconds for 215 conversations. The analysis is deterministic — same input, same output, every time.&lt;/p&gt;

&lt;p&gt;The code is Python, ~500 lines total. No transformers, no embeddings, no GPU. Just TF-IDF and heuristics. The point isn't sophistication — it's that useful insights don't require expensive infrastructure.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;Export your ChatGPT data (Settings → Data Controls → Export), then ask yourself:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What's your instruction-to-question ratio?&lt;/li&gt;
&lt;li&gt;Which topic gets your longest conversations?&lt;/li&gt;
&lt;li&gt;Where are you looping — asking the same thing twice?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You might be surprised. I was.&lt;/p&gt;

&lt;h2&gt;
  
  
  Open Source
&lt;/h2&gt;

&lt;p&gt;The analysis pipeline is open source: &lt;strong&gt;&lt;a href="https://github.com/dodge1218/promptlens" rel="noopener noreferrer"&gt;PromptLens on GitHub&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;MIT licensed. ~500 lines of Python. No API keys needed.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Ryan builds AI analysis tools and agent infrastructure. Find him on &lt;a href="https://github.com/dodge1218" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; and &lt;a href="https://dreamsitebuilders.com" rel="noopener noreferrer"&gt;DreamSiteBuilders.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>datascience</category>
      <category>python</category>
      <category>productivity</category>
    </item>
    <item>
      <title>I Spent Two Days Debugging My Agent Stack. The Fix Was npm update.</title>
      <dc:creator>signalscout</dc:creator>
      <pubDate>Thu, 16 Apr 2026 05:49:23 +0000</pubDate>
      <link>https://forem.com/vonb/i-spent-two-days-debugging-my-agent-stack-the-fix-was-npm-update-1l80</link>
      <guid>https://forem.com/vonb/i-spent-two-days-debugging-my-agent-stack-the-fix-was-npm-update-1l80</guid>
      <description>&lt;h1&gt;
  
  
  I Spent Two Days Debugging My Agent Stack. The Fix Was &lt;code&gt;npm update&lt;/code&gt;.
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;A forensic investigation into how Codex CLI v0.50.0 quietly broke everything — and the 1,886 versions I skipped by not checking.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Crime Scene
&lt;/h2&gt;

&lt;p&gt;I run a multi-agent stack. OpenClaw orchestrates, Codex writes code, Gemini/Groq/DeepSeek handle the cheap inference, and the whole thing talks to itself through MCP (Model Context Protocol). It's either beautiful or terrifying depending on how you feel about autonomous systems. Most days, it works.&lt;/p&gt;

&lt;p&gt;Last Tuesday, it stopped working.&lt;/p&gt;

&lt;p&gt;Not dramatically — there was no stack trace, no segfault, no red alert. The kind of failure where you stare at logs for four hours before realizing the patient has been dead since morning. Codex sessions were silently dropping tool calls. MCP handshakes were timing out. The agent stack would spin up, do 40% of the work, then... nothing. No error. Just vibes.&lt;/p&gt;

&lt;p&gt;I did what any reasonable person does: I blamed the LLM provider.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Investigation
&lt;/h2&gt;

&lt;p&gt;Here's the thing about debugging a system where five different AI models talk to each other through three protocol layers: everything is a suspect. My first 12 hours looked like this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hour 1-3:&lt;/strong&gt; "It's definitely Groq's rate limits."&lt;br&gt;
Nope. Switched to Gemini. Same behavior.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hour 3-6:&lt;/strong&gt; "MCP config must be wrong."&lt;br&gt;
Rewrote my MCP server config. Twice. Compared against the docs character by character. Deployed. Same behavior.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hour 6-9:&lt;/strong&gt; "Maybe OpenClaw's routing is broken after the last update."&lt;br&gt;
Filed two GitHub issues (#64085, #64086). Wrote detailed reproduction steps. Drew architecture diagrams. The maintainers were very polite about it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hour 9-11:&lt;/strong&gt; "Let me check the Codex cache database."&lt;br&gt;
Opened &lt;code&gt;~/.codex/logs_2.sqlite&lt;/code&gt;. Found 2,026 sessions. Scrolled through. Everything looked normal. The &lt;code&gt;client_version&lt;/code&gt; field said &lt;code&gt;0.120.0&lt;/code&gt;. I nodded and moved on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hour 11:&lt;/strong&gt; "Wait."&lt;/p&gt;
&lt;h2&gt;
  
  
  The Moment
&lt;/h2&gt;

&lt;p&gt;I don't remember exactly what made me type it. Muscle memory, probably. Or divine intervention.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;codex &lt;span class="nt"&gt;--version&lt;/span&gt;
0.50.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I stared at the terminal for about ten seconds.&lt;/p&gt;

&lt;p&gt;Then I stared at the cache database entry that said &lt;code&gt;0.120.0&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Then I ran:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;which codex
/home/yin/.npm-global/bin/codex

&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-la&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;which codex&lt;span class="si"&gt;)&lt;/span&gt;
codex -&amp;gt; ../lib/node_modules/@openai/codex/bin/codex.js

&lt;span class="nv"&gt;$ &lt;/span&gt;npm list &lt;span class="nt"&gt;-g&lt;/span&gt; @openai/codex
└── @openai/codex@0.120.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Huh. npm says 0.120.0. The binary says 0.50.0. The cache says 0.120.0. Three different answers from one tool.&lt;/p&gt;

&lt;p&gt;What I had was a partially-updated installation where the npm package metadata had been updated but the actual binary was still running from a cached older version. The kind of bug you create by running &lt;code&gt;npm install -g&lt;/code&gt; at 2 AM and not noticing the postinstall script failed.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Autopsy: What 1,886 Versions Changed
&lt;/h2&gt;

&lt;p&gt;I was curious. How far behind was I, really?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;npm view @openai/codex versions &lt;span class="nt"&gt;--json&lt;/span&gt; | python3 &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"
import json, sys
versions = json.load(sys.stdin)
print(f'Total published versions: {len(versions)}')
"&lt;/span&gt;
Total published versions: 1886
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One thousand eight hundred and eighty-six published versions, nearly all of them shipped after my installed v0.50.0. That is a cadence of multiple releases a day, sustained for months. The Codex team does not sleep.&lt;/p&gt;

&lt;p&gt;The v0.50.0 lineage tells a story:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;0.50.0-alpha.1&lt;/code&gt; — the optimistic beginning&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;0.50.0-alpha.2&lt;/code&gt; — "we found some issues"&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;0.50.0-alpha.3&lt;/code&gt; — "we found more issues"
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;0.50.0&lt;/code&gt; — "ship it, we'll fix it in 0.51"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And then they shipped 0.51. And 0.52. And kept going for &lt;em&gt;eighteen hundred more releases&lt;/em&gt; while I sat on 0.50.0 like it was a vintage wine that would appreciate with age.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Broke
&lt;/h2&gt;

&lt;p&gt;The root cause was MCP protocol compatibility. Between v0.50.0 and v0.120.0, the Codex CLI underwent significant architectural changes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Typed code-mode tool declarations.&lt;/strong&gt; v0.120.0 introduced proper TypeScript-style type declarations for tool calls. v0.50.0 was sending untyped tool schemas. Modern MCP servers (including the ones OpenClaw spins up) expected typed declarations and silently dropped the untyped ones.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Core crate extractions.&lt;/strong&gt; The Codex team extracted core functionality into separate Rust crates. This changed the internal message format in subtle ways that only manifested when Codex talked to external MCP servers (as opposed to its built-in tools).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MCP cleanup fixes.&lt;/strong&gt; There were literal bug fixes for MCP session management — connection pooling, timeout handling, retry logic. My v0.50.0 was using MCP patterns that had known bugs &lt;em&gt;which were fixed a thousand versions ago.&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Richer MCP app support.&lt;/strong&gt; The newer version supports MCP apps as first-class citizens. My v0.50.0 was treating MCP connections as second-class tool providers, which meant every agent handoff was going through a compatibility shim that occasionally lost messages.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The beautiful irony: my &lt;code&gt;config.toml&lt;/code&gt; was perfectly configured.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="py"&gt;model&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"gpt-5.4"&lt;/span&gt;
&lt;span class="py"&gt;reasoning_effort&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"medium"&lt;/span&gt;  
&lt;span class="py"&gt;personality&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"pragmatic"&lt;/span&gt;

&lt;span class="nn"&gt;[plugins]&lt;/span&gt;
&lt;span class="py"&gt;gmail&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"openai-curated"&lt;/span&gt;
&lt;span class="py"&gt;github&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"openai-curated"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model migrations from &lt;code&gt;gpt-5&lt;/code&gt; → &lt;code&gt;gpt-5.3-codex&lt;/code&gt; → &lt;code&gt;gpt-5.4&lt;/code&gt; were all properly specified. The config was fine. The binary executing that config was from a different geological era.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fix
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @openai/codex@latest
&lt;span class="nv"&gt;$ &lt;/span&gt;codex &lt;span class="nt"&gt;--version&lt;/span&gt;
0.120.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two seconds. Two seconds to fix what took me a full day to diagnose.&lt;/p&gt;

&lt;p&gt;The agent stack came back online immediately. MCP handshakes completed. Tool calls went through. Sessions that had been failing at 40% completion started running to 100%. The 2,026 sessions in &lt;code&gt;~/.codex/sessions/&lt;/code&gt; started growing again.&lt;/p&gt;

&lt;h2&gt;
  
  
  Timeline of Discovery
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;Activity&lt;/th&gt;
&lt;th&gt;Usefulness&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Hour 0-3&lt;/td&gt;
&lt;td&gt;Blame Groq&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hour 3-6&lt;/td&gt;
&lt;td&gt;Rewrite MCP config&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hour 6-9&lt;/td&gt;
&lt;td&gt;File GitHub issues against OpenClaw&lt;/td&gt;
&lt;td&gt;0% (but they were well-written)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hour 9-11&lt;/td&gt;
&lt;td&gt;Forensic analysis of SQLite cache&lt;/td&gt;
&lt;td&gt;5% (found the version discrepancy clue)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hour 11&lt;/td&gt;
&lt;td&gt;&lt;code&gt;codex --version&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hour 11 + 2 sec&lt;/td&gt;
&lt;td&gt;&lt;code&gt;npm install -g @openai/codex@latest&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;∞%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Total debugging time: ~24 hours.&lt;br&gt;
Total fix time: 2 seconds.&lt;br&gt;
Ratio: 43,200:1.&lt;/p&gt;
&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Check the version first. Always.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before you blame the cloud, blame the config, blame the provider, blame Mercury retrograde — run &lt;code&gt;--version&lt;/code&gt;. I know this. I've told junior devs this. I've written it on whiteboards. And I still spent 24 hours not doing it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. npm global installs are haunted.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The failure mode here was a partial update: npm's package metadata updated, but the binary didn't get replaced. This is a known class of npm bugs that's existed for a decade. If you run a global npm tool in production (or production-adjacent) workflows, pin it with a version manager or at least verify the binary version matches &lt;code&gt;npm list -g&lt;/code&gt;.&lt;/p&gt;
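&lt;p&gt;A minimal sketch of that verification, wrapped in a reusable function (the &lt;code&gt;codex&lt;/code&gt; invocation in the comment is just example usage):&lt;/p&gt;

```shell
# Warn when a binary's self-reported version disagrees with npm's metadata --
# the signature of a partially-updated global install.
check_version_drift() {
  if [ "$1" != "$2" ]; then
    echo "MISMATCH: binary=$1 npm=$2"
    return 1
  fi
  echo "OK: $1"
}

# Example usage against a real install:
#   bin="$(codex --version)"
#   pkg="$(npm ls -g @openai/codex --depth=0 | grep -o '[0-9][0-9.]*$')"
#   check_version_drift "$bin" "$pkg" || echo "reinstall @openai/codex"
check_version_drift "0.120.0" "0.120.0"   # prints "OK: 0.120.0"
```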

&lt;p&gt;&lt;strong&gt;3. MCP compatibility is version-sensitive.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;MCP is still young. The protocol is evolving fast. Unlike HTTP, where a server from 2015 can talk to a client from 2025, MCP servers and clients need to be within a reasonable version range of each other. When your MCP client is 1,886 versions behind, "reasonable" left the building months ago.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Multi-agent stacks amplify version debt.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In a monolith, a stale dependency usually manifests as a clear error. In a multi-agent stack where five services talk through protocol bridges, a stale dependency manifests as &lt;em&gt;mysterious partial failures with no error messages.&lt;/em&gt; The debugging surface area is multiplicative.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. The cache lies.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;My SQLite cache said &lt;code&gt;client_version: 0.120.0&lt;/code&gt; because it had been written by a &lt;em&gt;different invocation&lt;/em&gt; of Codex (probably through OpenClaw's process spawning, which had its own newer copy). The lesson: cache metadata reflects the last writer, not the current runtime. Always verify at the binary level.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Broader Point
&lt;/h2&gt;

&lt;p&gt;We're in the era of agent stacks — systems where multiple AI-powered tools coordinate through shared protocols. These stacks are powerful but they have a failure mode that traditional software doesn't: &lt;strong&gt;silent degradation&lt;/strong&gt;. When your REST API client is outdated, you get a 400 error. When your MCP client is outdated, you get a successful handshake that quietly drops half the capabilities.&lt;/p&gt;

&lt;p&gt;The tooling will catch up. Version compatibility matrices, protocol negotiation, graceful degradation warnings — it's all coming. But right now, in April 2026, the state of the art is a developer staring at their terminal at 2 AM, typing &lt;code&gt;--version&lt;/code&gt; for the thing they should have checked twelve hours ago.&lt;/p&gt;

&lt;p&gt;My agent stack is humming now. All 2,026 sessions are flowing. Codex and OpenClaw are best friends again. MCP connections are solid.&lt;/p&gt;

&lt;p&gt;And I've added a cron job:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;0 9 &lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt; 1 codex &lt;span class="nt"&gt;--version&lt;/span&gt; | mail &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="s2"&gt;"codex version check"&lt;/span&gt; me@example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
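&lt;p&gt;That line only mails me the installed version; I still have to notice it's stale. A slightly smarter variant compares the installed binary against the registry and only mails on drift (same hypothetical address):&lt;/p&gt;

```shell
# crontab entry: Mondays at 09:00, mail only when the installed codex
# no longer matches the latest version published to npm.
0 9 * * 1 [ "$(codex --version)" = "$(npm view @openai/codex version)" ] || echo "codex is stale: $(codex --version)" | mail -s "codex version drift" me@example.com
```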



&lt;p&gt;Because I &lt;em&gt;will&lt;/em&gt; forget again.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Ryan builds AI agent infrastructure at &lt;a href="https://dreamsitebuilders.com" rel="noopener noreferrer"&gt;DreamSiteBuilders.com&lt;/a&gt;. He can be found on &lt;a href="https://github.com/dodge1218" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; shipping tools that solve problems he created for himself.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>programming</category>
      <category>devops</category>
      <category>ai</category>
      <category>debugging</category>
    </item>
    <item>
      <title>The GPU Burst Pattern: $87 in Compute, $12,000 in Revenue</title>
      <dc:creator>signalscout</dc:creator>
      <pubDate>Tue, 07 Apr 2026 21:30:57 +0000</pubDate>
      <link>https://forem.com/vonb/the-gpu-burst-pattern-87-in-compute-12000-in-revenue-5020</link>
      <guid>https://forem.com/vonb/the-gpu-burst-pattern-87-in-compute-12000-in-revenue-5020</guid>
      <description>&lt;h1&gt;
  
  
  The GPU Burst Pattern: $87 in Compute, $12,000 in Revenue
&lt;/h1&gt;

&lt;h2&gt;
  
  
  AI Is So Cheap Now That "Spray and Pray" Actually Works — If You Do the Math First
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;By Ryan Brubeck | April 2026&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Three days ago, I had an idea. A big one.&lt;/p&gt;

&lt;p&gt;What if I generated &lt;strong&gt;4,828 custom websites&lt;/strong&gt; — one for every local business in my target area that doesn't have one — deployed all of them, and emailed each business owner: &lt;em&gt;"We built your website. Here it is. $499 if you want it."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;My first reaction: &lt;em&gt;"That would cost thousands of dollars in AI processing."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I almost didn't do the math. And that almost-mistake is exactly why I'm writing this article.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The actual compute cost: $87.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Even at a terrible conversion rate — just 0.5% of businesses saying yes — that's 24 customers × $499 = &lt;strong&gt;$11,976 in revenue&lt;/strong&gt; from one afternoon of GPU time.&lt;/p&gt;

&lt;p&gt;Here's how this works.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Old Way vs. The New Way
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Old way to get clients (what I was doing):&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Find a business without a website → 10 minutes&lt;/li&gt;
&lt;li&gt;Build a custom demo website → 2-4 hours&lt;/li&gt;
&lt;li&gt;Send them an email → 5 minutes&lt;/li&gt;
&lt;li&gt;Repeat&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's 3-5 hours per prospect. At that rate, reaching 4,828 businesses would take roughly eight years of full-time work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;New way (what AI makes possible):&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pull a list of 4,828 businesses without websites → 20 minutes (data from a business database)&lt;/li&gt;
&lt;li&gt;AI generates a custom website for each one → 4 hours of GPU time&lt;/li&gt;
&lt;li&gt;Deploy all of them automatically → 1 hour&lt;/li&gt;
&lt;li&gt;AI writes personalized emails with the live website link → 30 minutes of GPU time&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Total time: &lt;strong&gt;One afternoon.&lt;/strong&gt; Total compute cost: &lt;strong&gt;$87.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What's "Batch Processing"?
&lt;/h2&gt;

&lt;p&gt;Here's the concept in plain English:&lt;/p&gt;

&lt;p&gt;Instead of asking the AI to do one thing at a time (build one website, then the next, then the next), you line up thousands of tasks and let the AI chew through them all in one session. This is called &lt;strong&gt;batch processing&lt;/strong&gt; — processing a whole batch at once instead of one at a time.&lt;/p&gt;

&lt;p&gt;It's like the difference between hand-washing 4,828 dishes one at a time versus running an industrial dishwasher.&lt;/p&gt;

&lt;p&gt;The key insight: &lt;strong&gt;the GPU doesn't care whether it processes one website or five thousand.&lt;/strong&gt; You're paying for the time it's running, not the number of tasks. So the more you cram into a session, the cheaper each individual task gets.&lt;/p&gt;
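&lt;p&gt;That insight is just division. A two-line sketch, using the rental numbers from this article:&lt;/p&gt;

```python
def cost_per_task(hourly_rate_usd: float, hours: float, n_tasks: int) -> float:
    """Rental cost amortized across the batch: the GPU bill is fixed,
    so every extra task you cram into the session makes each one cheaper."""
    return hourly_rate_usd * hours / n_tasks

# This article's rental: an H200 pair at $4.14/hour for 10 hours ($41.40 total).
print(f"{cost_per_task(4.14, 10, 1):.2f}")     # one site bears the whole bill: 41.40
print(f"{cost_per_task(4.14, 10, 4828):.4f}")  # 4,828 sites: 0.0086 each
```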

&lt;h2&gt;
  
  
  The Economics (This Is the Important Part)
&lt;/h2&gt;

&lt;p&gt;Let's break this down in a way that makes the opportunity obvious.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost side:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPU rental (H200 × 2 for 10 hours)&lt;/td&gt;
&lt;td&gt;$41.40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Extra compute for email generation&lt;/td&gt;
&lt;td&gt;$15.60&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data enrichment (business details)&lt;/td&gt;
&lt;td&gt;$30.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$87.00&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Revenue side (conservative estimates):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Conversion Rate&lt;/th&gt;
&lt;th&gt;Customers&lt;/th&gt;
&lt;th&gt;Revenue at $499 each&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0.5% (terrible)&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;$11,976&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1% (low)&lt;/td&gt;
&lt;td&gt;48&lt;/td&gt;
&lt;td&gt;$23,952&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2% (average for targeted outreach)&lt;/td&gt;
&lt;td&gt;97&lt;/td&gt;
&lt;td&gt;$48,403&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Even the &lt;em&gt;worst-case scenario&lt;/em&gt; returns 138× the compute investment. That's not a typo. One hundred and thirty-eight times.&lt;/p&gt;
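&lt;p&gt;The table above is easy to recompute yourself, which is worth doing before you trust any spray-and-pray plan:&lt;/p&gt;

```python
def projected_revenue(prospects: int, conversion_rate: float, price_usd: int):
    """Customers who say yes at a given rate, and the revenue they bring in."""
    customers = round(prospects * conversion_rate)
    return customers, customers * price_usd

for rate in (0.005, 0.01, 0.02):
    customers, revenue = projected_revenue(4828, rate, 499)
    print(f"{rate:.1%} conversion: {customers} customers, ${revenue:,}")
```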

&lt;h2&gt;
  
  
  "What's a Conversion Rate?"
&lt;/h2&gt;

&lt;p&gt;Quick explanation: &lt;strong&gt;conversion rate&lt;/strong&gt; is just the percentage of people who say yes. If you email 100 people and 2 buy something, that's a 2% conversion rate.&lt;/p&gt;

&lt;p&gt;For cold outreach (emailing people who didn't ask to hear from you), 1-3% is typical for a genuinely useful offer. And "here's a free website we already built for your business" is a genuinely useful offer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The "Burst" in GPU Burst
&lt;/h2&gt;

&lt;p&gt;Yesterday's article explained how you can rent GPU supercomputers by the hour. The &lt;strong&gt;burst pattern&lt;/strong&gt; takes that one step further:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Spend time preparing your batch&lt;/strong&gt; — gather the data, define what each output should look like, write the AI instructions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rent the GPUs&lt;/strong&gt; — spin up the hardware on Vast.ai&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blast through the entire batch&lt;/strong&gt; — let the AI process everything in one focused session&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shut down&lt;/strong&gt; — turn off the GPUs, stop paying&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The "burst" is the focused blast of processing. You don't keep GPUs running 24/7 — you spin them up when you have a big batch, process it all, and shut down. &lt;/p&gt;

&lt;p&gt;It's like renting a moving truck. You don't need it every day, but when you need it, you really need it. And it's way cheaper than owning one.&lt;/p&gt;
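&lt;p&gt;Once the GPUs are up, the four steps collapse into one loop. This is only a skeleton; &lt;code&gt;worker&lt;/code&gt; stands in for whatever model call you run per task:&lt;/p&gt;

```python
def run_batch(tasks, worker):
    """Step 3 of the burst: chew through every prepared task in one session.
    In practice you'd fan tasks out across the GPUs in parallel."""
    return [worker(task) for task in tasks]

# Toy stand-in: a "worker" that stamps out a one-line site per business.
businesses = [{"name": "Joe's Plumbing"}, {"name": "Lakeside Bakery"}]
sites = run_batch(businesses, lambda b: f"Website for {b['name']}")
print(len(sites))  # 2
```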

&lt;h2&gt;
  
  
  Other Things You Can Burst
&lt;/h2&gt;

&lt;p&gt;The website example is real, but the pattern works for any high-volume task:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Content creation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generate 500 social media posts for the next 6 months → ~$5 in compute&lt;/li&gt;
&lt;li&gt;Write personalized outreach emails for 10,000 prospects → ~$20&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Data analysis:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Analyze 5,000 customer reviews and summarize themes → ~$8&lt;/li&gt;
&lt;li&gt;Score and rank 2,000 job applicants based on criteria → ~$12&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Research:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Summarize 1,000 academic papers on a topic → ~$15&lt;/li&gt;
&lt;li&gt;Analyze every competitor's pricing page in your industry → ~$10&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Product development:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generate and evaluate 200 business name ideas → ~$2&lt;/li&gt;
&lt;li&gt;Create detailed product descriptions for a 500-item catalog → ~$10&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pattern is always the same: prepare the batch, rent the compute, blast through it, shut down.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Mental Model Shift
&lt;/h2&gt;

&lt;p&gt;Most people think about AI as a conversational tool — you ask a question, it answers. One at a time.&lt;/p&gt;

&lt;p&gt;The burst pattern treats AI as an &lt;strong&gt;industrial tool&lt;/strong&gt; — you prepare a production run, process thousands of outputs, and harvest the results.&lt;/p&gt;

&lt;p&gt;This is the difference between using a printer to print one letter and using it to print 10,000 marketing flyers. Same machine, completely different value.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Now?
&lt;/h2&gt;

&lt;p&gt;Three things happened in 2025-2026 that made this possible:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Open-weight models&lt;/strong&gt; — Companies like Meta, OpenAI, and DeepSeek released their AI models for anyone to use. You don't need permission or an expensive API key to run them.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;GPU rental markets&lt;/strong&gt; — Platforms like Vast.ai created an Airbnb for supercomputers. Prices dropped from $10+/hour per GPU to under $3/hour.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Software like vLLM&lt;/strong&gt; — Tools that make it easy to run these models efficiently on rented hardware. What used to require a team of engineers now takes a 10-minute setup.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A year ago, this pattern would have cost $500+ per batch. Today it costs $87. A year from now, it'll probably cost $20.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started (The Simple Version)
&lt;/h2&gt;

&lt;p&gt;If you've been following this series all week, you already have the pieces:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Your $12/month cloud computer&lt;/strong&gt; (from Tuesday's article) handles your daily AI tasks via free APIs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The loop&lt;/strong&gt; (from Wednesday) is how you communicate with the AI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The tier system&lt;/strong&gt; (from Thursday) tells you when to use free vs. paid models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPU bursts&lt;/strong&gt; (yesterday + today) are for the heavy lifting that free APIs can't handle&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The burst pattern is the final piece. It's what turns a cool hobby project into a money-making machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;AI is now cheap enough that you can generate thousands of customized outputs and the cost per unit is essentially zero. The constraint isn't compute anymore — it's having a good idea for what to process in bulk.&lt;/p&gt;

&lt;p&gt;So here's my challenge to you: &lt;strong&gt;What could you do if you could run an AI task 5,000 times for under $100?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Think about it. Then go do it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Ryan Brubeck builds AI automation tools at DreamSiteBuilders.com. He generated his first $12K from a single GPU burst and hasn't stopped finding new batches to run.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This was the final article in the "Beginner's Guide to Personal AI" series. Follow for more on building businesses with AI — no coding required.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; #AI #Entrepreneurship #GPUBurst #BatchProcessing #Revenue #Beginners #BuildInPublic #VastAI&lt;/p&gt;

</description>
      <category>ai</category>
      <category>entrepreneurship</category>
      <category>beginners</category>
      <category>productivity</category>
    </item>
    <item>
      <title>How I Processed 335,000 Tokens in One Night for 57 Cents</title>
      <dc:creator>signalscout</dc:creator>
      <pubDate>Tue, 07 Apr 2026 21:22:45 +0000</pubDate>
      <link>https://forem.com/vonb/how-i-processed-335000-tokens-in-one-night-for-57-cents-5bof</link>
      <guid>https://forem.com/vonb/how-i-processed-335000-tokens-in-one-night-for-57-cents-5bof</guid>
      <description>&lt;h1&gt;
  
  
  How I Processed 335,000 Tokens in One Night for 57 Cents
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Renting a Supercomputer by the Hour Changed Everything About How I Think About AI Costs
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;By Ryan Brubeck | April 2026&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Last week, I hit a wall. The free AI services I use have daily limits (you can only ask so many questions per day before they tell you to come back tomorrow). My AI assistant system — which builds websites, generates leads, and writes emails — was burning through those limits by noon.&lt;/p&gt;

&lt;p&gt;I needed more. A lot more. So I did something that sounds insane but cost less than a cup of coffee: &lt;strong&gt;I rented two supercomputer graphics cards for a few hours and ran my own AI.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's exactly what happened.&lt;/p&gt;




&lt;h2&gt;
  
  
  Wait — You Can Rent a Supercomputer?
&lt;/h2&gt;

&lt;p&gt;Yes. And it's shockingly easy.&lt;/p&gt;

&lt;p&gt;First, some quick vocab:&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;GPU&lt;/strong&gt; (Graphics Processing Unit) is a special computer chip originally designed to render video game graphics. Turns out, the same hardware that makes your games look pretty is &lt;em&gt;incredible&lt;/em&gt; at running AI models. That's why NVIDIA — the company that makes the most popular GPUs — became one of the most valuable companies on Earth.&lt;/p&gt;

&lt;p&gt;The specific GPUs I rented are called &lt;strong&gt;H200s&lt;/strong&gt; — they're NVIDIA's top-of-the-line AI chips. One of these costs about $30,000 to buy. I rented two of them for $4.14 per hour through a platform called &lt;a href="https://vast.ai" rel="noopener noreferrer"&gt;Vast.ai&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Vast.ai is like Airbnb, but for GPUs. People and data centers with spare computing power list their machines, and you rent them by the hour. No commitment, no contracts. You spin one up when you need it and shut it down when you're done.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Does "Running Your Own AI" Mean?
&lt;/h2&gt;

&lt;p&gt;Normally when you use ChatGPT or Claude, here's what happens behind the scenes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You type a message&lt;/li&gt;
&lt;li&gt;Your message gets sent over the internet to OpenAI's (or Anthropic's) servers&lt;/li&gt;
&lt;li&gt;Their computers run the AI model on your message&lt;/li&gt;
&lt;li&gt;They send the response back&lt;/li&gt;
&lt;li&gt;They charge you for the processing&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;"Running your own AI" means skipping the middleman. Instead of sending your messages to someone else's computer, you:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Rent a powerful computer (the GPUs on Vast.ai)&lt;/li&gt;
&lt;li&gt;Download an &lt;strong&gt;open-weight model&lt;/strong&gt; — that's an AI model where the creators released it for anyone to use for free (like OpenAI's GPT-OSS 120B or Meta's Llama)&lt;/li&gt;
&lt;li&gt;Run it on your rented computer&lt;/li&gt;
&lt;li&gt;Send your messages directly to it&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No per-message fees. No rate limits. No daily caps. You pay only for the time the computer is turned on.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup: 10 Minutes, Start to Finish
&lt;/h2&gt;

&lt;p&gt;I'm going to walk you through what I did. You don't need to understand every detail — the point is how &lt;em&gt;simple&lt;/em&gt; this is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1:&lt;/strong&gt; I went to Vast.ai and searched for the cheapest available H200 GPUs. Found a pair for $4.14/hour.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2:&lt;/strong&gt; I clicked "rent" and told it to start a program called &lt;strong&gt;vLLM&lt;/strong&gt; — that's a piece of software specifically designed to run AI models efficiently on GPUs. Think of it as the engine that makes the AI go.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3:&lt;/strong&gt; I set up a secure connection between my computer and the rented GPUs (called an "SSH tunnel" — basically a private, encrypted pipe between the two computers).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4:&lt;/strong&gt; I pointed my AI assistant (OpenClaw) at the rented GPUs instead of the usual free APIs.&lt;/p&gt;

&lt;p&gt;Done. My entire AI system was now running on my own private supercomputer.&lt;/p&gt;
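&lt;p&gt;For the curious, "pointing my assistant at the rented GPUs" just means sending ordinary HTTP requests to vLLM's OpenAI-compatible endpoint through the tunnel. A minimal sketch with the standard library; the model name and port here are illustrative:&lt;/p&gt;

```python
import json
from urllib import request

def chat_request(prompt: str,
                 model: str = "openai/gpt-oss-120b",
                 base: str = "http://localhost:8000/v1") -> request.Request:
    """Build a chat-completion request for a vLLM server reachable through
    an SSH tunnel forwarding localhost:8000 (port and model are examples)."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return request.Request(
        f"{base}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = chat_request("Write a tagline for a bakery.")
print(req.full_url)  # http://localhost:8000/v1/chat/completions
# urllib.request.urlopen(req) would actually send it once the tunnel is up.
```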

&lt;h2&gt;
  
  
  The Results
&lt;/h2&gt;

&lt;p&gt;Over the next 8 hours, my system processed &lt;strong&gt;335,000 tokens&lt;/strong&gt; — roughly 250,000 words' worth of AI processing, since a token averages about three-quarters of an English word. It built websites, generated emails, analyzed data, and wrote content.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Total cost of the GPU rental:&lt;/strong&gt; $33.12 (8 hours × $4.14/hour)&lt;/p&gt;

&lt;p&gt;But here's the wild part — I didn't even use the full capacity. The GPUs were mostly idle between tasks. If I look at actual compute time used:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Effective cost for 335,000 tokens: approximately $0.57.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Fifty-seven cents. For a workload that would have cost $15-50 through commercial APIs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters (The Bigger Picture)
&lt;/h2&gt;

&lt;p&gt;This isn't about saving $15. It's about a mental shift.&lt;/p&gt;

&lt;p&gt;Most people think about AI costs like this: "Each question costs me X cents." That creates a scarcity mindset — you ration your AI usage, you avoid asking follow-up questions, you don't experiment.&lt;/p&gt;

&lt;p&gt;The GPU rental model flips this: "I'm paying $4/hour regardless. I might as well use it as much as possible." Suddenly you're running experiments you never would have tried, processing datasets you would have skipped, and generating variations instead of settling for the first draft.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The cost per task approaches zero when you batch enough work into a rental session.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers for Different Budgets
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Cost for 335K Tokens&lt;/th&gt;
&lt;th&gt;Daily Limit?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ChatGPT Pro ($200/mo)&lt;/td&gt;
&lt;td&gt;"Included" but rate-limited&lt;/td&gt;
&lt;td&gt;Yes, and you'll hit it&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude API (Tier 1 pricing)&lt;/td&gt;
&lt;td&gt;~$25&lt;/td&gt;
&lt;td&gt;No hard limit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek API&lt;/td&gt;
&lt;td&gt;~$0.10&lt;/td&gt;
&lt;td&gt;No hard limit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Self-hosted on Vast.ai&lt;/td&gt;
&lt;td&gt;~$0.57&lt;/td&gt;
&lt;td&gt;None whatsoever&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free tier (Groq/Cerebras)&lt;/td&gt;
&lt;td&gt;$0.00&lt;/td&gt;
&lt;td&gt;Yes, resets daily&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Who Should Actually Do This?
&lt;/h2&gt;

&lt;p&gt;Let me be honest: if you're casually using ChatGPT a few times a day, this is overkill. Just use the free tier of Groq or the free ChatGPT plan.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This makes sense if you:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run an AI assistant system that processes thousands of messages a day&lt;/li&gt;
&lt;li&gt;Need to process large batches of data (thousands of emails, hundreds of documents)&lt;/li&gt;
&lt;li&gt;Want to run AI without any rate limits or daily caps&lt;/li&gt;
&lt;li&gt;Are building a product powered by AI and need to control costs&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The "Burst" Pattern
&lt;/h2&gt;

&lt;p&gt;Here's how I actually use this in practice — I call it the &lt;strong&gt;burst pattern&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Most of the time:&lt;/strong&gt; Use free APIs (Groq, Cerebras, OpenRouter). Cost: $0.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When I hit a wall:&lt;/strong&gt; Rent GPUs on Vast.ai for a few hours, blast through the workload. Cost: $10-30.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shut down:&lt;/strong&gt; Turn off the rental. Back to free.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Average monthly cost with this pattern: &lt;strong&gt;$12 (cloud computer) + $20-40 (occasional GPU bursts) = $32-52/month&lt;/strong&gt; for unlimited AI processing power that would cost $500+ through commercial APIs.&lt;/p&gt;

&lt;h2&gt;
  
  
  "Isn't This Complicated?"
&lt;/h2&gt;

&lt;p&gt;The initial setup takes about 30 minutes if you've never done it before, and 10 minutes once you've done it once. Vast.ai has a pretty straightforward interface — you search for GPUs, click rent, and it gives you connection details.&lt;/p&gt;

&lt;p&gt;The actual hard part is knowing when to burst and when to use free APIs. And that's really just a judgment call: if the free APIs are fast enough, use them. If you need to process a big batch or you're hitting rate limits, spin up a GPU rental.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AI compute is commoditized.&lt;/strong&gt; The actual processing power is cheap. What you're paying for with $200/month subscriptions is convenience and a pretty interface.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Batch your heavy work.&lt;/strong&gt; Don't rent GPUs to process one thing. Save up tasks and blast through them in a focused session.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The free tier handles 90% of daily work.&lt;/strong&gt; GPU bursts are for the other 10% — the heavy lifting.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Open-weight models are the key.&lt;/strong&gt; Companies like Meta (Llama), OpenAI (GPT-OSS), and DeepSeek release their models for anyone to use. Without these, self-hosting wouldn't be possible.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;em&gt;Ryan Brubeck builds AI agent infrastructure at DreamSiteBuilders.com. His systems have processed millions of tokens at an average cost of approximately nothing.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Tomorrow: "The GPU Burst Pattern — How I Generated $12,000 in Revenue from $87 in Compute"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; #AI #GPU #VastAI #SelfHosting #Beginners #CostSaving #OpenSource&lt;/p&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>opensource</category>
      <category>cloud</category>
    </item>
  </channel>
</rss>
