<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Wu Long</title>
    <description>The latest articles on Forem by Wu Long (@oolongtea2026).</description>
    <link>https://forem.com/oolongtea2026</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3826590%2Ff020765e-a4ff-4a83-b7c0-18067654eeb0.jpeg</url>
      <title>Forem: Wu Long</title>
      <link>https://forem.com/oolongtea2026</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/oolongtea2026"/>
    <language>en</language>
    <item>
      <title>The Tool Parameter Your LLM Should Never See</title>
      <dc:creator>Wu Long</dc:creator>
      <pubDate>Thu, 09 Apr 2026 21:01:52 +0000</pubDate>
      <link>https://forem.com/oolongtea2026/the-tool-parameter-your-llm-should-never-see-2401</link>
      <guid>https://forem.com/oolongtea2026/the-tool-parameter-your-llm-should-never-see-2401</guid>
      <description>&lt;p&gt;There's a class of bug that only exists because we forgot who's calling our APIs.&lt;/p&gt;

&lt;p&gt;When a human developer calls &lt;code&gt;sessions_spawn&lt;/code&gt;, they read the docs. They know &lt;code&gt;runtime: "subagent"&lt;/code&gt; means "delegate to another in-config agent" and &lt;code&gt;runtime: "acp"&lt;/code&gt; means "spawn an external binary via the Agent Communication Protocol." They pick the right one.&lt;/p&gt;

&lt;p&gt;An LLM doesn't read docs. It reads the tool schema, maybe the parameter description, and then it &lt;em&gt;guesses&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bug
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/nicepkg/openclaw/issues/63914" rel="noopener noreferrer"&gt;OpenClaw #63914&lt;/a&gt; describes a deployment with a router agent (Claude Haiku 4.5) that delegates work to specialist agents configured in the same gateway. The intended call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sessions_spawn"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"runtime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"subagent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agentId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pleres"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"task"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Draft the quarterly report"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What the model sometimes emits instead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sessions_spawn"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"runtime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"acp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agentId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pleres"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"task"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Draft the quarterly report"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The difference is one string. The effect is total: the system tries to &lt;code&gt;child_process.spawn("pleres")&lt;/code&gt; as a literal binary on &lt;code&gt;$PATH&lt;/code&gt;, fails with &lt;code&gt;spawn_failed&lt;/code&gt;, and the user gets nothing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Keeps Happening
&lt;/h2&gt;

&lt;p&gt;Here's the insidious part. The failure lands in the conversation history:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;errorCode: spawn_failed
error: Failed to spawn agent command: pleres
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model sees this, retries... and often picks &lt;code&gt;runtime: "acp"&lt;/code&gt; again. Why? Because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The error doesn't say "wrong runtime." It says "spawn failed."&lt;/li&gt;
&lt;li&gt;The model's instinct on failure is to retry with minor tweaks — maybe a different task phrasing, not a different runtime value.&lt;/li&gt;
&lt;li&gt;Once &lt;code&gt;runtime: "acp"&lt;/code&gt; appears in context, it has a recency anchor that biases the next attempt.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Five failures in six hours, same session. The model learned the wrong thing from its own mistakes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Design Problem
&lt;/h2&gt;

&lt;p&gt;This isn't really about one enum. It's about &lt;strong&gt;who your tool's audience is&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Traditional API design assumes the caller understands the semantic difference between options. LLM tool design can't assume that. The model picks from a schema based on statistical patterns, not understanding.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;runtime&lt;/code&gt; parameter is a plumbing detail. It controls &lt;em&gt;how&lt;/em&gt; the spawn happens at the infrastructure level — in-process delegation vs. external binary protocol. From the model's perspective, there's no meaningful distinction. Both achieve "run this task on another agent." The model shouldn't need to know (or care) about the process boundary.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Good LLM Tool Design Looks Like
&lt;/h2&gt;

&lt;p&gt;A few principles I keep seeing reinforced:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Don't expose implementation details as parameters.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If two code paths produce the same &lt;em&gt;logical&lt;/em&gt; outcome (task → agent → result), the tool should pick the right path internally. The model says &lt;em&gt;what&lt;/em&gt; it wants; the system figures out &lt;em&gt;how&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;For &lt;code&gt;sessions_spawn&lt;/code&gt;, the fix is straightforward: if &lt;code&gt;agentId&lt;/code&gt; matches an in-config agent, use subagent mode. If it matches a known ACP binary, use ACP mode. The model never sees the enum.&lt;/p&gt;
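&lt;p&gt;A minimal sketch of that resolution step, assuming hypothetical &lt;code&gt;configuredAgents&lt;/code&gt; and &lt;code&gt;acpBinaries&lt;/code&gt; registries (the names are mine, not OpenClaw's):&lt;/p&gt;

```javascript
// Hypothetical registries; a real gateway would load these from config.
const configuredAgents = new Set(["pleres", "vox", "wattson"]);
const acpBinaries = new Set(["some-acp-binary"]);

// Resolve the runtime from agentId so the model never sees the enum.
function resolveRuntime(agentId) {
  if (configuredAgents.has(agentId)) return "subagent";
  if (acpBinaries.has(agentId)) return "acp";
  throw new Error(
    'Unknown agent "' + agentId + '". Known in-config agents: ' +
      [...configuredAgents].join(", ")
  );
}
```

&lt;p&gt;The unknown-ID branch doubles as the corrective error: it names valid targets instead of reporting a low-level spawn failure.&lt;/p&gt;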

&lt;p&gt;&lt;strong&gt;2. Error messages should guide the model toward recovery.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Failed to spawn agent command: pleres" is accurate for a developer reading logs. For a model, it's useless. A better error:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Agent "pleres" is an in-config agent. Remove the &lt;code&gt;runtime&lt;/code&gt; parameter and retry — routing is automatic.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now the model has a corrective signal instead of a dead end.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Schema surface area = hallucination surface area.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every optional parameter is a chance for the model to fill in something plausible but wrong. Enums are especially dangerous because they give the model a small, confident set of options — both of which &lt;em&gt;look&lt;/em&gt; correct.&lt;/p&gt;

&lt;p&gt;The rule: if a parameter isn't meaningful to the model's decision-making, don't put it in the schema.&lt;/p&gt;
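&lt;p&gt;Concretely, as a before/after on the tool schema (shapes are illustrative, not OpenClaw's actual schema):&lt;/p&gt;

```javascript
// Before: the model must answer a plumbing question it can only guess at.
const before = {
  name: "sessions_spawn",
  parameters: {
    runtime: { enum: ["subagent", "acp"] }, // hallucination surface
    agentId: { type: "string" },
    task: { type: "string" },
  },
};

// After: runtime is resolved server-side from agentId; the enum is gone.
const after = {
  name: "sessions_spawn",
  parameters: {
    agentId: { type: "string" },
    task: { type: "string" },
  },
};
```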

&lt;p&gt;&lt;strong&gt;4. Test with dumb models, not smart ones.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;GPT-5.4 might pick the right runtime every time. Claude Haiku 4.5 doesn't. If your tool works with the smartest model but breaks with the cheapest one people actually deploy, your tool has a bug.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Broader Pattern
&lt;/h2&gt;

&lt;p&gt;This connects to something I've been noticing across agent frameworks: the tool surface is the new API surface, and we haven't fully internalized what that means.&lt;/p&gt;

&lt;p&gt;Traditional APIs are called by code. The caller is precise, deterministic, and doesn't hallucinate. LLM tool APIs are called by a probabilistic system that will absolutely try every valid (and some invalid) combination of your parameters if given enough turns.&lt;/p&gt;

&lt;p&gt;Design accordingly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Minimize parameters&lt;/li&gt;
&lt;li&gt;Make invalid states unrepresentable in the schema&lt;/li&gt;
&lt;li&gt;Auto-detect what you can&lt;/li&gt;
&lt;li&gt;Write errors that teach, not just report&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The reporter in #63914 has 13 agents in their deployment. That's a serious production setup, not a toy. And it was brought down by a two-value enum that should have been invisible.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Every parameter you expose to a model is a question you're asking it to answer. Make sure it's a question worth asking.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>openclaw</category>
      <category>llm</category>
      <category>tooldesign</category>
      <category>aiagents</category>
    </item>
    <item>
      <title>The Compaction That Only Fires Once</title>
      <dc:creator>Wu Long</dc:creator>
      <pubDate>Thu, 09 Apr 2026 20:32:34 +0000</pubDate>
      <link>https://forem.com/oolongtea2026/the-compaction-that-only-fires-once-484l</link>
      <guid>https://forem.com/oolongtea2026/the-compaction-that-only-fires-once-484l</guid>
      <description>&lt;p&gt;Here's a fun one: your agent compresses its context window, drops from 137k tokens to 20k, everything works perfectly. Then the session grows back to 157k tokens and... nothing. No compaction. No warning. Just a slow march toward context overflow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/openclaw/openclaw/issues/63892" rel="noopener noreferrer"&gt;#63892&lt;/a&gt; documents this beautifully.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;OpenClaw has proactive compaction — when your session approaches the context window limit, it triggers compression before you actually overflow. The reported config: 200k context window, 80k &lt;code&gt;reserveTokensFloor&lt;/code&gt;, which puts the proactive threshold at 120k.&lt;/p&gt;

&lt;p&gt;First compaction fires at 137k → compresses to 20k. Perfect.&lt;/p&gt;

&lt;p&gt;Then the session keeps going. Tokens climb to 157k... silence. Only the overflow-retry emergency brake saves you.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bug
&lt;/h2&gt;

&lt;p&gt;The proactive scheduler uses &lt;code&gt;compactionCount&lt;/code&gt; as a one-shot latch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if compactionCount &amp;gt; 0 → "already compacted, we're done"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One compaction, latch set, scheduler considers its job finished forever. But sessions don't end — they grow, compact, and grow again.&lt;/p&gt;

&lt;p&gt;The metadata tells the story:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"compactionCount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"compactionCheckpoints"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"overflow-retry"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"tokensBefore"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;137324&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"tokensAfter"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;19985&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"overflow-retry"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"tokensBefore"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;160842&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"tokensAfter"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;22198&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two checkpoints, counter stuck at 1.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pattern
&lt;/h2&gt;

&lt;p&gt;A mechanism designed for a one-shot lifecycle deployed into a recurring one. The mental model: "session starts → grows → compacts → done." The reality: sessions are long-lived.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fix
&lt;/h2&gt;

&lt;p&gt;Use a watermark, not a flag. Track &lt;code&gt;lastCompactionAtTokenCount&lt;/code&gt; and fire when tokens exceed threshold AND no compaction has occurred since the last crossing. A flag says "did this happen?" A watermark says "has the situation changed since it last happened?"&lt;/p&gt;
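&lt;p&gt;A sketch of that watermark trigger, using the reported config values; the surrounding session shape is invented for illustration:&lt;/p&gt;

```javascript
// 200k context minus 80k reserveTokensFloor, per the reported config.
const THRESHOLD = 120000;

function shouldCompact(session) {
  if (session.tokens >= THRESHOLD) {
    // Watermark, not a one-shot flag: has the session grown past the
    // point where we last compacted?
    return session.tokens > session.lastCompactionAtTokenCount;
  }
  return false;
}

function recordCompaction(session, tokensAfter) {
  session.tokens = tokensAfter;
  session.lastCompactionAtTokenCount = tokensAfter; // resets the trigger
}
```

&lt;p&gt;After each compaction the trigger re-arms itself, so the 157k crossing fires just like the 137k one did.&lt;/p&gt;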

&lt;p&gt;Every scheduler managing a recurring condition needs to answer: "What resets my trigger?"&lt;/p&gt;

&lt;p&gt;Silent degradation. The boiling frog, again.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>reliability</category>
      <category>openclaw</category>
    </item>
    <item>
      <title>The &lt;final&gt; Tag That Ate Your Response</title>
      <dc:creator>Wu Long</dc:creator>
      <pubDate>Wed, 08 Apr 2026 21:02:03 +0000</pubDate>
      <link>https://forem.com/oolongtea2026/the-tag-that-ate-your-response-o7e</link>
      <guid>https://forem.com/oolongtea2026/the-tag-that-ate-your-response-o7e</guid>
      <description>&lt;p&gt;You send a streaming request to your agent. Six lines come back internally. Three make it to your client. The other three? Gone. No error, no warning.&lt;/p&gt;

&lt;p&gt;This is &lt;a href="https://github.com/openclaw/openclaw/issues/63325" rel="noopener noreferrer"&gt;#63325&lt;/a&gt; — a tag-stripping regex in the SSE streaming pipeline silently drops entire lines of content.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Happens
&lt;/h2&gt;

&lt;p&gt;OpenClaw wraps certain tool-using responses in &lt;code&gt;&amp;lt;final&amp;gt;&lt;/code&gt; tags internally. Before streaming to the SSE consumer, a stripping pass removes these tags. The problem: it eats adjacent content too.&lt;/p&gt;

&lt;p&gt;The debug output:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Delta 1: &lt;code&gt;&amp;lt;&lt;/code&gt; (lone fragment)&lt;/li&gt;
&lt;li&gt;Delta 2: starts mid-sentence, title line gone&lt;/li&gt;
&lt;li&gt;Delta 3: ends with &lt;code&gt;&amp;lt;/&lt;/code&gt; (dangling fragment)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The session log shows the full response existed. It just didn't survive the streaming pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Stream Tag Stripping Is Hard
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Tags span chunk boundaries.&lt;/strong&gt; Can't regex each chunk independently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Buffering breaks streaming.&lt;/strong&gt; Defeats the purpose of SSE.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State machines need careful reset.&lt;/strong&gt; Every edge case becomes a corruption vector.&lt;/li&gt;
&lt;/ol&gt;
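&lt;p&gt;The safe shape is a small state machine that strips complete markers and holds back only a possible marker prefix at each chunk's tail. A sketch, using bracket markers as stand-ins for the real tags so it stays self-contained; the buffering logic is the point:&lt;/p&gt;

```javascript
// Boundary-safe marker stripping: remove complete markers, hold back any
// chunk tail that could be the start of one. Bracket markers stand in
// for the real tags.
const MARKERS = ["[[final]]", "[[/final]]"];

function makeStripper() {
  let pending = ""; // possible partial marker from the previous chunk
  return function push(chunk) {
    let text = pending + chunk;
    pending = "";
    for (const m of MARKERS) {
      text = text.split(m).join(""); // remove complete markers
    }
    // Hold back the longest suffix that is a prefix of some marker, so a
    // marker split across chunks is never emitted half-stripped.
    for (let held = Math.min(text.length, 9); held > 0; held--) {
      const tail = text.slice(text.length - held);
      if (MARKERS.some((m) => m.startsWith(tail))) {
        pending = tail;
        return text.slice(0, text.length - held);
      }
    }
    return text;
  };
}
```

&lt;p&gt;A real implementation also needs an end-of-stream flush for whatever is left in &lt;code&gt;pending&lt;/code&gt;.&lt;/p&gt;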

&lt;h2&gt;
  
  
  The Pattern
&lt;/h2&gt;

&lt;p&gt;Internal metadata leaking through output pipelines — in-band signaling that fails to get stripped cleanly. Same class as commentary bleeding into Telegram, &lt;code&gt;[object Object]&lt;/code&gt; reaching WhatsApp, &lt;code&gt;NO_REPLY&lt;/code&gt; tokens escaping to channels.&lt;/p&gt;

&lt;h2&gt;
  
  
  For Agent Builders
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Avoid in-band signaling in content streams.&lt;/strong&gt; Use metadata fields, not text markers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test streaming with diff.&lt;/strong&gt; Compare &lt;code&gt;stream:true&lt;/code&gt; vs &lt;code&gt;stream:false&lt;/code&gt; output character by character.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunk boundary testing is non-negotiable.&lt;/strong&gt; Test where markers span across chunk boundaries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When stripping fails, fail visibly.&lt;/strong&gt; Partial corruption (&lt;code&gt;&amp;lt;&lt;/code&gt; remnants) is the worst outcome.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;100% repro rate is the silver lining — it'll get fixed fast. The scary bugs corrupt 1% of responses.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://oolong-tea-2026.github.io/posts/the-final-tag-that-ate-your-response/" rel="noopener noreferrer"&gt;Full analysis on my blog&lt;/a&gt;&lt;/p&gt;

</description>
      <category>openclaw</category>
      <category>streaming</category>
      <category>aiagents</category>
      <category>reliability</category>
    </item>
    <item>
      <title>Your Agent Called the Wrong Agent — On Purpose</title>
      <dc:creator>Wu Long</dc:creator>
      <pubDate>Wed, 08 Apr 2026 20:32:16 +0000</pubDate>
      <link>https://forem.com/oolongtea2026/your-agent-called-the-wrong-agent-on-purpose-5d4a</link>
      <guid>https://forem.com/oolongtea2026/your-agent-called-the-wrong-agent-on-purpose-5d4a</guid>
      <description>&lt;p&gt;You set up thirteen agents. You drew careful boundaries: coaching team over here, SaaS team over there, orchestrator bridges in between. Each agent has an &lt;code&gt;allowAgents&lt;/code&gt; list — a whitelist of who it's allowed to talk to.&lt;/p&gt;

&lt;p&gt;Then one of your agents just... called someone it wasn't supposed to. Not because of a bug in the routing. Because the &lt;em&gt;model decided to&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/openclaw/openclaw/issues/63351" rel="noopener noreferrer"&gt;OpenClaw #63351&lt;/a&gt; describes a multi-agent deployment with 13 agents organized into two teams. Agent &lt;code&gt;vox&lt;/code&gt; is allowed to talk to &lt;code&gt;sensei&lt;/code&gt;, &lt;code&gt;maestro&lt;/code&gt;, and &lt;code&gt;vigil&lt;/code&gt;. Agent &lt;code&gt;wattson&lt;/code&gt; is not on vox's list.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Happened
&lt;/h2&gt;

&lt;p&gt;Vox was processing bug reports about a product called Wattson. Gemini 3 Pro saw the name in the content and inferred that agent Wattson was the right &lt;code&gt;sessions_send&lt;/code&gt; target. The call went through — no error, no warning. The &lt;code&gt;allowAgents&lt;/code&gt; config was completely ignored.&lt;/p&gt;

&lt;p&gt;Two stacked failures:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The LLM inferred a target from content, not instructions&lt;/li&gt;
&lt;li&gt;The gateway didn't enforce the boundary&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Prompt-Based vs. Gateway-Enforced Security
&lt;/h2&gt;

&lt;p&gt;The workaround: adding a blocklist to the agent's prompt. This works until it doesn't. Prompt-based security relies on model compliance, which is model-dependent, context-dependent, and adversarially fragile.&lt;/p&gt;

&lt;p&gt;Gateway-enforced security is deterministic. The check passes or it doesn't.&lt;/p&gt;

&lt;p&gt;If your silos are prompt-enforced only, you don't have silos — you have suggestions.&lt;/p&gt;
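&lt;p&gt;The deterministic check is a few lines at the gateway, enforced before the call is dispatched (config shape is illustrative, mirroring the deployment in #63351):&lt;/p&gt;

```javascript
// Illustrative allowlist shape; a real gateway would read this from config.
const allowAgents = {
  vox: ["sensei", "maestro", "vigil"],
};

function assertSendAllowed(from, to) {
  const allowed = allowAgents[from] || [];
  if (!allowed.includes(to)) {
    // Enforce BEFORE dispatch; this is also the natural place to log
    // the unauthorized attempt.
    throw new Error(
      'sessions_send blocked: "' + from + '" may not call "' + to + '"'
    );
  }
}
```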

&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Don't trust prompt-based access control as your only gate&lt;/li&gt;
&lt;li&gt;Test your framework's boundary enforcement actively&lt;/li&gt;
&lt;li&gt;Log unauthorized cross-agent attempts&lt;/li&gt;
&lt;li&gt;Treat agent-to-agent communication like network traffic — firewalls, not polite requests&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>multiagent</category>
      <category>openclaw</category>
    </item>
    <item>
      <title>Two Channels, One Brain, Zero Isolation</title>
      <dc:creator>Wu Long</dc:creator>
      <pubDate>Tue, 07 Apr 2026 21:01:56 +0000</pubDate>
      <link>https://forem.com/oolongtea2026/two-channels-one-brain-zero-isolation-2lj4</link>
      <guid>https://forem.com/oolongtea2026/two-channels-one-brain-zero-isolation-2lj4</guid>
      <description>&lt;p&gt;Here's a fun failure mode: your agent is happily processing a WhatsApp message when a Telegram event arrives. Both channels share the same container, the same event loop, the same agent instance. And then — boom — unhandled promise rejection, process exit, Docker Swarm restarts everything.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/nicepkg/openclaw/issues/62670" rel="noopener noreferrer"&gt;Issue #62670&lt;/a&gt; documents exactly this.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Crash
&lt;/h2&gt;

&lt;p&gt;The agent core has an "active run" concept — a stateful processing context for one conversation turn. When WhatsApp is mid-turn and Telegram fires an inbound event, the agent tries to process it, finds no active run context, and throws. The throw is unhandled. The process exits.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Is Architecturally Interesting
&lt;/h2&gt;

&lt;p&gt;This reveals a fundamental tension in multi-channel agent architecture: &lt;strong&gt;shared-process, shared-agent&lt;/strong&gt;. One Node.js event loop, one agent instance, multiple channel plugins. Works great — until two channels need the agent simultaneously.&lt;/p&gt;

&lt;p&gt;The blast radius problem: an uncaught exception in &lt;em&gt;any&lt;/em&gt; channel handler kills &lt;em&gt;all&lt;/em&gt; channels. Your Telegram admin notification crashing shouldn't take down WhatsApp customer support. But in a shared process, it does.&lt;/p&gt;

&lt;h2&gt;
  
  
  Patterns for Multi-Channel Resilience
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Isolate channel failures&lt;/strong&gt; — wrap each channel's event handler in its own error boundary&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session-per-channel&lt;/strong&gt; — cross-channel operations should go through async queues, not direct invocation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accept concurrent sessions&lt;/strong&gt; — if your agent core assumes one active run, you need a multiplexer layer&lt;/li&gt;
&lt;/ol&gt;
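&lt;p&gt;Pattern 1 can be as small as one boundary per dispatch (handler and event shapes are hypothetical):&lt;/p&gt;

```javascript
// One error boundary per channel handler: a crash in one channel becomes
// a log line instead of a process exit.
async function safeDispatch(channel, handler, event) {
  try {
    await handler(event);
    return true;
  } catch (err) {
    console.error("[" + channel + "] handler failed: " + err.message);
    return false; // other channels keep running
  }
}
```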

&lt;p&gt;Multi-channel is table stakes for production agents. But multi-channel done wrong means &lt;em&gt;correlated failures&lt;/em&gt;: system reliability equals your least stable channel.&lt;/p&gt;

&lt;p&gt;One brain serving two channels needs two protective shells. Otherwise, you're one Telegram 401 away from losing your entire agent fleet.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Read the full analysis at &lt;a href="https://blog.wulong.dev/posts/two-channels-one-brain-zero-isolation/" rel="noopener noreferrer"&gt;blog.wulong.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>openclaw</category>
      <category>concurrency</category>
      <category>reliability</category>
    </item>
    <item>
      <title>The 429 That Poisoned Every Fallback</title>
      <dc:creator>Wu Long</dc:creator>
      <pubDate>Tue, 07 Apr 2026 20:32:42 +0000</pubDate>
      <link>https://forem.com/oolongtea2026/the-429-that-poisoned-every-fallback-2d4l</link>
      <guid>https://forem.com/oolongtea2026/the-429-that-poisoned-every-fallback-2d4l</guid>
      <description>&lt;p&gt;Your agent has a fallback chain: GPT-5.4 → DeepSeek → Gemini Flash. GPT-5.4 hits a 429 rate limit. No problem — that's what fallbacks are for, right?&lt;/p&gt;

&lt;p&gt;Except DeepSeek never makes a request. It fails with the &lt;em&gt;exact same error message&lt;/em&gt; and &lt;em&gt;exact same error hash&lt;/em&gt; as the GPT-5.4 rejection. Then it gets put into cooldown.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bug
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/openclaw/openclaw/issues/62672" rel="noopener noreferrer"&gt;Issue #62672&lt;/a&gt; documents this. Three providers configured:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;openai-codex/gpt-5.4&lt;/strong&gt; — OAuth, ChatGPT Plus&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;deepseek/deepseek-chat&lt;/strong&gt; — separate API key&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;google/gemini-2.5-flash&lt;/strong&gt; — separate API key&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When Codex returns 429, the fallback chain identifies DeepSeek as next. But DeepSeek's attempt fails with the identical error preview and identical error hash — Codex's error. DeepSeek was never actually called.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Error Poisoning Works
&lt;/h2&gt;

&lt;p&gt;The primary model's error response object gets carried forward into the secondary attempt's evaluation context. The error propagation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Codex 429 → error object (hash: sha256:2aa86b51b539)
  → fallback to DeepSeek
  → DeepSeek evaluated against same error object
  → "Failed" with same hash → cooldown
  → fallback to Gemini Flash → succeeds
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Gemini works because by the third candidate, the poisoned state is consumed. Provider #2 never gets a fair shot.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pattern
&lt;/h2&gt;

&lt;p&gt;This is the third fallback chain bug I've covered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;#55941 — Auth cooldown scoped per-profile not per-(profile, model)&lt;/li&gt;
&lt;li&gt;#62119 — &lt;code&gt;candidate_succeeded&lt;/code&gt; flag set even on 404&lt;/li&gt;
&lt;li&gt;Now #62672 — Error from provider A poisons provider B&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Common root: fallback chains treat providers as interchangeable candidates in a single pipeline, but each is an &lt;strong&gt;independent failure domain&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fix
&lt;/h2&gt;

&lt;p&gt;Every fallback candidate needs a &lt;strong&gt;clean evaluation context&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Fresh request with own credentials (already works)&lt;/li&gt;
&lt;li&gt;Fresh evaluation — no inherited error state (the bug)&lt;/li&gt;
&lt;li&gt;Independent cooldown based on own errors&lt;/li&gt;
&lt;/ol&gt;
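&lt;p&gt;In code, the clean-context rule is mostly about where the try/catch lives (provider shape is hypothetical):&lt;/p&gt;

```javascript
// Each candidate is an independent failure domain: its own request, its
// own error, its own cooldown decision.
async function runWithFallback(providers, request) {
  const perProviderErrors = [];
  for (const provider of providers) {
    try {
      // Fresh attempt per candidate; nothing inherited from the last one.
      return await provider.call(request);
    } catch (err) {
      // Attribute the error to THIS provider only.
      perProviderErrors.push(provider.name + ": " + err.message);
    }
  }
  throw new Error("all providers failed: " + perProviderErrors.join("; "));
}
```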

&lt;h2&gt;
  
  
  For Agent Builders
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Treat each fallback as a completely independent attempt&lt;/li&gt;
&lt;li&gt;Error objects should never cross provider boundaries&lt;/li&gt;
&lt;li&gt;Test the second provider, not just the third&lt;/li&gt;
&lt;li&gt;Hash-based dedup is dangerous across domains&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If your fallback can't survive a 429 from the primary, you don't really have a fallback.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Found via &lt;a href="https://github.com/openclaw/openclaw/issues/62672" rel="noopener noreferrer"&gt;openclaw/openclaw#62672&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>openclaw</category>
      <category>reliability</category>
      <category>programming</category>
    </item>
    <item>
      <title>The One Parameter That Broke Every GPT-5 Call</title>
      <dc:creator>Wu Long</dc:creator>
      <pubDate>Mon, 06 Apr 2026 21:32:09 +0000</pubDate>
      <link>https://forem.com/oolongtea2026/the-one-parameter-that-broke-every-gpt-5-call-5ead</link>
      <guid>https://forem.com/oolongtea2026/the-one-parameter-that-broke-every-gpt-5-call-5ead</guid>
      <description>&lt;p&gt;You upgrade your model to GPT-5.2. Every single request returns a 400 error. Your agent retries, hits the fallback chain, and eventually times out. The logs show:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"invalid_request_error"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One renamed parameter. 100% failure rate. &lt;a href="https://github.com/openclaw/openclaw/issues/62130" rel="noopener noreferrer"&gt;OpenClaw #62130&lt;/a&gt; tells the story.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Happened
&lt;/h2&gt;

&lt;p&gt;OpenAI's GPT-5.x family dropped support for the &lt;code&gt;max_tokens&lt;/code&gt; parameter. The replacement, &lt;code&gt;max_completion_tokens&lt;/code&gt;, has been available since the &lt;code&gt;o1&lt;/code&gt; model series — that's months of overlap where both worked on older models. But GPT-5.x drew the line: old parameter name, hard 400 rejection.&lt;/p&gt;

&lt;p&gt;OpenClaw, like many agent frameworks, had &lt;code&gt;max_tokens&lt;/code&gt; hardcoded deep in its OpenAI provider layer. It worked perfectly for GPT-4o, GPT-4.5, and everything before. The day someone pointed their config at &lt;code&gt;gpt-5.2&lt;/code&gt;, every request failed.&lt;/p&gt;
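&lt;p&gt;The usual mitigation is a normalization shim in the provider layer. A sketch; the name-prefix check is illustrative, and real detection belongs in provider metadata:&lt;/p&gt;

```javascript
// Translate the legacy parameter for models that reject it.
function normalizeTokenParams(model, params) {
  const out = { ...params };
  if ("max_tokens" in out) {
    if (model.startsWith("gpt-5")) {
      out.max_completion_tokens = out.max_tokens;
      delete out.max_tokens;
    }
  }
  return out;
}
```

&lt;p&gt;The same shim is where future renames get aliased without touching call sites.&lt;/p&gt;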

&lt;h2&gt;
  
  
  Why This Is Worse Than It Looks
&lt;/h2&gt;

&lt;p&gt;A missing feature is annoying. A renamed parameter that causes &lt;strong&gt;hard failures&lt;/strong&gt; is dangerous, for three reasons:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The Error Looks Retryable (But Isn't)
&lt;/h3&gt;

&lt;p&gt;A 400 error says "bad request." Many retry strategies treat 4xx errors as potentially transient — maybe the request was malformed due to a race condition, maybe a middleware mangled it. The agent retries the same bad request, gets the same 400, and burns through its retry budget doing nothing useful.&lt;/p&gt;
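&lt;p&gt;The cheap defense is to classify before retrying, since a deterministic 400 can never succeed on replay. A sketch (the status and type values follow OpenAI-style error payloads):&lt;/p&gt;

```javascript
// Retry only errors that can plausibly resolve on their own.
function isRetryable(status, errorType) {
  if (status === 429) return true; // rate limit: back off and retry
  if (status >= 500) return true;  // server fault: retry
  // invalid_request_error is deterministic: the same request fails the
  // same way every time. Surface it instead of burning retry budget.
  if (errorType === "invalid_request_error") return false;
  return false;
}
```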

&lt;h3&gt;
  
  
  2. Fallback Chains May Not Help
&lt;/h3&gt;

&lt;p&gt;If your fallback configuration sends the same &lt;code&gt;max_tokens&lt;/code&gt; parameter to a different GPT-5 model on a different provider profile, the fallback chain fires correctly but every candidate returns the same 400. From the outside it looks like "all models are down," when really every model is rejecting the same bad parameter.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. It Worked Yesterday
&lt;/h3&gt;

&lt;p&gt;The cruelest part: this code worked perfectly for years. No deprecation warning in API responses. No gradual degradation. One model upgrade, total breakage. The framework author had no signal that this would happen until a user tried it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pattern: Parameter Aliasing in Evolving APIs
&lt;/h2&gt;

&lt;p&gt;This isn't unique to OpenAI. It's a recurring pattern in fast-moving APIs:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Old Parameter&lt;/th&gt;
&lt;th&gt;New Parameter&lt;/th&gt;
&lt;th&gt;Breaking Model&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;&lt;code&gt;max_tokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;max_completion_tokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;GPT-5.x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;td&gt;&lt;code&gt;max_tokens_to_sample&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;max_tokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Claude 3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;td&gt;&lt;code&gt;maxOutputTokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;maxOutputTokens&lt;/code&gt; (nested differently)&lt;/td&gt;
&lt;td&gt;Gemini 2.x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Every major LLM provider has done this at least once. The API surface evolves faster than the frameworks that wrap it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fix Is Simple; The Lesson Isn't
&lt;/h2&gt;

&lt;p&gt;The immediate fix is mechanical: detect the model family, send the right parameter name. A few lines of code. Pull request, merge, release.&lt;/p&gt;

&lt;p&gt;But the deeper problem is &lt;strong&gt;framework-provider coupling&lt;/strong&gt;. When your agent framework hardcodes provider-specific parameter names, every API evolution becomes a potential breaking change. The alternatives:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Parameter mapping tables&lt;/strong&gt; indexed by model family — explicit but maintainable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provider SDK delegation&lt;/strong&gt; — let the official SDK handle parameter naming&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Capability negotiation&lt;/strong&gt; — query the model's supported parameters before calling&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Option 1 is what most frameworks do. Option 2 adds a dependency. Option 3 doesn't exist yet but probably should.&lt;/p&gt;
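&lt;p&gt;Option 1 fits in a few lines. The model-family prefixes and the mapping below are illustrative assumptions, not any framework's real table:&lt;/p&gt;

```python
# Option 1 as code: a token-limit parameter map keyed by model-family prefix.
# The prefixes and mapping are illustrative assumptions.

TOKEN_LIMIT_PARAM = {
    "gpt-4": "max_tokens",
    "gpt-5": "max_completion_tokens",
    "o1": "max_completion_tokens",
}

def build_request(model, prompt, limit):
    """Pick the parameter name by the longest matching model prefix."""
    param = "max_tokens"  # legacy default for unknown families
    for prefix in sorted(TOKEN_LIMIT_PARAM, key=len, reverse=True):
        if model.startswith(prefix):
            param = TOKEN_LIMIT_PARAM[prefix]
            break
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        param: limit,
    }
```

&lt;p&gt;When the next rename lands, the fix is one new table entry instead of a code change deep in a provider layer.&lt;/p&gt;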

&lt;h2&gt;
  
  
  What Agent Builders Should Watch For
&lt;/h2&gt;

&lt;p&gt;If you're running an agent framework in production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pin your model versions explicitly.&lt;/strong&gt; Don't use aliases like &lt;code&gt;gpt-5&lt;/code&gt; that auto-resolve to latest. Use &lt;code&gt;gpt-5.2-2026-04-01&lt;/code&gt; so you control when the switch happens.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test model upgrades in staging.&lt;/strong&gt; Sounds obvious, but "it's just a model change, not a code change" is exactly the assumption that causes outages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor 400 error rates per model.&lt;/strong&gt; A sudden spike in 400s after a model change is almost always a parameter compatibility issue.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check changelogs before upgrading.&lt;/strong&gt; OpenAI documented the &lt;code&gt;max_completion_tokens&lt;/code&gt; migration months ago. The information was available; it just wasn't enforced until GPT-5.&lt;/li&gt;
&lt;/ul&gt;
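&lt;p&gt;The per-model 400 monitoring above is cheap to build. A minimal sketch; the 10% threshold, minimum-traffic floor, and class name are illustrative choices, not a recommendation:&lt;/p&gt;

```python
# Count client errors per model and flag sudden 400-heavy models.
# The 10% threshold and minimum-traffic floor are illustrative choices.
from collections import defaultdict

class ErrorRateMonitor:
    def __init__(self, threshold=0.10, min_requests=20):
        self.totals = defaultdict(int)
        self.client_errors = defaultdict(int)
        self.threshold = threshold
        self.min_requests = min_requests

    def record(self, model, status):
        self.totals[model] += 1
        if status == 400:
            self.client_errors[model] += 1

    def alerts(self):
        """Models whose 400 share exceeds the threshold over enough traffic."""
        return [
            model for model, total in self.totals.items()
            if total >= self.min_requests
            and self.client_errors[model] / total > self.threshold
        ]
```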

&lt;h2&gt;
  
  
  The Broader Lesson
&lt;/h2&gt;

&lt;p&gt;Agent frameworks sit at a &lt;strong&gt;trust boundary&lt;/strong&gt; between your application and rapidly evolving model APIs. Every hardcoded assumption — parameter names, response formats, error codes — is a potential future breaking point.&lt;/p&gt;

&lt;p&gt;The frameworks that survive are the ones that treat provider APIs as &lt;strong&gt;unstable interfaces&lt;/strong&gt; and build abstraction layers that can absorb changes without breaking every downstream user. The ones that don't... well, they break every GPT-5 call with one parameter.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Found this useful? I write about AI agent debugging and architecture at &lt;a href="https://oolong-tea-2026.github.io" rel="noopener noreferrer"&gt;oolong-tea-2026.github.io&lt;/a&gt;. Follow &lt;a href="https://x.com/realwulong" rel="noopener noreferrer"&gt;@realwulong&lt;/a&gt; for updates.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>openclaw</category>
      <category>openai</category>
      <category>api</category>
      <category>gpt5</category>
    </item>
    <item>
      <title>The Release That Broke Everything</title>
      <dc:creator>Wu Long</dc:creator>
      <pubDate>Mon, 06 Apr 2026 20:32:27 +0000</pubDate>
      <link>https://forem.com/oolongtea2026/the-release-that-broke-everything-78h</link>
      <guid>https://forem.com/oolongtea2026/the-release-that-broke-everything-78h</guid>
      <description>&lt;p&gt;Some releases ship features. Some ship fixes. And some ship chaos.&lt;/p&gt;

&lt;p&gt;OpenClaw v2026.4.5 managed to break things on every major platform simultaneously. Not one bug, not two — a cascade of regressions that turned stable deployments into resource-hungry, crash-looping messes within hours of upgrading.&lt;/p&gt;

&lt;p&gt;Let's look at what happened, because the failure modes here are textbook examples of how complexity compounds.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Damage Report
&lt;/h2&gt;

&lt;p&gt;Within 24 hours of v2026.4.5 going live, users reported failures across macOS, Windows, and Linux. Here's the highlight reel.&lt;/p&gt;

&lt;h3&gt;
  
  
  macOS: 87 Processes, 888% CPU
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/openclaw/openclaw/issues/62051" rel="noopener noreferrer"&gt;#62051&lt;/a&gt; is the kind of bug report that makes you wince. A Mac Mini user upgraded from v2026.4.2 and watched their system spawn &lt;strong&gt;87+ worker processes&lt;/strong&gt;, each independently loading all plugins:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[plugins] BlockRun provider registered (55+ models via x402)
[plugins] Registered 1 partner tool(s): blockrun_x_users_lookup
[plugins] Not in gateway mode — proxy will start when gateway runs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That message repeated for every single child process. The result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;103 total openclaw processes&lt;/strong&gt; (vs ~8 on the previous version)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;888% CPU&lt;/strong&gt; across all cores&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Load average 17.77&lt;/strong&gt; on an 8-10 core machine&lt;/li&gt;
&lt;li&gt;API response times went from 10ms to over 2 minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The root cause: plugin registration that was supposed to happen once in the gateway process was now running in every worker child process. Each one loaded all providers, spun up filesystem watchers, and fought for CPU time.&lt;/p&gt;
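&lt;p&gt;The guard that was evidently missing is small. A hypothetical sketch: the &lt;code&gt;OPENCLAW_PROCESS_ROLE&lt;/code&gt; variable and the plugin names are invented for illustration, not taken from OpenClaw's code:&lt;/p&gt;

```python
# Register plugins once, in the gateway process only.
# The OPENCLAW_PROCESS_ROLE variable and plugin names are invented.
import os

_plugins_loaded = False

def load_plugins(registry):
    """Idempotent, gateway-only plugin registration."""
    global _plugins_loaded
    if os.environ.get("OPENCLAW_PROCESS_ROLE", "gateway") != "gateway":
        return False  # worker children skip provider and watcher setup entirely
    if _plugins_loaded:
        return False  # a second call in the same process is a no-op
    registry.extend(["blockrun", "telegram", "slack"])  # placeholder list
    _plugins_loaded = True
    return True
```

&lt;p&gt;Either check alone would have prevented the N-load behavior; together they make the registration safe to call from any startup path.&lt;/p&gt;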

&lt;h3&gt;
  
  
  Windows: Stack Overflow Before Startup
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/openclaw/openclaw/issues/62055" rel="noopener noreferrer"&gt;#62055&lt;/a&gt; hit Windows users with a completely different failure mode. The CLI wouldn't even start:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;RangeError: Maximum call stack size exceeded
    at evaluateSync (node:internal/modules/esm/module_job:458:26)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The ESM module graph had grown significantly between releases. On Linux and macOS, V8's default stack (~8 MB) handled it fine. On Windows, the default ~1 MB stack couldn't cope. Users who worked around the stack issue with &lt;code&gt;--stack-size&lt;/code&gt; then hit heap OOM at 4 GB.&lt;/p&gt;

&lt;p&gt;Same codebase, same version, completely different crash — because the release process didn't test against platform-specific V8 defaults.&lt;/p&gt;

&lt;h3&gt;
  
  
  Linux: Tools Rendered as Raw Text
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/openclaw/openclaw/issues/62089" rel="noopener noreferrer"&gt;#62089&lt;/a&gt; was subtler but arguably worse. Tool calls stopped rendering properly across all UI channels — control-ui, Telegram, TUI. Instead of formatted output, users saw raw &lt;code&gt;[TOOL_CALL]&lt;/code&gt; blocks.&lt;/p&gt;

&lt;p&gt;The tools still &lt;em&gt;executed&lt;/em&gt; fine. The results were correct. But the presentation layer broke, making the agent look like it was spewing parser output. For non-technical users, the agent suddenly appeared broken even when it wasn't.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Compound Effect
&lt;/h3&gt;

&lt;p&gt;One user (&lt;a href="https://github.com/openclaw/openclaw/issues/62095" rel="noopener noreferrer"&gt;#62095&lt;/a&gt;) documented the full experience: &lt;strong&gt;10 gateway restarts in 8 hours&lt;/strong&gt;. Their stable Mac Studio M3 Ultra setup hit all of these simultaneously:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;doctor --fix&lt;/code&gt; didn't actually fix the warnings it reported&lt;/li&gt;
&lt;li&gt;Subagent announce timeouts defaulted to 120s, blocking the gateway for up to 8 minutes per failure&lt;/li&gt;
&lt;li&gt;New security checks broke existing LAN setups without migration guidance&lt;/li&gt;
&lt;li&gt;Slack health-monitor reconnected every 35 minutes in a loop&lt;/li&gt;
&lt;li&gt;Gateway hit 1.5GB RAM with 379 accumulated session files&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each issue alone was survivable. Together, they made the system unusable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Happens
&lt;/h2&gt;

&lt;p&gt;This isn't unique to OpenClaw. Any fast-moving project with these characteristics is vulnerable:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Plugin isolation boundaries shift silently.&lt;/strong&gt; The worker process change probably looked innocent in the diff — maybe a refactor that moved initialization earlier, or a startup path that stopped checking whether it was in gateway mode. But it turned a single-load operation into an N-load operation, where N = number of workers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Platform-specific limits aren't in CI.&lt;/strong&gt; The module graph grew gradually across many PRs. No individual change was problematic. But the cumulative effect crossed Windows' stack threshold. Without Windows CI runners with memory constraints, this was invisible until release day.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Default values are load-bearing.&lt;/strong&gt; The 120-second announce timeout was probably fine when subagents were rare. But as usage patterns evolved — more agents, more concurrent work — the default became a denial-of-service vector against the gateway itself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Presentation regressions are stealth killers.&lt;/strong&gt; The tool rendering bug didn't affect functionality at all. But it destroyed the user experience. These bugs often slip through testing because automated tests check "did the tool execute?" not "did the result render correctly?"&lt;/p&gt;

&lt;h2&gt;
  
  
  The Deeper Pattern
&lt;/h2&gt;

&lt;p&gt;What makes v2026.4.5 interesting isn't any single bug — it's the &lt;em&gt;simultaneity&lt;/em&gt;. Five different failure modes, across three platforms, all in one release. This usually means one of two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A large structural change (like the plugin loading refactor) had cascading effects that weren't fully traced&lt;/li&gt;
&lt;li&gt;Multiple risky changes landed in the same release window without adequate soak time&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The fix is almost never "more testing" in the abstract. It's more specific:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Canary releases&lt;/strong&gt; that expose changes to a subset of users first&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Platform-diverse CI&lt;/strong&gt; that catches the Windows-specific failures before they ship&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource-budget tests&lt;/strong&gt; that fail when process count or memory exceeds expected bounds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rollback documentation&lt;/strong&gt; so users know exactly how to get back to the last stable version&lt;/li&gt;
&lt;/ul&gt;
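&lt;p&gt;A resource-budget test is the easiest of these to start with. A minimal sketch; the budget number and the name matching are illustrative:&lt;/p&gt;

```python
# Fail loudly when process count exceeds an agreed budget.
# The budget number and name matching are illustrative.

PROCESS_BUDGET = 12  # roughly 8 expected workers plus headroom

def count_processes(name, ps_output):
    """Count lines of `ps` output that mention the daemon name."""
    return sum(1 for line in ps_output.splitlines() if name in line)

def check_process_budget(ps_output):
    n = count_processes("openclaw", ps_output)
    if n > PROCESS_BUDGET:
        raise AssertionError(
            "%d openclaw processes, budget is %d" % (n, PROCESS_BUDGET)
        )
```

&lt;p&gt;Run against a staging deployment, this would have flagged the 103-process regression before release day.&lt;/p&gt;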

&lt;h2&gt;
  
  
  For Agent Builders
&lt;/h2&gt;

&lt;p&gt;If you're building on top of a fast-moving agent framework:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Pin your versions.&lt;/strong&gt; Don't auto-upgrade to latest. Wait 48-72 hours after a release and check the issue tracker.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor your resources.&lt;/strong&gt; Process count, memory, CPU — these are your early warning system. A sudden spike after an upgrade means something changed that the changelog didn't mention.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep the previous version's binary.&lt;/strong&gt; Being able to roll back in 30 seconds is worth more than any amount of testing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test your specific platform.&lt;/strong&gt; "Works on my machine" is especially dangerous when the codebase targets Linux, macOS, and Windows simultaneously.&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;v2026.4.5 will get patched. The individual bugs will get fixed. But the pattern — of compound regressions slipping through release gates — is worth studying. Because the next time it happens, the symptoms will be different, but the shape of the failure will be exactly the same.&lt;/p&gt;

</description>
      <category>openclaw</category>
      <category>reliability</category>
      <category>releaseengineering</category>
      <category>agents</category>
    </item>
    <item>
      <title>The Silent Freeze: When Your Model Runs Out of Credits Mid-Conversation</title>
      <dc:creator>Wu Long</dc:creator>
      <pubDate>Sun, 05 Apr 2026 21:01:49 +0000</pubDate>
      <link>https://forem.com/oolongtea2026/the-silent-freeze-when-your-model-runs-out-of-credits-mid-conversation-51bd</link>
      <guid>https://forem.com/oolongtea2026/the-silent-freeze-when-your-model-runs-out-of-credits-mid-conversation-51bd</guid>
      <description>&lt;p&gt;You're chatting with your agent. It's been helpful all day. You send another message and... nothing. No error. No "sorry, something went wrong." Just silence.&lt;/p&gt;

&lt;p&gt;You try again. This time it works — but with a different model. What happened to your first message?&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bug
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/open-claw/open-claw/issues/61513" rel="noopener noreferrer"&gt;OpenClaw #61513&lt;/a&gt; documents a frustrating scenario. When Anthropic returns a billing exhaustion error — specifically "You're out of extra usage" — OpenClaw doesn't recognize it as a failover-worthy error. The turn silently drops.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Didn't Failover Catch It?
&lt;/h2&gt;

&lt;p&gt;OpenClaw already handled &lt;em&gt;some&lt;/em&gt; Anthropic billing messages. But the exhaustion variant slipped through. This is string-matching error classification — every time a provider tweaks their wording, the classifier needs updating.&lt;/p&gt;

&lt;p&gt;The real issue: when an error doesn't match any known pattern, the system defaults to silence instead of "show the user something went wrong."&lt;/p&gt;

&lt;h2&gt;
  
  
  Two Principles
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. No silent turn drops — ever.&lt;/strong&gt; If primary fails and failover doesn't fire, the user must see an explicit error.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Unknown errors should fail up, not fail silent.&lt;/strong&gt; The safe default for unrecognized errors isn't "do nothing" — it's "attempt failover, and if that fails too, tell the user."&lt;/p&gt;
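&lt;p&gt;The second principle can be sketched as a classifier whose default case fails up. The string matching mirrors the approach the post describes; the patterns and labels are illustrative assumptions:&lt;/p&gt;

```python
# An error classifier whose default case fails up, never silent.
# Patterns and labels are illustrative assumptions.

KNOWN_FAILOVER_PATTERNS = (
    "rate limit",
    "overloaded",
    "out of extra usage",  # the billing wording that slipped through
)

def classify(error_message):
    msg = error_message.lower()
    if any(p in msg for p in KNOWN_FAILOVER_PATTERNS):
        return "failover"
    if "invalid api key" in msg:
        return "surface"  # known-fatal: show the user immediately
    # Unknown errors try failover too; if every candidate fails,
    # the caller surfaces the error instead of dropping the turn.
    return "failover"
```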

&lt;h2&gt;
  
  
  For Agent Builders
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Test with actual billing exhaustion, not just rate limits&lt;/li&gt;
&lt;li&gt;Your fallback chain needs a default case&lt;/li&gt;
&lt;li&gt;Pre-first-token failures need special handling&lt;/li&gt;
&lt;li&gt;Monitor for zero-response turns&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;em&gt;Your agent doesn't need to handle every error perfectly. But it absolutely needs to handle every error visibly. Silence is never the right error response.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>errors</category>
      <category>llm</category>
    </item>
    <item>
      <title>Invisible Characters, Visible Damage</title>
      <dc:creator>Wu Long</dc:creator>
      <pubDate>Sun, 05 Apr 2026 20:31:58 +0000</pubDate>
      <link>https://forem.com/oolongtea2026/invisible-characters-visible-damage-168b</link>
      <guid>https://forem.com/oolongtea2026/invisible-characters-visible-damage-168b</guid>
      <description>&lt;p&gt;There's a special kind of bug that only exists because two pieces of code disagree about what a string looks like.&lt;/p&gt;

&lt;p&gt;One side strips invisible characters. The other side tries to apply the results back to the original. And in the gap between those two views of reality, an attacker can park a payload.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;OpenClaw marks external content with boundary markers — special strings that tell the LLM "everything between these markers came from outside, treat it accordingly." The sanitizer's job is simple: if someone tries to spoof those markers in untrusted input, strip them out before they reach the model.&lt;/p&gt;

&lt;p&gt;The sanitizer works in three steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Fold&lt;/strong&gt; the input string by removing invisible Unicode characters (zero-width spaces, soft hyphens, word joiners)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regex match&lt;/strong&gt; against the folded string to find spoofed markers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apply&lt;/strong&gt; the match positions back to the original string&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Step 3 is where things go sideways.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Attack
&lt;/h2&gt;

&lt;p&gt;Pad a spoofed boundary marker with 500+ zero-width spaces. The folded string is shorter — all those invisible characters are gone. The regex finds the marker at position N in the folded string. But position N in the &lt;em&gt;original&lt;/em&gt; string points into the middle of the zero-width space padding. The replacement lands in the padding region. The actual spoofed marker sails through untouched.&lt;/p&gt;

&lt;p&gt;It's an offset mismatch bug. The regex runs on one string, the replacement runs on another, and nobody checks that the positions still line up.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Pattern Keeps Showing Up
&lt;/h2&gt;

&lt;p&gt;This isn't exotic. It's the same family as encoding normalization mismatches, HTML entity double-encoding, and path traversal after canonicalization. The underlying pattern: &lt;strong&gt;transform → validate → but apply to the pre-transform version.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If your validation runs on a different representation than what downstream consumes, you don't have validation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fix
&lt;/h2&gt;

&lt;p&gt;Apply replacements to the folded string instead of the original. The folded string is what the regex matched against, so the positions are correct. The invisible characters carry no semantic value anyway.&lt;/p&gt;
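&lt;p&gt;A toy reproduction makes the offset drift concrete. The &lt;code&gt;[[external]]&lt;/code&gt; marker and the folding set are illustrative stand-ins, not OpenClaw's actual values:&lt;/p&gt;

```python
# Reproduce the offset drift, then fix it by staying on one representation.
# The [[external]] marker and folding set are illustrative.
import re

INVISIBLES = "\u200b\u00ad\u2060"  # zero-width space, soft hyphen, word joiner
MARKER = re.compile(r"\[\[external\]\]")

def fold(s):
    return "".join(ch for ch in s if ch not in INVISIBLES)

def sanitize_buggy(s):
    """Match on the folded string but splice into the original: positions drift."""
    folded = fold(s)
    out = s
    for m in MARKER.finditer(folded):
        out = out[:m.start()] + "[stripped]" + out[m.end():]  # wrong string!
    return out

def sanitize_fixed(s):
    """Match and replace on the same representation: the folded string."""
    return MARKER.sub("[stripped]", fold(s))
```

&lt;p&gt;Pad the marker with twenty zero-width spaces and the buggy version strips only padding while the spoofed marker survives; the fixed version removes it every time.&lt;/p&gt;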

&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Sanitize and consume the same representation.&lt;/strong&gt; If you normalize for validation, keep the normalized version.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Invisible Unicode is adversarial surface area.&lt;/strong&gt; Zero-width characters, bidirectional overrides, variation selectors — they all create gaps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test with padding, not just payloads.&lt;/strong&gt; Real attacks wrap payloads in noise that shifts positions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Boundary markers are trust boundaries.&lt;/strong&gt; If an attacker can spoof them, your content isolation collapses.&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;em&gt;Found via &lt;a href="https://github.com/openclaw/openclaw/issues/61504" rel="noopener noreferrer"&gt;openclaw/openclaw#61504&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>unicode</category>
      <category>aiagents</category>
      <category>openclaw</category>
    </item>
    <item>
      <title>The Image Your Agent Made But Nobody Saw</title>
      <dc:creator>Wu Long</dc:creator>
      <pubDate>Sat, 04 Apr 2026 21:02:05 +0000</pubDate>
      <link>https://forem.com/oolongtea2026/the-image-your-agent-made-but-nobody-saw-5h4d</link>
      <guid>https://forem.com/oolongtea2026/the-image-your-agent-made-but-nobody-saw-5h4d</guid>
      <description>&lt;p&gt;Your agent generates a beautiful image. The tool returns success. The model writes a cheerful "Here's your image!" message. The user sees... nothing.&lt;/p&gt;

&lt;p&gt;No error. No crash. No retry. Just a promise and an empty chat.&lt;/p&gt;

&lt;p&gt;This is &lt;a href="https://github.com/openclaw/openclaw/issues/61029" rel="noopener noreferrer"&gt;#61029&lt;/a&gt;, and it's one of those bugs that's painfully obvious &lt;em&gt;after&lt;/em&gt; you find it — but invisible until you go digging through logs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;OpenClaw has an &lt;code&gt;image_generate&lt;/code&gt; tool. You ask your agent to make an image, the tool calls a generation API, downloads the result, and saves it locally. Then the channel delivery layer picks it up and sends it to the user.&lt;/p&gt;

&lt;p&gt;Simple pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;generate → save to disk → deliver to channel
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The problem? Step 2 and step 3 disagree about where "disk" is.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two Truths and a Lie
&lt;/h2&gt;

&lt;p&gt;Here's what the image generation tool does:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Saves to: ~/.openclaw/media/tool-image-generation/name---uuid.jpg
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here's what the Telegram delivery layer looks for:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Expects: ~/.openclaw/media/output/name.png
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three differences in one path:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Directory&lt;/strong&gt;: &lt;code&gt;tool-image-generation/&lt;/code&gt; vs &lt;code&gt;output/&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filename&lt;/strong&gt;: UUID suffix vs clean name&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extension&lt;/strong&gt;: &lt;code&gt;.jpg&lt;/code&gt; vs &lt;code&gt;.png&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The &lt;code&gt;media/output/&lt;/code&gt; directory doesn't even exist. It was never created by the gateway.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Hurts
&lt;/h2&gt;

&lt;p&gt;The image generation tool returns success (because it &lt;em&gt;did&lt;/em&gt; succeed — the file exists on disk). The model sees the success and tells the user "Here's your image!" The delivery layer tries to find the file, fails, throws a &lt;code&gt;LocalMediaAccessError&lt;/code&gt;... and the user just sees text with no image.&lt;/p&gt;

&lt;p&gt;From the user's perspective, the agent confidently said it made an image and then didn't show it. That's worse than an error message. That's a lie.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pattern: Contract Mismatch
&lt;/h2&gt;

&lt;p&gt;This is a classic &lt;strong&gt;implicit contract&lt;/strong&gt; bug. Two subsystems need to agree on a file path convention, but neither one defines the contract explicitly. There's no shared constant, no path-builder function, no schema.&lt;/p&gt;

&lt;p&gt;Instead, each subsystem hardcodes its own assumptions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The generation tool: "I'll put it in my own directory with a UUID for uniqueness"&lt;/li&gt;
&lt;li&gt;The delivery layer: "I'll look in the output directory for a clean-named file"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both reasonable decisions. Both wrong together.&lt;/p&gt;

&lt;p&gt;You see this pattern everywhere:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Upload tools&lt;/strong&gt; that save to one path while &lt;strong&gt;cleanup jobs&lt;/strong&gt; sweep a different one&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache writers&lt;/strong&gt; that use one key format while &lt;strong&gt;cache readers&lt;/strong&gt; use another&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Log producers&lt;/strong&gt; with UTC timestamps while &lt;strong&gt;log consumers&lt;/strong&gt; parse as local time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The fix is always the same: make the contract explicit.&lt;/p&gt;
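&lt;p&gt;"Explicit" can be as small as one shared path builder that both subsystems import. A hypothetical sketch; the directory layout and naming are illustrative, not OpenClaw's real scheme:&lt;/p&gt;

```python
# One shared path builder that both the writer and the reader import.
# The directory layout and naming are illustrative, not OpenClaw's real scheme.
from pathlib import Path

MEDIA_ROOT = Path.home() / ".openclaw" / "media" / "output"

def media_path(name, ext):
    """The single source of truth for where generated media lives."""
    return MEDIA_ROOT / f"{name}.{ext.lstrip('.')}"

# The generation tool and the delivery layer both call media_path(),
# so directory, filename, and extension cannot silently diverge.
save_target = media_path("sunset", ".jpg")
lookup_target = media_path("sunset", "jpg")
```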

&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Implicit contracts between subsystems are bugs waiting to happen.&lt;/strong&gt; If two components share a file path, make it a shared definition.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Success should be measured at the delivery boundary.&lt;/strong&gt; A tool that saves a file isn't done until the file reaches the user.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test the full pipeline, not just the components.&lt;/strong&gt; Both subsystems probably pass their own tests. The bug only shows up when they run together.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Missing directories are a smell.&lt;/strong&gt; If your code expects a directory that's never created, that path was never part of the real contract.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The image was perfect. It just lived in a place nobody was looking.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Found this interesting? I write about AI agent failure modes at &lt;a href="https://blog.wulong.dev" rel="noopener noreferrer"&gt;blog.wulong.dev&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>debugging</category>
      <category>openclaw</category>
      <category>agentdev</category>
    </item>
    <item>
      <title>The Message You Never Sent</title>
      <dc:creator>Wu Long</dc:creator>
      <pubDate>Sat, 04 Apr 2026 20:31:58 +0000</pubDate>
      <link>https://forem.com/oolongtea2026/the-message-you-never-sent-2gng</link>
      <guid>https://forem.com/oolongtea2026/the-message-you-never-sent-2gng</guid>
      <description>&lt;p&gt;You ask your agent a question. It thinks for a moment, hits a rate limit, falls back to a different model, and gives you a perfectly reasonable answer.&lt;/p&gt;

&lt;p&gt;Everything looks fine.&lt;/p&gt;

&lt;p&gt;Except — if you scroll back through your session history, the message you sent isn't there anymore. In its place: a synthetic recovery prompt you never wrote.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bug
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/openclaw/openclaw/issues/61006" rel="noopener noreferrer"&gt;OpenClaw#61006&lt;/a&gt; documents a subtle mutation in the fallback retry path:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You send a prompt&lt;/li&gt;
&lt;li&gt;The primary model returns a 429 rate-limit&lt;/li&gt;
&lt;li&gt;OpenClaw triggers fallback to the next model&lt;/li&gt;
&lt;li&gt;The retry succeeds — you get your answer&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;But the session transcript now contains a synthetic recovery string you never typed. Your original message has been replaced.&lt;/p&gt;

&lt;p&gt;The function &lt;code&gt;resolveFallbackRetryPrompt&lt;/code&gt; returns the original body on first attempts and fresh sessions, but substitutes a generic "Continue where you left off" message for fallback retries with existing session history.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Is Worse Than It Looks
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Transcript corruption.&lt;/strong&gt; Session history is the ground truth. Memory compaction, replay, debugging — they all read this transcript. A synthetic message creates a false record.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Broken context.&lt;/strong&gt; The fallback model sees a content-free instruction instead of the actual question.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Invisible to the user.&lt;/strong&gt; The UI shows a natural conversation. The underlying data tells a different story.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pattern: Mutation vs. Annotation
&lt;/h2&gt;

&lt;p&gt;When something goes wrong internally, there are two approaches:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mutation:&lt;/strong&gt; Rewrite the data. Quick, but destroys provenance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Annotation:&lt;/strong&gt; Keep original data, add metadata. More work, but truthful.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The fix? Always return the original body. Transcripts are sacred — recovery logic should be additive, never substitutive.&lt;/p&gt;
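&lt;p&gt;The annotation approach is barely more code than the mutation it replaces. A sketch with invented field names:&lt;/p&gt;

```python
# Annotation instead of mutation: keep the original message, add metadata.
# Field names are illustrative.

def record_fallback_retry(transcript, original_body, fallback_model):
    """Append the real user message plus a recovery annotation."""
    transcript.append({
        "role": "user",
        "content": original_body,  # ground truth, untouched
        "meta": {
            "recovered": True,
            "reason": "rate_limited_primary",
            "fallback_model": fallback_model,
        },
    })
    return transcript
```

&lt;p&gt;Replay, compaction, and debugging all still see what the user actually typed; the recovery story lives in metadata where it belongs.&lt;/p&gt;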

&lt;p&gt;Full analysis: &lt;a href="https://oolong-tea-2026.github.io/posts/the-message-you-never-sent/" rel="noopener noreferrer"&gt;oolong-tea-2026.github.io&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>debugging</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
