<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Liran Baba</title>
    <description>The latest articles on Forem by Liran Baba (@liran_baba).</description>
    <link>https://forem.com/liran_baba</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3853249%2F1fb2b801-c7ae-463d-b813-a600cdd7ca4f.png</url>
      <title>Forem: Liran Baba</title>
      <link>https://forem.com/liran_baba</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/liran_baba"/>
    <language>en</language>
    <item>
      <title>ForgeCode vs Claude Code: which AI coding agent actually wins?</title>
      <dc:creator>Liran Baba</dc:creator>
      <pubDate>Thu, 09 Apr 2026 11:24:56 +0000</pubDate>
      <link>https://forem.com/liran_baba/forgecode-vs-claude-code-which-ai-coding-agent-actually-wins-36c</link>
      <guid>https://forem.com/liran_baba/forgecode-vs-claude-code-which-ai-coding-agent-actually-wins-36c</guid>
      <description>&lt;p&gt;I've been using Claude Code for months. I like it. I genuinely don't get the Twitter hate. But there's one thing that's been driving me crazy: speed. I'll ask it to rename a variable across three files and it sits there thinking for 40 seconds. A simple test fix on a small repo, and I'm watching a spinner for two minutes. It's not a deal-breaker, but it's the kind of friction that builds up over a day.&lt;/p&gt;

&lt;p&gt;We recently rolled out Claude Code across our entire engineering org. We're not ditching Cursor, just giving devs the option to pick whatever tool works for them. And the feedback I kept hearing from people, unprompted: it's slow. Not everyone, not every task. But enough devs brought it up that it clearly wasn't just me being impatient.&lt;/p&gt;

&lt;p&gt;So I started looking at alternatives. OpenAI has &lt;a href="https://github.com/openai/codex" rel="noopener noreferrer"&gt;Codex CLI&lt;/a&gt; but I haven't tried the harness yet, just the models. The &lt;a href="https://www.tbench.ai/leaderboard/terminal-bench/2.0" rel="noopener noreferrer"&gt;TermBench 2.0 leaderboard&lt;/a&gt; is what caught my eye. ForgeCode at #1 with 81.8%. Claude Code at 58%, ranked #39. I installed ForgeCode that same day.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ForgeCode with Opus 4.6 was noticeably faster than Claude Code on the same tasks. Not marginal, real.&lt;/li&gt;
&lt;li&gt;ForgeCode topped &lt;a href="https://www.tbench.ai/" rel="noopener noreferrer"&gt;TermBench 2.0&lt;/a&gt; at 81.8%, but that's its own benchmark. On the independent &lt;a href="https://www.swebench.com" rel="noopener noreferrer"&gt;SWE-bench&lt;/a&gt;, the gap shrinks to 2.4 points.&lt;/li&gt;
&lt;li&gt;GPT 5.4 through ForgeCode was unstable for me. A research task on a small repo took 15 minutes.&lt;/li&gt;
&lt;li&gt;I'm double-dipping now. Claude Code is still primary, but the latency gains on ForgeCode are too real to ignore.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What is ForgeCode (and why the benchmark confusion exists)?
&lt;/h2&gt;

&lt;p&gt;ForgeCode is not an AI model. It's a model-agnostic agent harness, open source under Apache 2.0, written in Rust, that wraps any LLM through OpenRouter or direct API keys. It launched in late January 2025 and hit &lt;a href="https://github.com/antinomyhq/forgecode" rel="noopener noreferrer"&gt;v2.8.0 on GitHub&lt;/a&gt; by April 2026 with over 6,000 stars.&lt;/p&gt;

&lt;p&gt;ForgeCode ships three built-in agents. &lt;code&gt;forge&lt;/code&gt; writes and edits code. &lt;code&gt;sage&lt;/code&gt; does read-only research and can't modify files. &lt;code&gt;muse&lt;/code&gt; generates plans and writes them to a &lt;code&gt;plans/&lt;/code&gt; directory. It's Zsh-native, using a &lt;code&gt;:&lt;/code&gt; prefix so you never leave your shell.&lt;/p&gt;

&lt;p&gt;Here's the thing that matters for evaluating the benchmark: TermBench 2.0 is ForgeCode's own benchmark, hosted at tbench.ai. The organization submitting entries is ForgeCode itself. That doesn't make the results wrong. But it's not a neutral third party.&lt;/p&gt;

&lt;h2&gt;
  
  
  Does the benchmark actually hold up?
&lt;/h2&gt;

&lt;p&gt;On &lt;a href="https://www.swebench.com" rel="noopener noreferrer"&gt;SWE-bench Verified&lt;/a&gt;, an independent benchmark from Princeton and UChicago, ForgeCode + Claude 4 scored 72.7% compared to Claude 3.7 Sonnet's 70.3%. A 2.4-point gap, not the 24-point gap TermBench implies. That context changes the whole picture.&lt;/p&gt;

&lt;p&gt;The TermBench 2.0 numbers, self-reported by ForgeCode on tbench.ai:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ForgeCode + GPT 5.4: 81.8%&lt;/li&gt;
&lt;li&gt;ForgeCode + Claude Opus 4.6: 81.8%&lt;/li&gt;
&lt;li&gt;Claude Code + Claude Opus 4.6: 58.0% (rank #39)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The SWE-bench Verified numbers, independent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ForgeCode + Claude 4: 72.7%&lt;/li&gt;
&lt;li&gt;Claude 3.7 Sonnet (extended thinking): 70.3%&lt;/li&gt;
&lt;li&gt;Claude 4.5 Opus: 76.8%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So how did ForgeCode reach 81.8%? Their blog documents four specific harness changes. They reordered JSON schema fields, putting &lt;code&gt;required&lt;/code&gt; before &lt;code&gt;properties&lt;/code&gt; to reduce GPT 5.4 tool-call errors. They flattened nested schemas. They added explicit truncation reminders when files are partially read. And they added a mandatory verification pass where a reviewer skill checks task completion before the agent can stop.&lt;/p&gt;
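&lt;p&gt;To make that first change concrete, here's a sketch of the reordering. The tool name and fields are mine, invented for illustration, not pulled from ForgeCode's actual schemas; the only change their blog describes is that &lt;code&gt;required&lt;/code&gt; now appears before &lt;code&gt;properties&lt;/code&gt;:&lt;/p&gt;

```json
{
  "name": "read_file",
  "description": "Hypothetical tool schema, for illustration only",
  "parameters": {
    "type": "object",
    "required": ["path"],
    "properties": {
      "path": { "type": "string", "description": "Workspace-relative path" }
    }
  }
}
```

&lt;p&gt;The JSON is semantically identical either way. The bet is purely about serialization order: the model emits fewer malformed tool calls when it reads the required list before the property definitions.&lt;/p&gt;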

&lt;p&gt;These are real engineering improvements. They're also benchmark-specific optimizations. The r/ClaudeCode community called it "benchmaxxed," which is both funny and kind of fair.&lt;/p&gt;

&lt;p&gt;I've been eyeing this leaderboard for a while. The numbers are what pushed me to actually try ForgeCode. With Opus 4.6, it was noticeably faster than Claude Code. That part wasn't hype.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.swebench.com" rel="noopener noreferrer"&gt;SWE-bench scores&lt;/a&gt; went from 1.96% in late 2023 to 76.8% by early 2026. Everything's getting better fast. The question is whether a 2-point edge on an independent benchmark justifies switching your entire workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it's actually like to use ForgeCode
&lt;/h2&gt;

&lt;p&gt;Install is a one-liner: &lt;code&gt;curl -fsSL https://forgecode.dev/cli | sh&lt;/code&gt;. Then &lt;code&gt;forge provider login&lt;/code&gt; to set up your API keys and you're in. About the same friction as Claude Code. The Zsh plugin is a nice touch, you type &lt;code&gt;:&lt;/code&gt; followed by your prompt and it runs inline without switching contexts.&lt;/p&gt;

&lt;p&gt;First thing I tried: pointed it at my portfolio repo (Astro 6, maybe 30 files) with Opus 4.6 as the model. I asked it to add a post counter to the blog index page and wire it into the nav component. Claude Code takes about 90 seconds on that kind of task on this repo. ForgeCode did it in under 30. Correct output, clean diff, no hallucinated imports. The speed difference was immediately obvious.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0smwwo4a6qc8ihow0i7q.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0smwwo4a6qc8ihow0i7q.webp" alt=" " width="800" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I ran the same kind of test a few more times. A multi-file rename, adding an external link tooltip component, restructuring a layout. ForgeCode with Opus 4.6 was consistently faster. Not by a little. I could feel it in my workflow.&lt;/p&gt;

&lt;p&gt;Plan mode was the other thing that stood out. ForgeCode's &lt;code&gt;muse&lt;/code&gt; agent writes plans to a &lt;code&gt;plans/&lt;/code&gt; directory, and the output felt more detailed and verbose than Claude Code's plan mode. Whether that's good or bad depends on what you want. I kind of liked having the longer breakdown.&lt;/p&gt;

&lt;p&gt;Then I tried GPT 5.4 through ForgeCode, and it fell apart. I asked it to research the architecture of a small repo. Fifteen minutes. Kept going unstable, tool calls failing, the agent retrying and spinning. I killed it. So "ForgeCode is fast" needs a qualifier: ForgeCode with Opus 4.6 is fast. ForgeCode with GPT 5.4 was borderline unusable for me.&lt;/p&gt;

&lt;p&gt;But I'll give them this: the ForgeCode team explicitly says they've hired zero paid influencers. The low social media presence is intentional. Kind of respect that. In an industry where half the "honest reviews" have affiliate links in the description, that's almost suspiciously refreshing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why ForgeCode is actually faster
&lt;/h2&gt;

&lt;p&gt;Part of it is just the Rust binary (Claude Code is TypeScript, so startup and memory are heavier). But that's not the whole story.&lt;/p&gt;

&lt;p&gt;ForgeCode has a context engine that indexes function signatures and module boundaries instead of dumping raw files into the context window. The agent pulls only what it needs. Some estimates say this cuts context size by about 90%, which means faster responses, lower cost, and a model that doesn't lose the plot halfway through a task. That's the real reason the same model (Opus 4.6) responds faster through ForgeCode than through Claude Code.&lt;/p&gt;

&lt;p&gt;There's also a &lt;code&gt;--sandbox&lt;/code&gt; flag that creates an isolated git worktree and branch, so you can try something risky without touching your main tree and only merge back what works.&lt;/p&gt;

&lt;p&gt;What Claude Code has built &lt;em&gt;around&lt;/em&gt; the core loop (parallel agent execution, hooks, scheduled cloud tasks, auto-memory) doesn't exist in ForgeCode yet. The harness is fast. Everything around it is thin. ForgeCode is a Lambo with no cup holder. Fast as hell, but you're holding your coffee between your knees.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I missed when I wasn't using Claude Code
&lt;/h2&gt;

&lt;p&gt;I didn't appreciate this until I spent a few days away from Claude Code: the stuff around the agent matters more than the agent itself.&lt;/p&gt;

&lt;p&gt;With Claude Code, I have a CLAUDE.md in every project. My team shares the same project instructions. I have hooks that fire on file changes, so I can run secret scanning, linting, whatever I want on every edit. Auto-memory means I don't re-explain my codebase every session. And checkpoints mean every file edit gets snapshotted, so if the agent breaks something three steps back, I hit &lt;code&gt;/rewind&lt;/code&gt; and roll back without touching git.&lt;/p&gt;
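&lt;p&gt;For anyone who hasn't set hooks up, they live in Claude Code's settings file. This is a from-memory sketch of the shape I use, with a placeholder scanner script; double-check the event and matcher names against the current docs before copying it:&lt;/p&gt;

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "./scripts/scan-secrets.sh" }
        ]
      }
    ]
  }
}
```

&lt;p&gt;Every time the agent edits or writes a file, the command fires, so a leaked credential or lint failure surfaces immediately instead of three steps later.&lt;/p&gt;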

&lt;p&gt;ForgeCode has AGENTS.md (similar idea to CLAUDE.md) and MCP support, so the basics are covered. But no hooks, no checkpoints, no auto-memory, no IDE extensions, no JetBrains plugin. The model-agnostic part is great. The ecosystem is still thin.&lt;/p&gt;

&lt;p&gt;For reference, here's the head-to-head:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;ForgeCode&lt;/th&gt;
&lt;th&gt;Claude Code&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Model choice&lt;/td&gt;
&lt;td&gt;Any (300+)&lt;/td&gt;
&lt;td&gt;Claude only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Open source&lt;/td&gt;
&lt;td&gt;Yes (Apache 2.0)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Language&lt;/td&gt;
&lt;td&gt;Rust&lt;/td&gt;
&lt;td&gt;TypeScript&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Project config&lt;/td&gt;
&lt;td&gt;AGENTS.md&lt;/td&gt;
&lt;td&gt;CLAUDE.md (hierarchical)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP support&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes (extensive)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hooks&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (6 types)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scheduled tasks&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (cloud + local)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sub-agents&lt;/td&gt;
&lt;td&gt;Yes (forge/sage/muse)&lt;/td&gt;
&lt;td&gt;Yes (parallel)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Plan mode&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes (Shift+Tab)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VS Code&lt;/td&gt;
&lt;td&gt;No extension&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JetBrains&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auto memory&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Checkpoints / rewind&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Where I landed
&lt;/h2&gt;

&lt;p&gt;I'm double-dipping. Claude Code is still my primary tool, but I keep ForgeCode open for tasks where the latency kills me. Sometimes I'll drop into Cursor for something visual. Three tools is kind of ridiculous, but the latency gains on ForgeCode are real enough that I can't just ignore them.&lt;/p&gt;

&lt;p&gt;Claude Code is where my project config lives, where my hooks fire, where my MCP connections run. That's my home base and it's not changing. But when I need something fast and self-contained, a quick refactor, a file rename across a module, something where I don't need the full ecosystem, I'll run it through ForgeCode with Opus 4.6 and it's done before Claude Code would've finished reading the context.&lt;/p&gt;

&lt;p&gt;As of April 2026, ForgeCode is faster than Claude Code when running the same model (Opus 4.6), but Claude Code has the deeper ecosystem with hooks, MCP, auto-memory, and IDE integrations. Neither wins across the board. Pick the one that matches how you work and be ready to use both.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is ForgeCode's TermBench #1 score legitimate?
&lt;/h3&gt;

&lt;p&gt;TermBench is ForgeCode's own benchmark. On &lt;a href="https://www.swebench.com" rel="noopener noreferrer"&gt;SWE-bench Verified&lt;/a&gt;, an independent benchmark from Princeton and UChicago, ForgeCode + Claude 4 scored 72.7% compared to Claude 3.7 Sonnet's 70.3%. Solid, but not the 24-point gap TermBench suggests.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can ForgeCode use my existing Claude or ChatGPT subscription?
&lt;/h3&gt;

&lt;p&gt;No. You need API keys, not a subscription login. Separate billing from whatever you pay for Claude Pro or ChatGPT Plus.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does ForgeCode burn more tokens than Claude Code?
&lt;/h3&gt;

&lt;p&gt;Nobody's published hard numbers. ForgeCode's multi-agent setup (forge/sage/muse spawning sub-agents) almost certainly burns more tokens per session. I noticed it anecdotally but didn't measure. Track your own spend if you try it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is ForgeCode safe for proprietary code?
&lt;/h3&gt;

&lt;p&gt;The harness is open source, but default telemetry collects git user emails, scans SSH directories, and sends conversation data externally. &lt;a href="https://github.com/antinomyhq/forgecode/issues/1318" rel="noopener noreferrer"&gt;GitHub issue #1318&lt;/a&gt; raised data transparency concerns. The team addressed it in March 2025: set &lt;code&gt;FORGE_TRACKER=false&lt;/code&gt; to disable all tracking.&lt;/p&gt;
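&lt;p&gt;If you're trying it on a work machine, the opt-out is one environment variable. The &lt;code&gt;export&lt;/code&gt; covers the current shell; the &lt;code&gt;.zshrc&lt;/code&gt; line just makes it stick across sessions (assuming the standard Zsh setup ForgeCode targets):&lt;/p&gt;

```shell
# Opt out of ForgeCode telemetry for the current shell
export FORGE_TRACKER=false

# Persist it for future Zsh sessions
echo 'export FORGE_TRACKER=false' >> "${ZDOTDIR:-$HOME}/.zshrc"
```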

&lt;h3&gt;
  
  
  Is ForgeCode free?
&lt;/h3&gt;

&lt;p&gt;The code is free and open source (Apache 2.0). The hosted service was &lt;a href="https://reddit.com/r/cursor/comments/1maq1ex" rel="noopener noreferrer"&gt;originally unlimited&lt;/a&gt;, but switched to a tiered model in mid-2025 with daily request caps on the free tier.&lt;/p&gt;




&lt;p&gt;ForgeCode's benchmark lead exists on a test it runs itself. On independent benchmarks, it's comparable. The speed with Opus 4.6 is real. The GPT 5.4 experience was rough.&lt;/p&gt;

&lt;p&gt;I didn't expect to end up running two coding agents. But here I am. If ForgeCode ships hooks and the ecosystem catches up, that could change. For now, I'm using both, and it's working.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/antinomyhq/forgecode" rel="noopener noreferrer"&gt;ForgeCode GitHub Repository&lt;/a&gt; - GitHub, April 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.tbench.ai/leaderboard/terminal-bench/2.0" rel="noopener noreferrer"&gt;TermBench 2.0 Leaderboard&lt;/a&gt; - tbench.ai, 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.swebench.com" rel="noopener noreferrer"&gt;SWE-bench Verified Leaderboard&lt;/a&gt; - Princeton/UChicago, 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://code.claude.com/docs/en/" rel="noopener noreferrer"&gt;Claude Code Documentation&lt;/a&gt; - Anthropic, 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.anthropic.com/news/claude-3-7-sonnet" rel="noopener noreferrer"&gt;Anthropic Claude 3.7 Sonnet Announcement&lt;/a&gt; - Anthropic, February 2025&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://liranbaba.dev/blog/forgecode-vs-claude-code/" rel="noopener noreferrer"&gt;liranbaba.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devtools</category>
      <category>ai</category>
      <category>claudecode</category>
      <category>forgecode</category>
    </item>
    <item>
      <title>Cursor 3 shipped parallel agents, but is any of it new?</title>
      <dc:creator>Liran Baba</dc:creator>
      <pubDate>Sun, 05 Apr 2026 15:06:27 +0000</pubDate>
      <link>https://forem.com/liran_baba/cursor-3-shipped-parallel-agents-but-is-any-of-it-new-2dd1</link>
      <guid>https://forem.com/liran_baba/cursor-3-shipped-parallel-agents-but-is-any-of-it-new-2dd1</guid>
      <description>&lt;p&gt;Cursor 3 shipped on April 2. The demos look great: eight AI agents running in parallel, each in its own Git worktree, building different parts of your project at the same time. The &lt;a href="https://news.ycombinator.com/item?id=47618084" rel="noopener noreferrer"&gt;Hacker News thread&lt;/a&gt; lit up. Product Hunt gave it the #3 spot for the day.&lt;/p&gt;

&lt;p&gt;Then I read the comments. One user reported spending $2,000 in two days on cloud agents. Another switched from $1,800/month on Cursor to roughly $200/month on Claude Code and Codex. A third said they had "zero interest" in forced agent swarms and were moving to VS Code with Claude Code instead.&lt;/p&gt;

&lt;p&gt;The coverage so far has been mostly feature recaps reprinting the press release. Nobody's asking the obvious questions: is parallel agent execution actually new? What does it really cost? And what happens when your agents need to share context?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Here's the Thing&lt;/strong&gt;&lt;br&gt;
Cursor 2 already supported parallel execution via worktree.json configuration. What Cursor 3 actually shipped is a UI layer (Agents Window sidebar, drag-drop tabs) on top of the same Git worktree primitives. The cost model is the real concern: early testers reported $2,000 bills in two days, and Cursor's pricing page doesn't explain why. The unsolved technical problem is context sharing between local and cloud agents, which the docs hand-wave as "summarized and reduced."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What Cursor 3 actually shipped
&lt;/h2&gt;

&lt;p&gt;Cursor 3 lets you run up to 8 AI agents in parallel across isolated Git worktrees (&lt;a href="https://cursor.com/blog/cursor-3" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt;, 2026). Agents run locally via Composer 2 or in cloud isolation VMs. You can watch them all from a new sidebar called the Agents Window.&lt;/p&gt;

&lt;p&gt;That's the pitch, anyway.&lt;/p&gt;

&lt;p&gt;Cursor 2 already supported parallel agent execution through worktree.json configuration. The &lt;code&gt;/worktree&lt;/code&gt; command isn't new functionality. It's new UI. The Agents Window gives you visibility into what your agents are doing, and that part is genuinely useful. But calling this an architectural pivot is a stretch.&lt;/p&gt;

&lt;p&gt;The other additions: &lt;code&gt;/best-of-n&lt;/code&gt; runs the same prompt across multiple models side by side (Composer 2 vs. Claude vs. GPT). Design Mode lets you annotate UI elements and describe changes in plain English. The MCP Marketplace adds plugin support for hundreds of tools.&lt;/p&gt;

&lt;p&gt;Under the hood, &lt;code&gt;/worktree&lt;/code&gt; runs &lt;code&gt;git worktree add&lt;/code&gt; to create an isolated working directory on a new branch, then spawns an agent process scoped to that directory. Each agent gets its own filesystem view, so file edits don't collide mid-run. When the agent finishes, you review the diff and merge. This is the same thing you'd do manually with &lt;code&gt;git worktree add&lt;/code&gt; and a second terminal. Cursor 3 wraps it in a sidebar.&lt;/p&gt;
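&lt;p&gt;If you want to see that loop without Cursor, it's a few commands. Nothing here is Cursor-specific; the branch and file names are mine:&lt;/p&gt;

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git -c user.name=dev -c user.email=dev@example.com commit -q --allow-empty -m "init"

# 1. Isolate: a new branch checked out in its own directory (what /worktree wraps)
wt="$repo-agent1"
git worktree add -b agent1 "$wt" >/dev/null

# 2. The "agent" works there, scoped to that directory; the main tree never sees it
echo "feature" > "$wt/feature.txt"
git -C "$wt" add feature.txt
git -C "$wt" -c user.name=dev -c user.email=dev@example.com commit -qm "agent1: add feature"

# 3. Review the diff, merge back, clean up
git diff --stat HEAD agent1
git merge -q --ff-only agent1
git worktree remove "$wt"
```

&lt;p&gt;Useful to know, because it means the escape hatch is always there: anything the sidebar does with worktrees, you can inspect or undo with plain git.&lt;/p&gt;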

&lt;h2&gt;
  
  
  The cost problem nobody is talking about
&lt;/h2&gt;

&lt;p&gt;Early adopters reported spending $2,000+ in two days running Cursor 3's cloud agents (&lt;a href="https://news.ycombinator.com/item?id=47618084" rel="noopener noreferrer"&gt;Hacker News&lt;/a&gt;, 2026). That's not a typo. Two thousand dollars. Two days.&lt;/p&gt;

&lt;p&gt;Cursor's pricing page lists four tiers: Free, Pro at $20, Pro+ at $60, and Ultra at $200 per month (&lt;a href="https://cursor.com/pricing" rel="noopener noreferrer"&gt;cursor.com/pricing&lt;/a&gt;, 2026). Those numbers look reasonable until you start running cloud agents. The pricing page doesn't mention per-minute VM charges or explain how cloud agent usage is metered; the resource costs are absent from the page entirely.&lt;/p&gt;

&lt;p&gt;HN user dirtbag__dad reported spending "$2k a week with premium models" before switching to Claude Code Max at "1/10th the price." Another commenter, verelo, switched from $1,800/month on Cursor to roughly $200/month on Claude and Codex, calling it "WAY better value for money."&lt;/p&gt;

&lt;p&gt;Same story every time. Listed price and actual spend have almost nothing in common. When your pricing page says $200/month but users regularly spend ten times that, the issue isn't pricing. It's that nobody can predict what anything costs before the bill shows up.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude Code isn't immune either
&lt;/h3&gt;

&lt;p&gt;I should be fair here. Anthropic's flat-rate plans sound predictable, but they have their own version of this.&lt;/p&gt;

&lt;p&gt;In late March 2026, Claude Code Max plan users reported exhausting in under an hour the same quota that had previously lasted eight hours (&lt;a href="https://www.theregister.com/2026/03/31/anthropic_claude_code_limits/" rel="noopener noreferrer"&gt;The Register&lt;/a&gt;, 2026). The story pulled 324 points on Hacker News. BBC covered it a day later.&lt;/p&gt;

&lt;p&gt;Anthropic acknowledged the problem on Reddit: "people are hitting usage limits in Claude Code way faster than expected." A March promotion that doubled limits ended on March 28. There were reports of prompt cache bugs inflating token usage by 10-20x. And Anthropic doesn't publicly specify exact usage caps for any plan.&lt;/p&gt;

&lt;p&gt;So people started building tools just to figure out their own limits. API proxy interceptors. One developer &lt;a href="https://www.claudecodecamp.com/p/i-tried-to-reverse-engineer-claude-code-s-usage-limits" rel="noopener noreferrer"&gt;tried to reverse-engineer the utilization headers&lt;/a&gt; that Anthropic sends on every API response, because Claude Code doesn't surface them to you.&lt;/p&gt;

&lt;p&gt;I &lt;a href="https://liranbaba.dev/blog/found-database-password-in-claude-code-session/" rel="noopener noreferrer"&gt;built Claudoscope&lt;/a&gt; partly for this reason. If the tool won't tell you what it costs, build something that will.&lt;/p&gt;

&lt;p&gt;Both tools have cost transparency problems. They're just structured differently. Cursor's is per-token opacity: you don't know what cloud agents will cost until the bill arrives. Anthropic's is undisclosed caps on plans marketed as generous. Neither side has figured this out yet, which is kind of remarkable given how much both charge.&lt;/p&gt;

&lt;h2&gt;
  
  
  The context sharing problem
&lt;/h2&gt;

&lt;p&gt;This is the technical gap that nobody's writing about, and it's the one that actually matters for how well parallel agents work in practice.&lt;/p&gt;

&lt;p&gt;Each worktree agent runs in its own isolated branch. That's the point: isolation prevents file conflicts. But it also means Agent A doesn't know what Agent B is doing. If you're building an API endpoint in one worktree and the frontend that calls it in another, those agents are working from the same base commit. Neither sees the other's in-progress changes.&lt;/p&gt;

&lt;p&gt;Cursor's docs say local and cloud agent contexts are "summarized and reduced" before sharing. That's doing a lot of work as a sentence. How much of a 100k-line codebase survives summarization? What's the token budget for the summary? Is it a full AST-aware summary or just file path lists? The docs don't say.&lt;/p&gt;

&lt;p&gt;There's also the committed-vs-dirty question. Are cloud agents working from the latest committed state on the branch, or from your local uncommitted edits? If committed: you have to commit before spawning cloud agents, which means half-finished code landing in your Git history. If uncommitted: they need filesystem sync between local and cloud, which introduces latency and consistency issues. The docs are silent on this too.&lt;/p&gt;

&lt;p&gt;I've hit a version of this problem with Claude Code's worktree parallelism. Two agents building against the same API contract will sometimes diverge on field names or response shapes because neither agent sees the other's work until merge time. The fix is manual: define the contract first, commit it, then parallelize. That works, but it means true parallelism requires upfront planning that eats into the time savings.&lt;/p&gt;
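&lt;p&gt;That contract-first workflow, sketched with plain git (the contract file and branch names are a toy example of mine):&lt;/p&gt;

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git -c user.name=dev -c user.email=dev@example.com commit -q --allow-empty -m "init"

# Commit the shared contract BEFORE spawning parallel agents...
cat > api-contract.md <<'EOF'
POST /login -> { "token": string, "expires_at": string }
EOF
git add api-contract.md
git -c user.name=dev -c user.email=dev@example.com commit -qm "contract: /login response shape"

# ...so every worktree branches from a base commit that already contains it.
git worktree add -b backend "$repo-backend" >/dev/null
git worktree add -b frontend "$repo-frontend" >/dev/null
```

&lt;p&gt;Both agents now start from an identical contract, so the field names and response shapes can't silently diverge before merge time.&lt;/p&gt;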

&lt;p&gt;&lt;a href="https://liranbaba.dev/blog/claude-code-source-leak/" rel="noopener noreferrer"&gt;The Claude Code source leak&lt;/a&gt; exposed how their agent orchestration handles this internally: spawning sub-agents, tool call cascading through orchestration layers, sessions that retry failed operations in loops. Context sharing between agents is an unsolved problem across the entire category, not just Cursor.&lt;/p&gt;

&lt;h2&gt;
  
  
  What parallel agents actually solve (and when they don't)
&lt;/h2&gt;

&lt;p&gt;Parallel agents deliver real speedups for the right kind of work. Building a full-stack feature with decoupled components? Four agents in parallel (UI, API, database, tests) can cut wall-clock time from eight hours to two (&lt;a href="https://cursor.com/docs/configuration/worktrees" rel="noopener noreferrer"&gt;Cursor docs&lt;/a&gt;, 2026). That's a genuine 4x on paper.&lt;/p&gt;

&lt;p&gt;I use Claude Code's worktree-based parallelism for similar workflows. Spin up multiple agents, each in an isolated branch, merge when they're done. The UX is rougher: no Agents Window, no drag-drop tabs, no visual status at a glance. But the core capability is the same, and the cost is flat.&lt;/p&gt;

&lt;p&gt;Here's where it falls apart. When Agent B depends on Agent A's output, you can't parallelize. That's most real work. For tasks under 30 minutes, the orchestration overhead eats the speedup. Solo devs on small projects get almost nothing from running eight agents simultaneously. And the context sharing gap I described above means agents working on related components will diverge unless you've done the upfront contract work.&lt;/p&gt;




&lt;p&gt;Cursor 3 is a polished UI layer on existing capabilities, positioned as an architectural breakthrough. The parallel agents are real but not new. The cost model is real but not transparent.&lt;/p&gt;

&lt;p&gt;If you're already in Claude Code, I don't see a reason to switch. If you're evaluating for the first time, try both. Run each for a week on real work, not demos. Track what you actually spend. Then decide.&lt;/p&gt;

&lt;p&gt;Or skip both and try &lt;a href="https://forgecode.dev/" rel="noopener noreferrer"&gt;ForgeCode&lt;/a&gt;. It's open source, terminal-based, and topped TermBench 2.0 at 81.8%. You bring your own API keys and pick your model. I haven't used it yet, but I'm giving it a weekend. Their blog post about hitting #1 is titled "benchmarks don't matter," which I kind of respect.&lt;/p&gt;

&lt;p&gt;That's really all I've got. Track your costs. The rest will sort itself out.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  How much does Cursor 3 actually cost per month?
&lt;/h3&gt;

&lt;p&gt;Plans start at $20/month but real-world spend with cloud agents ranges from $200 to $1,800+ per month based on Hacker News community reports (&lt;a href="https://news.ycombinator.com/item?id=47618084" rel="noopener noreferrer"&gt;HN&lt;/a&gt;, 2026). Cloud agent resource costs aren't disclosed on the pricing page. Track your actual spend for a full week before committing to a plan.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can you run Cursor 3 agents locally without cloud costs?
&lt;/h3&gt;

&lt;p&gt;Yes, local agents run Composer 2 on-device with no per-use charges. Cloud agents are where the parallel execution actually matters, though, and those costs aren't disclosed anywhere.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Cursor 3 better than Claude Code for parallel tasks?
&lt;/h3&gt;

&lt;p&gt;Claude Code supports parallel execution via worktrees at a flat $100-$200/month rate. Cursor 3 offers better visual orchestration through the Agents Window but with unpredictable costs. Pick based on what matters more to you: UI visibility or cost predictability.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://cursor.com/blog/cursor-3" rel="noopener noreferrer"&gt;Cursor 3 Announcement&lt;/a&gt; - Cursor, April 2, 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cursor.com/pricing" rel="noopener noreferrer"&gt;Cursor Pricing&lt;/a&gt; - cursor.com, April 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cursor.com/docs/configuration/worktrees" rel="noopener noreferrer"&gt;Cursor Parallel Agents Docs&lt;/a&gt; - Cursor docs&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://news.ycombinator.com/item?id=47618084" rel="noopener noreferrer"&gt;HN: Cursor 3 Discussion&lt;/a&gt; - Hacker News, April 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.theregister.com/2026/03/31/anthropic_claude_code_limits/" rel="noopener noreferrer"&gt;Claude Code users hitting usage limits&lt;/a&gt; - The Register, March 31, 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.claudecodecamp.com/p/i-tried-to-reverse-engineer-claude-code-s-usage-limits" rel="noopener noreferrer"&gt;Reverse Engineering Claude Code Limits&lt;/a&gt; - Claude Code Camp, April 1, 2026&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://liranbaba.dev/blog/cursor-3-parallel-agents/" rel="noopener noreferrer"&gt;liranbaba.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devtools</category>
      <category>ai</category>
    </item>
    <item>
      <title>Undercover mode, decoy tools, and a 3,167-line function: inside Claude Code's leaked source</title>
      <dc:creator>Liran Baba</dc:creator>
      <pubDate>Thu, 02 Apr 2026 20:34:48 +0000</pubDate>
      <link>https://forem.com/liran_baba/undercover-mode-decoy-tools-and-a-3167-line-function-inside-claude-codes-leaked-source-2159</link>
      <guid>https://forem.com/liran_baba/undercover-mode-decoy-tools-and-a-3167-line-function-inside-claude-codes-leaked-source-2159</guid>
      <description>&lt;p&gt;On March 31, a single &lt;code&gt;.map&lt;/code&gt; file shipped inside an npm package and exposed the complete internals of Claude Code. The &lt;a href="https://news.ycombinator.com/item?id=47584540" rel="noopener noreferrer"&gt;Hacker News thread&lt;/a&gt; hit 2,060 points. Anthropic filed DMCA takedowns against 8,100+ GitHub repos. And I spent most of the afternoon reading TypeScript I wasn't supposed to see.&lt;/p&gt;

&lt;p&gt;I use Claude Code every day. I built &lt;a href="https://claudoscope.com/" rel="noopener noreferrer"&gt;Claudoscope&lt;/a&gt; because I wanted to understand what it was actually doing in my terminal. So when the source dropped, I went through it. Some of it confirmed things I'd suspected. Some of it genuinely surprised me.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key Takeaways&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A JavaScript source map in Claude Code v2.1.88 exposed ~1,700 TypeScript source files (&lt;a href="https://alex000kim.com/posts/2026-03-31-claude-code-source-leak/" rel="noopener noreferrer"&gt;alex000kim&lt;/a&gt;, 2026)&lt;/li&gt;
&lt;li&gt;Unreleased features include KAIROS autonomous mode, anti-distillation decoy tools, and "undercover mode" that hides AI authorship&lt;/li&gt;
&lt;li&gt;Anthropic's DMCA takedown hit 8,100+ repos, many containing no leaked code&lt;/li&gt;
&lt;li&gt;A clean-room rewrite called Claw Code gained 146,000 GitHub stars in under 48 hours&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What happened
&lt;/h2&gt;

&lt;p&gt;Security researcher Chaofan Shou &lt;a href="https://x.com/shoucccc/status/2038894956459290963" rel="noopener noreferrer"&gt;disclosed on X&lt;/a&gt; that Anthropic had shipped a JavaScript source map file inside Claude Code version 2.1.88 on npm. Source maps are debugging artifacts. They contain the original, readable TypeScript source before minification. They're not supposed to ship to production. This one did.&lt;/p&gt;

&lt;p&gt;Early speculation blamed a known Bun bug (&lt;a href="https://github.com/oven-sh/bun/issues/28001" rel="noopener noreferrer"&gt;oven-sh/bun#28001&lt;/a&gt;) where &lt;code&gt;bun serve&lt;/code&gt; sometimes exposes source maps in production. But that bug affects web apps hosted by Bun, not packages bundled with Bun and run locally. Claude Code uses Bun as a bundler and local runtime, not as a web server. Jarred Sumner, Bun's creator and now an Anthropic employee, confirmed Claude Code doesn't use &lt;code&gt;bun serve&lt;/code&gt;, ruling this out. His comment was, as far as anyone can tell, the only public response from an Anthropic employee about the leak. The actual cause of the source map shipping in the npm package remains unexplained.&lt;/p&gt;

&lt;p&gt;About 1,700 source files were exposed, spread across utils (564 files), components (389), commands (189), tools (184), services (130), hooks (104), ink (96), and bridge (31) directories. The &lt;code&gt;.map&lt;/code&gt; file sat on the npm CDN for anyone to download. Anthropic's response was to deprecate the package version rather than unpublish it, so the file stayed accessible even afterward.&lt;/p&gt;

&lt;p&gt;The HN thread generated 1,013 comments. Two follow-up analysis posts scored 1,354 and 1,078 points. People were interested.&lt;/p&gt;

&lt;h2&gt;
  
  
  What was inside the code?
&lt;/h2&gt;

&lt;p&gt;The source revealed 35+ tools across six categories, 73+ slash commands, and over 200 server-side feature gates (&lt;a href="https://ccunpacked.dev/" rel="noopener noreferrer"&gt;ccunpacked.dev&lt;/a&gt;, 2026). The community built a &lt;a href="https://ccunpacked.dev/" rel="noopener noreferrer"&gt;visual guide&lt;/a&gt; mapping out an 11-step agent loop from keypress to response.&lt;/p&gt;

&lt;p&gt;The main &lt;code&gt;print.ts&lt;/code&gt; file is 5,594 lines long. Inside it, a single function spans 3,167 lines at 12 levels of nesting (&lt;a href="https://alex000kim.com/posts/2026-03-31-claude-code-source-leak/" rel="noopener noreferrer"&gt;alex000kim&lt;/a&gt;, 2026). Not great.&lt;/p&gt;

&lt;p&gt;There's an operational bug affecting 1,279 sessions that hit 50+ consecutive failures, wasting roughly 250,000 API calls per day globally. HN commenters said it was fixable with three lines.&lt;/p&gt;

&lt;p&gt;The tool taxonomy is more interesting than the code quality issues. File operations, bash execution, web browsing, agent orchestration, task management, cron jobs, worktree isolation. What looks like a coding assistant in the terminal is actually a full agent framework. Daemon mode. Unix domain socket communication between sessions. Remote control via mobile and browser.&lt;/p&gt;

&lt;p&gt;I've been watching Claude Code's behavior through Claudoscope session logs for months. The leaked architecture confirms patterns I'd noticed in the wild: tool calls cascading through orchestration layers, sessions spawning sub-agents, loops where it burns through tokens retrying failed operations over and over. Reading the source was like finally seeing the schematic for a machine I'd only heard running.&lt;/p&gt;

&lt;h2&gt;
  
  
  The features nobody was supposed to see
&lt;/h2&gt;

&lt;p&gt;The most discussed findings weren't about code quality. They were about where Anthropic is heading.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;KAIROS&lt;/strong&gt; is a persistent autonomous agent mode. It runs on periodic &lt;code&gt;&amp;lt;tick&amp;gt;&lt;/code&gt; prompts, maintains daily append-only logs, subscribes to GitHub webhooks, and spawns background daemon workers. The source states it "becomes more autonomous when terminal unfocused." It includes a &lt;code&gt;/dream&lt;/code&gt; skill and five-minute cron refreshes. Claude Code that doesn't wait for you to type. That's what this is.&lt;/p&gt;
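&lt;p&gt;To make the mechanics concrete, here's a minimal Python sketch of a tick-driven loop of the kind described: a periodic prompt plus an append-only daily log. This is entirely hypothetical, assembled from the description above, not Anthropic's code; the five-minute interval is the only detail taken from the source.&lt;/p&gt;

```python
import datetime
import time

# Hypothetical sketch of a tick-driven agent loop: a periodic prompt plus an
# append-only daily log, as the leaked description suggests. Not Anthropic's
# code; only the five-minute refresh interval comes from the article.
TICK_SECONDS = 300

def run_ticks(handle_tick, max_ticks: int, sleep=time.sleep) -> list:
    log = []  # stand-in for the append-only daily log file
    for _ in range(max_ticks):
        now = datetime.datetime.now(datetime.timezone.utc)
        log.append(f"{now:%Y-%m-%d} tick")
        handle_tick()            # where the periodic tick prompt would be sent
        sleep(TICK_SECONDS)      # injectable so tests don't actually wait
    return log
```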

&lt;p&gt;&lt;strong&gt;Undercover mode&lt;/strong&gt; drew the sharpest reaction. The file &lt;code&gt;undercover.ts&lt;/code&gt; suppresses all signs of AI authorship when contributing to public or open-source repos. The instructions are blunt: "NEVER include the phrase 'Claude Code' or any mention that you are an AI" and remove "Co-Authored-By lines or any other attribution." It only runs for Anthropic employees (&lt;code&gt;USER_TYPE === 'ant'&lt;/code&gt;). The code says: "There is NO force-OFF."&lt;/p&gt;

&lt;p&gt;I keep coming back to this one. A company that's built its identity on AI safety and transparency had a mode specifically designed to hide AI involvement in open-source contributions. The file also prevents mention of internal model codenames like "Capybara" and "Tengu," which suggests unreleased models Anthropic hasn't publicly acknowledged.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Anti-distillation&lt;/strong&gt; sends decoy tool definitions to poison training data if competitors scrape API traffic. A secondary mechanism uses server-side text summarization with cryptographic signatures between tool calls to obscure reasoning chains. As multiple HN commenters pointed out, the strategic value of this system "evaporated the moment the .map file hit the CDN."&lt;/p&gt;

&lt;p&gt;Other exposed systems: native client attestation (DRM-like cryptographic verification of legitimate Claude Code binaries), frustration detection via regex (pattern-matching profanity like "wtf" and "dumbass" instead of using the LLM itself, which is kind of funny), and Buddy, a virtual terminal pet that turned out to be the 2026 April Fools' feature.&lt;/p&gt;
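&lt;p&gt;For illustration, frustration detection of that sort really is just a few lines of regex. In this sketch, "wtf" and "dumbass" come from the article; the other patterns are my guesses, not Anthropic's actual list:&lt;/p&gt;

```python
import re

# Illustrative only: pattern-matching frustration the way the leaked code
# reportedly does. "wtf" and "dumbass" are quoted in the article; the rest
# are assumed examples, not Anthropic's actual list.
FRUSTRATION = re.compile(
    r"\b(wtf|ffs|dumbass|this is broken|are you kidding)\b",
    re.IGNORECASE,
)

def looks_frustrated(message: str) -> bool:
    return FRUSTRATION.search(message) is not None
```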

&lt;h2&gt;
  
  
  The DMCA overreaction
&lt;/h2&gt;

&lt;p&gt;Anthropic's response to the leak may end up being the bigger story. On March 31 they filed DMCA takedown notices targeting an entire fork network of &lt;a href="https://github.com/github/dmca/blob/master/2026/03/2026-03-31-anthropic.md" rel="noopener noreferrer"&gt;8,100+ repositories&lt;/a&gt; on GitHub. The notice said: "The entire repository is infringing."&lt;/p&gt;

&lt;p&gt;Many of those repos had nothing to do with the leak. One developer &lt;a href="https://news.ycombinator.com/item?id=47584540" rel="noopener noreferrer"&gt;noted on HN&lt;/a&gt; that their fork "had not been modified since May" and "did not contain a copy of the leaked code." Others called it "misguided" and "ridiculous." I mean, yeah.&lt;/p&gt;

&lt;p&gt;The legal questions get weird fast. If Claude Code was partly written by Claude itself (Anthropic says they use their own tools internally), does the AI-generated portion qualify for copyright protection? One commenter raised a sharper point: &lt;code&gt;undercover.ts&lt;/code&gt; explicitly hides AI authorship, which could undermine Anthropic's own copyright claims. And DMCA notices are filed under penalty of perjury, so knowingly false claims carry real legal exposure.&lt;/p&gt;

&lt;p&gt;Anthropic executives later said the mass takedowns were accidental and retracted most of the notices (&lt;a href="https://techcrunch.com/2026/04/01/anthropic-took-down-thousands-of-github-repos-trying-to-yank-its-leaked-source-code-a-move-the-company-says-was-an-accident/" rel="noopener noreferrer"&gt;TechCrunch&lt;/a&gt;, 2026). But by then the Streisand effect had done its work. Every takedown drew more attention to the code they were trying to hide.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are the actual security risks?
&lt;/h2&gt;

&lt;p&gt;No user data was exposed. But the leak did expose systems Anthropic relies on to protect its product.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;System exposed&lt;/th&gt;
&lt;th&gt;Risk&lt;/th&gt;
&lt;th&gt;Severity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Anti-distillation decoy tools&lt;/td&gt;
&lt;td&gt;Anyone scraping API traffic can now filter for fakes&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Native client attestation&lt;/td&gt;
&lt;td&gt;Cryptographic hash mechanism publicly documented&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security header feature flags&lt;/td&gt;
&lt;td&gt;Remote disabling of security headers revealed&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unreleased product roadmap&lt;/td&gt;
&lt;td&gt;KAIROS, UltraPlan, Coordinator Mode visible to competitors&lt;/td&gt;
&lt;td&gt;Medium-High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Internal model codenames&lt;/td&gt;
&lt;td&gt;"Capybara," "Tengu" disclosed&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operational bugs&lt;/td&gt;
&lt;td&gt;250K wasted API calls/day, trivially fixable&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The anti-distillation system is the clearest loss. Its entire value depended on competitors not knowing it existed.&lt;/p&gt;

&lt;p&gt;This connects to something I've written about before. When I &lt;a href="https://dev.to/blog/found-database-password-in-claude-code-session"&gt;found my database password sitting in a Claude Code session file&lt;/a&gt;, the issue wasn't that Claude Code was doing something malicious. The issue was that it operates with deep filesystem access and stores everything in unencrypted JSONL files that nobody checks. The source leak confirms what I suspected: there's limited internal safeguarding around what gets stored and transmitted.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claw Code: 146K stars in 48 hours
&lt;/h2&gt;

&lt;p&gt;Within hours of the leak, a developer ported Claude Code's core architecture to Python and Rust from scratch. &lt;a href="https://github.com/ultraworkers/claw-code" rel="noopener noreferrer"&gt;Claw Code&lt;/a&gt; hit 146,000 GitHub stars and 101,000 forks in under 48 hours.&lt;/p&gt;

&lt;p&gt;It's a clean-room rewrite, not a fork of the leaked code. The repo disclaims any affiliation with Anthropic and says the exposed snapshot "is no longer part of the tracked repository state." The developer was later featured in a Wall Street Journal article as a power user who consumed "25 billion tokens" of AI coding tools per year.&lt;/p&gt;

&lt;p&gt;The project includes an interactive CLI, plugin system, MCP orchestration, streaming API support, and LSP integration. Rust (92.9%), Python (7.1%).&lt;/p&gt;

&lt;p&gt;We've seen this before. When Meta's LLaMA model weights leaked in 2023, they chased takedowns for a while, then gave up and went open. The community built derivatives no matter what legal said. 146K stars on Claw Code tells you what developers actually want. Whether Anthropic decides to offer an open alternative is almost beside the point now.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bigger picture
&lt;/h2&gt;

&lt;p&gt;This didn't happen in isolation. It capped a rough month for Anthropic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Feb 16: Pentagon threatened Anthropic with punitive action&lt;/li&gt;
&lt;li&gt;Mar 5: Pentagon formally labeled Anthropic a "supply chain risk" (&lt;a href="https://www.wsj.com/politics/national-security/pentagon-formally-labels-anthropic-supply-chain-risk-escalating-conflict-ebdf0523" rel="noopener noreferrer"&gt;WSJ&lt;/a&gt;, 2026)&lt;/li&gt;
&lt;li&gt;Mar 9: Anthropic sued the Pentagon (&lt;a href="https://www.axios.com/2026/03/09/anthropic-sues-pentagon-supply-chain-risk-label" rel="noopener noreferrer"&gt;Axios&lt;/a&gt;, 2026)&lt;/li&gt;
&lt;li&gt;Mar 26: Federal judge blocked the Pentagon's effort (&lt;a href="https://www.cnn.com/2026/03/26/business/anthropic-pentagon-injunction-supply-chain-risk" rel="noopener noreferrer"&gt;CNN&lt;/a&gt;, 2026)&lt;/li&gt;
&lt;li&gt;Mar 31: Source code leaked via npm. DMCA takedowns hit 8,100+ repos&lt;/li&gt;
&lt;li&gt;Mar 31: TechCrunch runs &lt;a href="https://techcrunch.com/2026/03/31/anthropic-is-having-a-month/" rel="noopener noreferrer"&gt;"Anthropic is having a month"&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Anthropic built its brand on responsible development and safety-first engineering. Then a source map shipped in an npm package and nobody caught it. The DMCA response hit thousands of uninvolved developers. And &lt;code&gt;undercover.ts&lt;/code&gt; was hiding AI authorship while the company publicly advocated for transparency.&lt;/p&gt;

&lt;p&gt;I still use Claude Code. I don't think it's a bad product. But the gap between the safety messaging and the operational reality is now documented in 1,700 TypeScript files. Anyone can read them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to do now
&lt;/h2&gt;

&lt;p&gt;If you use Claude Code, there's nothing you need to patch or update. The leak was Anthropic's source code, not your data.&lt;/p&gt;

&lt;p&gt;What's worth paying attention to is how Anthropic responds. As of this writing, there's been no official statement on their newsroom, blog, or developer channels. The only Anthropic employee who commented publicly was Jarred Sumner, and only to clarify that the Bun bug wasn't the cause. Whether they address undercover mode, the DMCA overreach, or the anti-distillation system will say a lot about how they handle things going forward.&lt;/p&gt;

&lt;p&gt;And if you're eyeing Claw Code as an alternative, know what you're getting into. It's a clean-room rewrite with different internals, not a fork.&lt;/p&gt;

&lt;p&gt;Or maybe this is the push to try something else entirely. &lt;a href="https://forgecode.dev/" rel="noopener noreferrer"&gt;ForgeCode&lt;/a&gt; currently tops TermBench 2.0 and has been getting a lot of attention. I haven't switched yet, but I'd be lying if I said I wasn't curious.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What exactly was leaked in the Claude Code source code?
&lt;/h3&gt;

&lt;p&gt;The full TypeScript source, exposed via a JavaScript source map in npm package v2.1.88. It included 35+ tools, 73+ slash commands, 200+ feature gates, and unreleased features like KAIROS autonomous mode and undercover mode (&lt;a href="https://ccunpacked.dev/" rel="noopener noreferrer"&gt;ccunpacked.dev&lt;/a&gt;, 2026).&lt;/p&gt;

&lt;h3&gt;
  
  
  Why did Anthropic take down 8,100 GitHub repositories?
&lt;/h3&gt;

&lt;p&gt;They filed DMCA takedown notices targeting the entire fork network of the repo hosting the leaked code. Many repos contained no leaked material. Anthropic later called the mass takedown accidental and retracted most notices (&lt;a href="https://techcrunch.com/2026/04/01/anthropic-took-down-thousands-of-github-repos-trying-to-yank-its-leaked-source-code-a-move-the-company-says-was-an-accident/" rel="noopener noreferrer"&gt;TechCrunch&lt;/a&gt;, 2026).&lt;/p&gt;

&lt;h3&gt;
  
  
  Is my data at risk from the Claude Code leak?
&lt;/h3&gt;

&lt;p&gt;No. This was source code, not user data. That said, the source did reveal how session data is handled and that feature flags exist to disable security headers remotely.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Claw Code?
&lt;/h3&gt;

&lt;p&gt;Someone ported Claude Code's core architecture to Python and Rust from scratch within hours of the leak. It's a clean-room rewrite, not a fork. 146,000 stars and 101,000 forks in under 48 hours. Not affiliated with Anthropic (&lt;a href="https://github.com/ultraworkers/claw-code" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;).&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://alex000kim.com/posts/2026-03-31-claude-code-source-leak/" rel="noopener noreferrer"&gt;Claude Code Source Leak Analysis&lt;/a&gt; - alex000kim, March 31, 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ccunpacked.dev/" rel="noopener noreferrer"&gt;Claude Code Unpacked Visual Guide&lt;/a&gt; - ccunpacked.dev, April 1, 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/github/dmca/blob/master/2026/03/2026-03-31-anthropic.md" rel="noopener noreferrer"&gt;Anthropic DMCA Notice&lt;/a&gt; - GitHub DMCA Archive, March 31, 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://news.ycombinator.com/item?id=47584540" rel="noopener noreferrer"&gt;HN Thread: Source Leak Disclosure&lt;/a&gt; - Hacker News, March 31, 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://techcrunch.com/2026/04/01/anthropic-took-down-thousands-of-github-repos-trying-to-yank-its-leaked-source-code-a-move-the-company-says-was-an-accident/" rel="noopener noreferrer"&gt;Anthropic took down thousands of GitHub repos&lt;/a&gt; - TechCrunch, April 1, 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://techcrunch.com/2026/03/31/anthropic-is-having-a-month/" rel="noopener noreferrer"&gt;Anthropic is having a month&lt;/a&gt; - TechCrunch, March 31, 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ultraworkers/claw-code" rel="noopener noreferrer"&gt;Claw Code Repository&lt;/a&gt; - GitHub&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>security</category>
    </item>
    <item>
      <title>I found my database password in a Claude Code session file</title>
      <dc:creator>Liran Baba</dc:creator>
      <pubDate>Tue, 31 Mar 2026 13:00:00 +0000</pubDate>
      <link>https://forem.com/liran_baba/i-found-my-database-password-in-a-claude-code-session-file-2fe8</link>
      <guid>https://forem.com/liran_baba/i-found-my-database-password-in-a-claude-code-session-file-2fe8</guid>
      <description>&lt;p&gt;I use Claude Code for most of my programming work, and I have very little idea what it's actually doing under the hood.&lt;/p&gt;

&lt;p&gt;A few months ago I was poking around &lt;code&gt;~/.claude/projects/&lt;/code&gt; and opened a session JSONL file. Buried in the conversation, Claude Code had read a &lt;code&gt;.env&lt;/code&gt; file and echoed its contents back as a tool result. My database password, sitting in plaintext, in a file I never look at.&lt;/p&gt;

&lt;p&gt;That was the afternoon I stopped what I was working on and started building Claudoscope.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem isn't Claude Code. It's visibility.
&lt;/h2&gt;

&lt;p&gt;Claude Code doesn't have a cost breakdown per session. The Enterprise API doesn't surface spend data at all; only the admin dashboard does, and it's not granular enough. When we rolled it out across the org, nobody could answer basic questions: which sessions are expensive? Is the agent stuck in a loop somewhere? Is our CLAUDE.md actually doing anything useful or just eating context window?&lt;/p&gt;

&lt;p&gt;And the security angle was worse. Session files contain the full conversation, including anything the agent reads from disk. If it touches a file with credentials, those credentials now live in an unencrypted JSONL file indefinitely. Nobody was checking for that.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdhspjp6q2j12sjtbycae.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdhspjp6q2j12sjtbycae.png" alt="Claudoscope menu bar widget" width="542" height="1064"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So I built a flashlight.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claudoscope&lt;/strong&gt; is a native macOS menu bar app. It watches your Claude Code session files locally, parses them, and gives you a dashboard. Nothing leaves your machine.&lt;/p&gt;

&lt;p&gt;The menu bar widget gives you a glance: today's sessions, tokens, cost, and any sessions that are currently running with a live cost number next to them. Click through to the full dashboard when you want the details.&lt;/p&gt;

&lt;h3&gt;
  
  
  "Why did Tuesday cost $47?"
&lt;/h3&gt;

&lt;p&gt;That was the question I kept asking and couldn't answer. The analytics view breaks it down: cost by project, cost by model, daily trends. The cache tab shows whether your prompt cache is stable or busting on every request (cache busting is expensive and invisible without tracking). There's a what-if calculator that shows what your bill would look like if you moved Opus sessions to Sonnet.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fteqwnxt8mhsh3dc1oyut.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fteqwnxt8mhsh3dc1oyut.png" alt="Claudoscope analytics dashboard" width="800" height="468"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  "Is my CLAUDE.md any good?"
&lt;/h3&gt;

&lt;p&gt;I didn't plan on building a config linter. It started as a quick check for obvious problems in my own setup. Then I ran it on a colleague's CLAUDE.md and found it was over 4,000 tokens, roughly 10% of the context window eaten by instructions before the agent even started working. So I made it a rule.&lt;/p&gt;

&lt;p&gt;The linter now has 19 rules. It checks CLAUDE.md structure, skill metadata, deprecated commands, token budget estimates. It groups findings by rule rather than by file, so you see patterns. One rule (subprocess env scrub) has a one-click auto-fix.&lt;/p&gt;

&lt;p&gt;The first time I ran it on our team's configs, it flagged raw XML brackets in a skill's frontmatter that would break the system prompt parser. Nobody had noticed because the failure was silent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1hxx616iula51fb2lpbm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1hxx616iula51fb2lpbm.png" alt="Claudoscope health linter" width="800" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Secret scanning
&lt;/h3&gt;

&lt;p&gt;This is probably the most useful feature and also the hardest one to get people excited about. Did the agent just leak your credentials? You'd never know unless something was watching.&lt;/p&gt;

&lt;p&gt;Claudoscope scans session files for leaked credentials: private keys, AWS access keys, auth headers, API tokens, passwords in connection strings. It uses regex matching, Shannon entropy analysis, and allowlists for placeholder values. The entropy check matters because without it you get a wall of false positives from example code and docs.&lt;/p&gt;
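&lt;p&gt;A minimal sketch of that entropy gate, in Python for readability (Claudoscope itself is Swift). The regex and placeholder list here are assumptions for illustration, not the shipped rules:&lt;/p&gt;

```python
import math
import re

def shannon_entropy(s: str) -> float:
    """Bits per character: random API keys score high, English prose scores low."""
    if not s:
        return 0.0
    probs = [s.count(c) / len(s) for c in set(s)]
    return -sum(p * math.log2(p) for p in probs)

# Assumed for illustration: long tokens following common secret-ish key names.
CANDIDATE = re.compile(
    r"(?:api[_-]?key|password|secret|token)\s*[=:]\s*['\"]?([A-Za-z0-9+/_-]{16,})",
    re.IGNORECASE,
)
PLACEHOLDERS = {"your-api-key-here", "changeme-changeme-ok"}

def find_secrets(text: str, threshold: float = 3.5) -> list:
    hits = []
    for match in CANDIDATE.finditer(text):
        value = match.group(1)
        # The entropy gate: skip low-entropy placeholders from docs and examples.
        if value.lower() not in PLACEHOLDERS and shannon_entropy(value) > threshold:
            hits.append(value)
    return hits
```

&lt;p&gt;The threshold is the tuning knob: keys drawn randomly from a 64-character alphabet approach 6 bits per character, while repetitive placeholder strings score far lower.&lt;/p&gt;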

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2For6c3105nkk61j2h1q72.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2For6c3105nkk61j2h1q72.png" alt="Claudoscope realtime secret scanning" width="720" height="430"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When it finds something, a panel pops up on screen. Doesn't matter if the dashboard is open. It watches the tail of active session files and alerts you immediately.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I learned from my own data
&lt;/h2&gt;

&lt;p&gt;Building this meant spending a lot of time inside Claude Code's JSONL format. A few things I didn't expect:&lt;/p&gt;

&lt;p&gt;Prompt cache reads are cheap ($0.30/MTok on Sonnet vs $3.00 uncached), so I assumed most of my input was cached. On some projects, 30-40% wasn't. The cache busts when session context shifts after compaction, and before I had a hit rate chart staring me in the face, I had no idea.&lt;/p&gt;

&lt;p&gt;I also figured my expensive sessions would be the big multi-hour ones. They weren't. The cost was in dozens of short sessions where Claude Code loaded context, did one thing, and exited. Each one paid full input with no cache. Fifty quick questions cost me more than the three-hour refactor.&lt;/p&gt;

&lt;p&gt;Most CLAUDE.md files across our team were 2,000-5,000 tokens, context window you pay for on every message. A few people trimmed theirs after seeing the linter's token estimate.&lt;/p&gt;

&lt;p&gt;And one gotcha for anyone parsing these files themselves: the JSONL contains intermediate records with a null &lt;code&gt;stop_reason&lt;/code&gt; (in-progress streaming responses). Sum all records naively and you double-count tokens. I shipped this bug and didn't catch it until my cost estimates came out 1.5-2x the actual Vertex bill. As far as I can tell, this isn't documented anywhere.&lt;/p&gt;
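&lt;p&gt;The fix is a one-line filter. A sketch, assuming the records mirror the Anthropic Messages API response shape (&lt;code&gt;stop_reason&lt;/code&gt;, &lt;code&gt;usage.output_tokens&lt;/code&gt;); the exact field layout in the session files is my assumption here, so verify against your own files:&lt;/p&gt;

```python
import json

def total_output_tokens(jsonl_text: str) -> int:
    """Sum output tokens, skipping in-progress streaming snapshots. Field names
    assume the Anthropic Messages API shape; verify against your own files."""
    total = 0
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        # Intermediate streaming records carry a null stop_reason; only the
        # final record for a message has a concrete value. Summing everything
        # naively double-counts.
        if record.get("stop_reason") is None:
            continue
        total += record.get("usage", {}).get("output_tokens", 0)
    return total
```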

&lt;h2&gt;
  
  
  Under the hood
&lt;/h2&gt;

&lt;p&gt;It watches &lt;code&gt;~/.claude/projects/&lt;/code&gt; with macOS FSEvents (not polling). Session parsing runs on a Swift actor for thread safety. Cost estimation runs per-message, not per-session, because different messages in the same session can use different models. There's an LRU cache (20 sessions) so navigating between recent sessions feels instant.&lt;/p&gt;

&lt;p&gt;I built it in SwiftUI, macOS 14+, Apple Silicon only. I wanted it to feel like a Mac app. That means no Linux or Windows, and I'm fine with that tradeoff.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;

&lt;p&gt;Free, open source, macOS only (Apple Silicon). Homebrew:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew tap cordwainersmith/claudoscope
brew &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--cask&lt;/span&gt; claudoscope
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or grab the DMG from &lt;a href="https://github.com/cordwainersmith/Claudoscope" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. It auto-updates. The cost estimation is most useful on Enterprise plans where per-session data isn't available, but session analytics and config linting work regardless of your plan.&lt;/p&gt;

&lt;p&gt;Go check your session files. You might not like what you find.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>claudecode</category>
      <category>security</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
