<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Liran Baba</title>
    <description>The latest articles on Forem by Liran Baba (@liran_baba).</description>
    <link>https://forem.com/liran_baba</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3853249%2F1fb2b801-c7ae-463d-b813-a600cdd7ca4f.png</url>
      <title>Forem: Liran Baba</title>
      <link>https://forem.com/liran_baba</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/liran_baba"/>
    <language>en</language>
    <item>
      <title>AI made your team code faster. Everything after is still broken.</title>
      <dc:creator>Liran Baba</dc:creator>
      <pubDate>Mon, 04 May 2026 16:01:37 +0000</pubDate>
      <link>https://forem.com/liran_baba/ai-made-your-team-code-faster-everything-after-is-still-broken-g6</link>
      <guid>https://forem.com/liran_baba/ai-made-your-team-code-faster-everything-after-is-still-broken-g6</guid>
      <description>&lt;p&gt;I sat with one of our teams a while back - multiple agents running in parallel, each on a different feature, output that used to take a sprint landing before lunch. I asked mid-session: did that bug from last week get fixed? The one a customer had flagged. Everyone paused. The agent had been asked to fix it. Someone thought it shipped. But had it reached prod? Was it still in staging? It might have gone out in one of the four releases that week and nobody tracked which one carried it. When that much is moving at once, the fix existed somewhere - they just had no way to know where.&lt;/p&gt;

&lt;p&gt;They opened four tabs. Checked a CI dashboard. Skimmed Slack looking for someone's deployment message. Twenty minutes gone. &lt;/p&gt;

&lt;h2&gt;
  
  
  Coding agents move fast - until binaries go live
&lt;/h2&gt;

&lt;p&gt;Coding agents shifted where the bottleneck is. Writing code used to be the hard part. Now it's everything after: releasing, knowing what actually shipped, and understanding what's running in which environment.&lt;/p&gt;

&lt;p&gt;When you ship faster, the operational surface grows - more releases, more artifacts, more environments to track. The tooling most teams use was built for a slower world, one where humans managed each step, version numbers were meaningful, and an artifact repo was something your DevOps person configured once and left running.&lt;/p&gt;

&lt;p&gt;Here's where that breaks.&lt;/p&gt;

&lt;p&gt;Your coding agent has no idea what's actually deployed. "Is the new auth flow live on staging?" "Did the security patch reach prod?" It can't tell you - it has no access to runtime environments. Instead of you hunting down answers, the agent tries to do the heavy lifting - scraping past commits, reaching into prod if you have the right access, and digging through issues. But because of its limited context, it only ever sees a fraction of the bigger picture.&lt;/p&gt;

&lt;p&gt;Version numbers have also stopped meaning much. "Deploy v2.4.1" is nearly useless as a description of intent. What you actually want is "deploy the latest build that fixed the checkout bug" or "what's different between the version in staging and prod" - but connecting code changes to what shipped to what's currently running is still manual, and it gets harder the faster you move.&lt;/p&gt;

&lt;p&gt;There's also the setup cost that nobody names out loud. Distribution management, package manager configs, CI/CD integration - before you've shipped anything, you've burned real time on infrastructure. For teams with no dedicated DevOps person, it's often what quietly kills momentum.&lt;/p&gt;

&lt;p&gt;When all of this is still on you - tracking what was built, what shipped, what's running where - it doesn't matter that the agent writes code fast. The last mile is still yours.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbocqzd838714u5z2diiy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbocqzd838714u5z2diiy.png" alt="AI made your team code faster. Releases, deploys, and tracking what's running where didn't keep up. Here's where the last mile breaks." width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Fly: Giving Binaries Agentic Wings
&lt;/h2&gt;

&lt;p&gt;We built Fly because we kept running into exactly this. Teams moving fast with AI, then stalling at the release and visibility layer where nothing is agentic and nothing talks to anything else.&lt;/p&gt;

&lt;p&gt;The artifact registry needs to be part of the workflow, not a separate system your agent has never heard of. That's where the twenty minutes go.&lt;/p&gt;

&lt;p&gt;In Fly, every push creates a traceable release with its PR, commits, and change summary attached automatically. That context is available through MCP, so Cursor, Claude Code, Copilot - whatever you use - actually knows what's been built and what's deployed.&lt;/p&gt;

&lt;p&gt;You can ask "find the release that fixed the checkout bug" or "what changes are queued between prod and staging?" or "deploy John's latest changes to production" and get something useful back instead of a blank stare or a three-minute token flood that may or may not give the right answer. You can ask what’s running in any environment without leaving your coding session, and get answers from up-to-date semantic tracking across every runtime, fully agentless.&lt;/p&gt;

&lt;p&gt;Releases are identified by the changes they contain, not just by a version number someone incremented in a pipeline. Setup is a few minutes: connect GitHub, run the Fly bash command, push once, and it picks up context from there.&lt;br&gt;
You can even ask Fly to ping you on Slack when Rachel pushes the new UI design to staging.&lt;/p&gt;

&lt;p&gt;We've been running this with teams for several months. The thing I hear most often is: "I didn’t realize how much time this was costing us until we had a better way." That's the right outcome - when the tooling becomes invisible in your workflow.&lt;/p&gt;

&lt;p&gt;If your team is shipping fast, but figuring out where finished code is actually running still costs you valuable minutes and multiple DMs, it’s time for a better way.&lt;br&gt;
Try this: &lt;a href="https://jfrog.com/fly"&gt;jfrog.com/fly&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Disclosure: I’m AI Lead at JFrog. I use Fly day-to-day.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>devex</category>
    </item>
    <item>
      <title>Claude Code hooks: the half of Claude Code nobody uses</title>
      <dc:creator>Liran Baba</dc:creator>
      <pubDate>Thu, 23 Apr 2026 17:32:45 +0000</pubDate>
      <link>https://forem.com/liran_baba/claude-code-hooks-the-half-of-claude-code-nobody-uses-5570</link>
      <guid>https://forem.com/liran_baba/claude-code-hooks-the-half-of-claude-code-nobody-uses-5570</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4rsl98dbzb4i1qf8mm8r.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4rsl98dbzb4i1qf8mm8r.webp" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I was halfway through writing this post when I decided to fact-check myself. Opened &lt;code&gt;~/.claude/settings.json&lt;/code&gt;, expecting three or four hooks I'd forgotten about. There was one. A &lt;code&gt;Stop&lt;/code&gt; hook that plays a ding and says "your turn" when Claude finishes thinking. My hooks-to-skills ratio: 1 to 42.&lt;/p&gt;

&lt;p&gt;I'm not picking on myself. This is the median. I checked seven of my own project configs after that: zero hooks each. Skills got the awesome-lists. Hooks got a footnote. And the silence has a cost: wasted tokens, missed security incidents, and a control surface that ships in the box and never gets wired up.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude Code has 25+ hook event types. The average user has configured zero or one. I checked seven of my own project configs: zero hooks. My user-level config: one Stop hook.&lt;/li&gt;
&lt;li&gt;Skills feel like adding capability, which is fun. Hooks feel like writing policy, which sounds like work. The work is the part that pays.&lt;/li&gt;
&lt;li&gt;One &lt;code&gt;PreToolUse&lt;/code&gt; hook that swaps Grep for LSP cuts navigation tokens by 73-91% in the kit's own benchmarks (&lt;a href="https://github.com/nesaminua/claude-code-lsp-enforcement-kit" rel="noopener noreferrer"&gt;nesaminua/claude-code-lsp-enforcement-kit&lt;/a&gt;, MIT).&lt;/li&gt;
&lt;li&gt;The enterprise hook playbook (SOC2 audit, SIEM integration, supply-chain scanning, org-wide token budgets) does not exist publicly yet. If your platform team writes one now, you're early.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Why skills get all the attention and hooks get none
&lt;/h2&gt;

&lt;p&gt;Skills fit a familiar mental model: drop a markdown file, write a description, the agent picks it up. They feel like &lt;em&gt;adding capability&lt;/em&gt;. Hooks are different. You write a script that fires at an execution boundary and returns an exit code or JSON to block, modify, or augment what happens next. They feel like &lt;em&gt;writing policy&lt;/em&gt;. I saw a &lt;a href="https://reddit.com/r/AI_Agents/comments/1smmjvl/" rel="noopener noreferrer"&gt;Reddit thread&lt;/a&gt; that nailed this for me: skills change what the model can do, hooks change when it can do it.&lt;/p&gt;

&lt;p&gt;That asymmetry shows up in adoption. Skills make Claude more capable; hooks make it more predictable. If you're a solo dev optimizing for capability, you reach for skills. If you're a team lead trying to keep five engineers from doing five different unsafe things, you reach for hooks. Most early adopters were solo, so the awesome-lists filled up with skills first.&lt;/p&gt;

&lt;p&gt;There's also a distribution problem. A skill is a markdown file. You can paste it in a Slack message. A hook is a &lt;code&gt;settings.json&lt;/code&gt; entry (or a bundled file in a plugin or skill) pointing to a shell script that touches your filesystem. Sharing it requires trust, a setup ritual, and someone willing to chmod +x a stranger's bash. That's a real barrier.&lt;/p&gt;

&lt;p&gt;Look at the curated lists on GitHub. Hook-only repos trail the mixed lists (skills, slash commands, MCP, hooks) by a wide margin, and broader awesome-claude-code lists treat hooks as a footnote. Even the &lt;a href="https://reddit.com/r/PromptEngineering/comments/1slpy2a/" rel="noopener noreferrer"&gt;post on how the creator of Claude Code uses it&lt;/a&gt; mentions hooks only in passing, and zero of the 26 comments noticed.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;My audit:&lt;/strong&gt; 42 user-level skills installed. 1 hook (the Stop notification mentioned above). Across 7 project configs: 0 hooks. The one that stings is my reddit-mcp repo, which gives Claude posting and deletion permissions on my Reddit account. Zero hooks there too. If I'm typical, the median ratio is something like 40 skills to 1 hook.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;My &lt;a href="https://liranbaba.dev/blog/claude-code-source-leak/" rel="noopener noreferrer"&gt;deep dive on Claude Code's leaked source&lt;/a&gt; showed the harness has a &lt;code&gt;hooks/&lt;/code&gt; directory with 104 files. That's a lot of internal scaffolding for something most users ignore; it certainly matches what I see in our org.&lt;/p&gt;

&lt;h2&gt;
  
  
  The hook event reference (Anthropic's official taxonomy)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk37phota0kqp0of50tsk.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk37phota0kqp0of50tsk.webp" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are at least 25 hook events documented in &lt;a href="https://code.claude.com/docs/en/hooks" rel="noopener noreferrer"&gt;Anthropic's hooks documentation&lt;/a&gt;. Most third-party tutorials cover four. Here's the actual catalog, grouped by cadence, so you can see the full surface.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Cadence&lt;/th&gt;
&lt;th&gt;Events&lt;/th&gt;
&lt;th&gt;Effect&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Once per session&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;SessionStart&lt;/code&gt;, &lt;code&gt;SessionEnd&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;SessionStart&lt;/code&gt; can inject context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Once per turn&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;UserPromptSubmit&lt;/code&gt;, &lt;code&gt;Stop&lt;/code&gt;, &lt;code&gt;StopFailure&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;UserPromptSubmit&lt;/code&gt; and &lt;code&gt;Stop&lt;/code&gt; can block or modify&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Per tool call&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;PreToolUse&lt;/code&gt;, &lt;code&gt;PostToolUse&lt;/code&gt;, &lt;code&gt;PostToolUseFailure&lt;/code&gt;, &lt;code&gt;PermissionRequest&lt;/code&gt;, &lt;code&gt;PermissionDenied&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;PreToolUse&lt;/code&gt; can block or modify&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Async / lifecycle&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;WorktreeCreate&lt;/code&gt;, &lt;code&gt;WorktreeRemove&lt;/code&gt;, &lt;code&gt;Notification&lt;/code&gt;, &lt;code&gt;ConfigChange&lt;/code&gt;, &lt;code&gt;InstructionsLoaded&lt;/code&gt;, &lt;code&gt;CwdChanged&lt;/code&gt;, &lt;code&gt;FileChanged&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Observation only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent team&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;SubagentStart&lt;/code&gt;, &lt;code&gt;SubagentStop&lt;/code&gt;, &lt;code&gt;TeammateIdle&lt;/code&gt;, &lt;code&gt;TaskCreated&lt;/code&gt;, &lt;code&gt;TaskCompleted&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Observation only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compaction&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;PreCompact&lt;/code&gt;, &lt;code&gt;PostCompact&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;PreCompact&lt;/code&gt; can inject context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;Elicitation&lt;/code&gt;, &lt;code&gt;ElicitationResult&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;Elicitation&lt;/code&gt; can respond&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Configuration sits at four levels: user-wide (&lt;code&gt;~/.claude/settings.json&lt;/code&gt;), project-shared (&lt;code&gt;.claude/settings.json&lt;/code&gt;, commit it), project-private (&lt;code&gt;.claude/settings.local.json&lt;/code&gt;, gitignored), and managed policy (org admin only, individual devs can't disable). Most public configs use the first two. Managed policy is where org-wide control lives, and almost nobody ships templates for it.&lt;/p&gt;
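
&lt;p&gt;To make the layering concrete, here's a minimal user-level entry (the script path is hypothetical; the &lt;code&gt;hooks&lt;/code&gt; / &lt;code&gt;matcher&lt;/code&gt; structure follows Anthropic's hooks docs):&lt;/p&gt;

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          {
            "type": "command",
            "command": "~/.claude/hooks/guard-writes.sh"
          }
        ]
      }
    ]
  }
}
```

&lt;p&gt;Commit the same shape in &lt;code&gt;.claude/settings.json&lt;/code&gt; to share it with the team; under managed policy the entry lives where individual devs can't remove it.&lt;/p&gt;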

&lt;p&gt;Handler types: &lt;code&gt;command&lt;/code&gt;, &lt;code&gt;http&lt;/code&gt;, &lt;code&gt;prompt&lt;/code&gt;, &lt;code&gt;agent&lt;/code&gt;. Most public examples use &lt;code&gt;command&lt;/code&gt;. The &lt;code&gt;http&lt;/code&gt; handler is the door to centralized policy, where one webhook enforces the same rule across every developer in the org. I haven't seen it used in any public repo yet.&lt;/p&gt;

&lt;p&gt;Exit codes are easy to get wrong. &lt;code&gt;0&lt;/code&gt; means success. &lt;code&gt;2&lt;/code&gt; is a blocking error, but behavior depends on the event: &lt;code&gt;PreToolUse&lt;/code&gt; blocks the tool call, &lt;code&gt;Stop&lt;/code&gt; prevents session end, other events treat it as non-fatal. Any other code is non-blocking. The model can't see why a hook fired unless the hook writes to stderr; this is the most common reason hooks feel mysterious in practice.&lt;/p&gt;
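
&lt;p&gt;A minimal sketch of that contract (simplified: a real hook reads the tool call as JSON on stdin, typically via &lt;code&gt;jq&lt;/code&gt;; here the command arrives as a function argument):&lt;/p&gt;

```shell
#!/bin/sh
# Sketch of the PreToolUse exit-code contract: 0 allows the tool call,
# 2 blocks it, and the reason must go to stderr or the model never sees it.
pretooluse_guard() {
  command_text="$1"    # simplified: real hooks parse this out of stdin JSON
  case "$command_text" in
    *"rm -rf"*)
      # writing to /dev/stderr keeps the explanation visible to the model
      echo "blocked: destructive delete; scope the rm instead" >/dev/stderr
      return 2
      ;;
  esac
  return 0             # any other command: allow
}
```

&lt;p&gt;Exit code 2 from a script like this blocks the call; any other nonzero code is treated as non-blocking.&lt;/p&gt;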

&lt;p&gt;The hook surface is wide. You can block, modify, observe, or augment almost any event in the agent's lifecycle. The shape of what's possible is "almost anything." The shape of what's actually configured on most machines is "nothing."&lt;/p&gt;

&lt;h2&gt;
  
  
  10 hooks people are actually running
&lt;/h2&gt;

&lt;p&gt;These are the hooks I'd put in front of my team. None are "block writes to .env" or "format on save." Each one is published somewhere (a repo or a Reddit thread) and does something non-obvious. Ranked roughly from most surprising to most foundational.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. LSP-over-Grep enforcement.&lt;/strong&gt; &lt;em&gt;PreToolUse on Grep, Glob, Bash, Read.&lt;/em&gt; &lt;a href="https://github.com/nesaminua/claude-code-lsp-enforcement-kit" rel="noopener noreferrer"&gt;nesaminua/claude-code-lsp-enforcement-kit&lt;/a&gt; (MIT). Blocks Grep calls containing code symbols and forces the agent to use LSP &lt;code&gt;find_definition&lt;/code&gt; and &lt;code&gt;find_references&lt;/code&gt; instead. Documented per-call savings: definition lookup drops from ~6,500 to ~580 tokens. Real workweek aggregate: 320k to 85k navigation tokens, a 73% reduction. Works with cclsp/Serena, supports TypeScript and 13+ other languages. It punishes a default behavior. The agent can still grep, but the cost is paid in friction, not silently in tokens.&lt;/p&gt;
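
&lt;p&gt;The gating logic is roughly this shape (my sketch of the pattern, not the kit's code; the symbol heuristic is deliberately crude):&lt;/p&gt;

```shell
#!/bin/sh
# Sketch of LSP-over-Grep enforcement: refuse a Grep whose pattern looks
# like a bare code symbol and point the agent at the LSP tools instead.
grep_guard() {
  pattern="$1"
  # crude heuristic: a single identifier (CamelCase or snake_case), no spaces
  if printf '%s' "$pattern" | grep -Eq '^[A-Za-z_][A-Za-z0-9_]*$'; then
    echo "symbol search: use LSP find_definition / find_references" >/dev/stderr
    return 2
  fi
  return 0   # free-text patterns still go through Grep
}
```

&lt;p&gt;The agent pays the block in friction once, then reaches for LSP on the retry.&lt;/p&gt;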

&lt;p&gt;&lt;strong&gt;2. Knowledge-graph compile of the project.&lt;/strong&gt; &lt;em&gt;Skill + hook combo.&lt;/em&gt; &lt;a href="https://github.com/safishamsi/graphify" rel="noopener noreferrer"&gt;safishamsi/graphify&lt;/a&gt;. Karpathy-style: instead of re-reading raw files every session, compile the project into a structured wiki once, then query the wiki via skill. Hook installs the skill and registers &lt;code&gt;/graphify&lt;/code&gt; as the entry point. Claimed 71.5x token reduction per query on a mixed corpus. Treats context as a build artifact, not a runtime cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The 4-hook workflow enforcement stack.&lt;/strong&gt; &lt;em&gt;SessionStart + PreToolUse on Edit + Stop + PostToolUse on git commit.&lt;/em&gt; From &lt;code&gt;tacit7&lt;/code&gt; in the &lt;a href="https://reddit.com/r/AI_Agents/comments/1smmjvl/" rel="noopener noreferrer"&gt;hooks vs skills thread&lt;/a&gt;. SessionStart tells the agent to read the workflow skill. PreToolUse on Edit refuses if no task is registered. Stop refuses if the task isn't annotated. PostToolUse on &lt;code&gt;git commit&lt;/code&gt; logs the commit to an external app. Four hooks turn a probabilistic agent into a procedurally-compliant teammate, end to end.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. "Fail twice, stop and ask."&lt;/strong&gt; &lt;em&gt;PostToolUseFailure + Notification.&lt;/em&gt; Another one from the same thread. If the same tool fails twice with similar errors, the hook halts the session and pings the human. The most common Claude Code failure mode is the agent looping on a bad call. This rule catches it in maybe 20 lines of code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Per-file typecheck after every edit.&lt;/strong&gt; &lt;em&gt;PostToolUse on Write/Edit.&lt;/em&gt; From &lt;code&gt;DevMoses&lt;/code&gt; in the &lt;a href="https://reddit.com/r/ClaudeAI/comments/1s1ipep/" rel="noopener noreferrer"&gt;"5 levels of Claude Code"&lt;/a&gt; post. Runs &lt;code&gt;tsc --noEmit&lt;/code&gt; (or equivalent) on the single file Claude just edited, instead of flooding the agent with 200+ project-wide errors. Inverts the default. The agent gets a tight feedback loop on its own work without drowning in unrelated noise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Dynamic permission control via hook-managed policy.&lt;/strong&gt; &lt;em&gt;PreToolUse + PermissionRequest.&lt;/em&gt; Also from the same thread. Hooks flip permissions on and off at runtime based on context (project, session source, current task). Claude Code's permission model is mostly static. This hook makes it conditional, which matters for orgs with role-based access.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7. Session-conclusion guard.&lt;/strong&gt; &lt;em&gt;Stop + SessionEnd.&lt;/em&gt; &lt;a href="https://github.com/connerohnesorge/conclaude" rel="noopener noreferrer"&gt;connerohnesorge/conclaude&lt;/a&gt;. Refuses to end a session if there's uncommitted state, in-progress work, or unmerged checkpoints. Stops the "I closed the terminal and lost work" failure mode at the harness level.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;8. Filesystem offload of large tool outputs.&lt;/strong&gt; &lt;em&gt;PostToolUse on Read/Bash/WebFetch.&lt;/em&gt; &lt;a href="https://github.com/sheeki03/Few-Word" rel="noopener noreferrer"&gt;sheeki03/Few-Word&lt;/a&gt;. When a tool returns more than N tokens, write the result to disk and return a short summary plus a path. Treats the filesystem as an extension of context. The agent can re-read the slice it actually needs instead of choking on a 50KB file.&lt;/p&gt;
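
&lt;p&gt;The core move is small enough to sketch (byte counts stand in for tokens, and the threshold is made up):&lt;/p&gt;

```shell
#!/bin/sh
# Sketch of the output-offload pattern: results over a size budget get
# parked on disk, and the agent receives a short pointer instead.
offload() {
  result="$1"
  limit="${2:-2000}"   # budget in bytes; real versions estimate tokens
  if [ "${#result}" -gt "$limit" ]; then
    path=$(mktemp "${TMPDIR:-/tmp}/tool-output.XXXXXX")
    printf '%s' "$result" > "$path"
    echo "output was ${#result} bytes; full text saved to $path"
  else
    printf '%s\n' "$result"
  fi
}
```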

&lt;p&gt;&lt;strong&gt;9. Cache-fix patches injected via hook.&lt;/strong&gt; &lt;em&gt;SessionStart.&lt;/em&gt; &lt;a href="https://github.com/Rangizingo/cc-cache-fix/tree/main" rel="noopener noreferrer"&gt;Rangizingo/cc-cache-fix&lt;/a&gt;. Patches a documented Claude Code bug where the &lt;code&gt;db8&lt;/code&gt; filter strips &lt;code&gt;deferred_tools_delta&lt;/code&gt; records, breaking the prompt cache on resumed sessions. The author's analysis claims it wasted ~250,000 API calls per day globally before being noticed. The hook applies the patch at session start. Hooks as a deployment mechanism for community fixes. No need to wait for Anthropic to ship a release. (I'd qualify the 250k figure as analysis-based, not independently confirmed by Anthropic.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;10. Lessons-learned hooks ("encode the mistake").&lt;/strong&gt; &lt;em&gt;Pattern, not a single repo.&lt;/em&gt; From &lt;code&gt;Aggressive-Sweet828&lt;/code&gt; in the hooks vs skills thread. Every time the agent makes a mistake you don't want repeated, turn it into a hook. Over time, your hooks become your team's quality bar, written in code instead of whispered in code review. Reframes hooks as institutional memory, not just guardrails.&lt;/p&gt;

&lt;p&gt;Of the ten, six are now sitting in my own &lt;code&gt;settings.json&lt;/code&gt;: the LSP enforcement kit (#1), the graphify knowledge-graph compile (#2), the fail-twice loop guard (#4), the filesystem offload for large outputs (#8), the cache-fix patch (#9), and the lessons-learned pattern (#10). The LSP kit and the fail-twice guard are the two I have the most to say about so far. The other four are too new for me to have a real story yet, and I'll update this section as they earn one.&lt;/p&gt;

&lt;p&gt;LSP enforcement kit (#1): install was a &lt;code&gt;git clone&lt;/code&gt; and a single &lt;code&gt;bash install.sh&lt;/code&gt;. The installer is idempotent and merges into &lt;code&gt;~/.claude/settings.json&lt;/code&gt; without touching what's already there. First session into the portfolio repo, the hook fired on the second tool call and refused a Grep for &lt;code&gt;BlogPostLayout&lt;/code&gt;. The agent reached for &lt;code&gt;find_definition&lt;/code&gt; instead and landed on the right file. The thing I didn't expect was how often I write prompts that assume Grep ("find where we use X"). The agent now has to translate those into LSP, which takes a beat. I'm keeping it on this repo and waiting to see what the weekly token total actually does.&lt;/p&gt;

&lt;p&gt;Fail-twice loop guard (#4): hand-rolled in about 20 lines of bash because the pattern is small enough I didn't want a dependency. It hasn't fired yet, probably because I haven't kicked off anything ambitious enough since installing it. The version I wrote compares the last two &lt;code&gt;PostToolUseFailure&lt;/code&gt; events for the same tool name and a similar error substring, and pings me via the same Stop-hook ding I already had. If it ever fires, I'll update this section with what it caught.&lt;/p&gt;
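
&lt;p&gt;For the curious, the core of my version looks roughly like this (the state-file path and the prefix-match "similarity" are my own choices):&lt;/p&gt;

```shell
#!/bin/sh
# Sketch of the fail-twice guard: remember the last failure's tool name
# plus an error prefix; a repeat of the same pair halts and pings the human.
STATE="${TMPDIR:-/tmp}/last-tool-failure"
fail_guard() {
  tool="$1"; err="$2"
  key="$tool:$(printf '%s' "$err" | cut -c1-40)"   # coarse similarity check
  if [ -f "$STATE" ]; then
    if [ "$(cat "$STATE")" = "$key" ]; then
      echo "same failure twice for $tool; stopping to ask the human" >/dev/stderr
      rm -f "$STATE"
      return 2
    fi
  fi
  printf '%s' "$key" > "$STATE"
  return 0
}
```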

&lt;p&gt;The LSP entry is the most honest data point here. It's widely cited, and the kit ships its own reproducible benchmarks covering the operations the agent does dozens of times a day.&lt;/p&gt;

&lt;p&gt;The headline 80% savings number on the &lt;a href="https://reddit.com/r/AI_Agents/comments/1slligv/" rel="noopener noreferrer"&gt;original Reddit post&lt;/a&gt; is anecdotal. The 91% per-call and 73% workweek-aggregate numbers come from the kit's own benchmarks, which are reproducible. I'd treat the kit numbers as the reliable ones and the Reddit headline as directionally right.&lt;/p&gt;

&lt;h2&gt;
  
  
  Skills vs hooks: a decision table
&lt;/h2&gt;

&lt;p&gt;Skills describe &lt;em&gt;what to try&lt;/em&gt;. Hooks define &lt;em&gt;what must happen&lt;/em&gt;. Pair them: a skill describes the workflow, a hook enforces the precondition.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Skills&lt;/th&gt;
&lt;th&gt;Hooks&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Mental model&lt;/td&gt;
&lt;td&gt;Add capability (request)&lt;/td&gt;
&lt;td&gt;Define policy (enforcement)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Distribution&lt;/td&gt;
&lt;td&gt;Markdown file, frontmatter&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;settings.json&lt;/code&gt; entry pointing to script&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Determinism&lt;/td&gt;
&lt;td&gt;Probabilistic (model decides if/when)&lt;/td&gt;
&lt;td&gt;Deterministic (fires every event match)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Token cost&lt;/td&gt;
&lt;td&gt;Loaded on demand, ~free when inactive&lt;/td&gt;
&lt;td&gt;Often &lt;em&gt;saves&lt;/em&gt; tokens (LSP swap, output offload)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;Model writes about using it in transcripts&lt;/td&gt;
&lt;td&gt;Side effects + exit code; model often can't see why a hook fired&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best for&lt;/td&gt;
&lt;td&gt;Reusable workflows, domain expertise&lt;/td&gt;
&lt;td&gt;Guardrails, audit, cost control, integration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Failure mode&lt;/td&gt;
&lt;td&gt;Model forgets to use it&lt;/td&gt;
&lt;td&gt;Hook breaks the session if poorly written&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sharing friction&lt;/td&gt;
&lt;td&gt;Low (a &lt;code&gt;.md&lt;/code&gt; file)&lt;/td&gt;
&lt;td&gt;Higher (script + permission + JSON)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A concrete pair: a "deploy" skill describes the deployment runbook (build, test, push, monitor). A &lt;code&gt;PreToolUse&lt;/code&gt; hook on the deploy command verifies the test suite passed in the last 5 minutes and you're on a release branch, and refuses otherwise. The skill teaches. The hook insists.&lt;/p&gt;
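
&lt;p&gt;The hook half of that pair might look like this (the marker file and the &lt;code&gt;release/&lt;/code&gt; branch convention are assumptions; swap in however your CI records a green run):&lt;/p&gt;

```shell
#!/bin/sh
# Sketch of the deploy precondition: refuse unless the test suite left a
# fresh marker (under 5 minutes old) and we are on a release branch.
MARKER="${TMPDIR:-/tmp}/tests-passed"
deploy_gate() {
  branch="$1"
  case "$branch" in
    release/*) ;;   # assumed branch convention
    *) echo "not a release branch: $branch" >/dev/stderr; return 2 ;;
  esac
  if [ -z "$(find "$MARKER" -mmin -5 2>/dev/null)" ]; then
    echo "no test pass in the last 5 minutes; run the suite first" >/dev/stderr
    return 2
  fi
  return 0
}
```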

&lt;p&gt;The insight that did the most for my own thinking: &lt;strong&gt;guardrails belong in hooks because blocks need to be deterministic, not described.&lt;/strong&gt; A skill that says "don't push to main without tests" is a polite request the model can ignore. A &lt;code&gt;PreToolUse&lt;/code&gt; hook that returns exit code 2 with &lt;code&gt;"decision": "block"&lt;/code&gt; cannot be ignored.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://liranbaba.dev/blog/forgecode-vs-claude-code/" rel="noopener noreferrer"&gt;My comparison of ForgeCode and Claude Code&lt;/a&gt; called out hooks as one of Claude Code's real differentiators. Re-reading it now, I underweighted them. ForgeCode being faster doesn't matter if your team needs deterministic policy.&lt;/p&gt;

&lt;h2&gt;
  
  
  The enterprise patterns nobody is writing about
&lt;/h2&gt;

&lt;p&gt;Anthropic shipped the primitives this past quarter: &lt;code&gt;allowManagedHooksOnly&lt;/code&gt;, &lt;code&gt;allowedHttpHookUrls&lt;/code&gt;, &lt;code&gt;httpHookAllowedEnvVars&lt;/code&gt;, plus a drop-in &lt;code&gt;managed-settings.d/&lt;/code&gt; directory for stacking policy from multiple teams. What's missing is the layer above: published end-to-end SIEM, SOC2, and audit playbooks built on those primitives. I can't find a single public repo shipping an org-wide audit template I'd actually deploy.&lt;/p&gt;

&lt;p&gt;The vendors closest to filling that gap are the ones with skin in the AI-supply-chain game. (Disclosure: JFrog is my employer; not paid to link this.) &lt;a href="https://jfrog.com/blog/supply-chain-attackers-are-coming-for-your-agents/" rel="noopener noreferrer"&gt;Supply Chain Attackers Are Coming for Your Agents&lt;/a&gt; walks the Shai-Hulud npm worm, the postmark-mcp exfiltration, and the LiteLLM compromise as cases where a &lt;code&gt;PreToolUse&lt;/code&gt; hook on the install boundary would have caught the payload. &lt;a href="https://jfrog.com/blog/jfrog-ai-catalog-evolves-to-detect-shadow-ai-govern-mcps/" rel="noopener noreferrer"&gt;JFrog AI Catalog Evolves to Detect Shadow AI and Govern MCPs&lt;/a&gt; covers the upstream gateway angle that pairs with client-side hooks. Neither is a hook playbook, but they're closer than anything else I've found.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security and policy enforcement.&lt;/strong&gt; A &lt;code&gt;PreToolUse&lt;/code&gt; hook on Write blocks anything outside approved paths (scope to &lt;code&gt;Write|Edit|MultiEdit&lt;/code&gt;). A &lt;code&gt;UserPromptSubmit&lt;/code&gt; hook scrubs known credentials (AWS access key prefix, GitHub PAT, JFrog tokens) before the prompt leaves the machine, returning JSON &lt;code&gt;decision: "block"&lt;/code&gt; on a match. The session file where I found my database password (the one that started &lt;a href="https://dev.to/"&gt;Claudoscope&lt;/a&gt;) was created because Claude Code read a &lt;code&gt;.env&lt;/code&gt; and echoed the contents back. A &lt;code&gt;PreToolUse&lt;/code&gt; hook on Read with a &lt;code&gt;.env&lt;/code&gt; matcher would have refused that read. Claudoscope catches the credential after it lands in the JSONL; the hook prevents it from landing. The full story is in &lt;a href="https://liranbaba.dev/blog/found-database-password-in-claude-code-session/" rel="noopener noreferrer"&gt;how I found a database password in a session file&lt;/a&gt;.&lt;/p&gt;
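
&lt;p&gt;The scrubber is a few lines (the patterns are illustrative, not exhaustive; real deployments should cover every credential shape your org issues):&lt;/p&gt;

```shell
#!/bin/sh
# Sketch of a UserPromptSubmit scrubber: if the prompt carries a known
# credential shape, emit the JSON block decision before it leaves the machine.
prompt_scrub() {
  prompt="$1"
  # AWS access keys start with AKIA; classic GitHub PATs with ghp_
  if printf '%s' "$prompt" | grep -Eq 'AKIA[0-9A-Z]{16}|ghp_[A-Za-z0-9]{36}'; then
    echo '{"decision": "block", "reason": "prompt contains a credential"}'
    return 0   # the JSON decision on stdout does the blocking
  fi
  return 0
}
```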

&lt;p&gt;&lt;strong&gt;Compliance and audit.&lt;/strong&gt; A &lt;code&gt;PostToolUse&lt;/code&gt; hook with handler type &lt;code&gt;http&lt;/code&gt; sends a structured event to a SIEM (Splunk, Datadog, Elastic) on every tool call: session ID, user, tool name, sanitized input, timestamp, project. &lt;code&gt;SessionStart&lt;/code&gt; and &lt;code&gt;SessionEnd&lt;/code&gt; book-end the audit log. Combine with managed policy and &lt;code&gt;allowManagedHooksOnly: true&lt;/code&gt; so individual devs can't disable the audit hook locally. This is the org-wide control surface the docs describe but no one's shipping templates for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost control.&lt;/strong&gt; Per-developer token budget: a &lt;code&gt;PreToolUse&lt;/code&gt; hook logs token estimates per tool call, totals them per day in a small SQLite file, and denies expensive calls once the budget is hit. Same pattern for model routing: rewrite the model selection or refuse a dispatch if the request is trivial enough that Opus is overkill.&lt;/p&gt;
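&lt;p&gt;The ledger half of that budget hook fits in a few lines. This is a sketch, not a drop-in: the budget number, the table schema, and how you estimate tokens per call are all assumptions you'd tune for your org.&lt;/p&gt;

```python
import datetime
import sqlite3

DAILY_BUDGET = 500_000  # illustrative per-developer daily token budget

def record_and_check(db_path, estimated_tokens, budget=DAILY_BUDGET):
    """Log a token estimate for today; return True while today's total is in budget."""
    today = datetime.date.today().isoformat()
    con = sqlite3.connect(db_path)
    try:
        con.execute("CREATE TABLE IF NOT EXISTS usage (day TEXT, tokens INTEGER)")
        con.execute("INSERT INTO usage VALUES (?, ?)", (today, estimated_tokens))
        con.commit()
        (total,) = con.execute(
            "SELECT COALESCE(SUM(tokens), 0) FROM usage WHERE day = ?", (today,)
        ).fetchone()
        # True while the day's running total stays within budget
        return max(total - budget, 0) == 0
    finally:
        con.close()
```

&lt;p&gt;The &lt;code&gt;PreToolUse&lt;/code&gt; hook calls this with its per-call estimate and denies the call (exit code 2) once it returns &lt;code&gt;False&lt;/code&gt;.&lt;/p&gt;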

&lt;p&gt;&lt;strong&gt;DevSecOps and supply chain.&lt;/strong&gt; This is where my JFrog day job comes in. &lt;code&gt;PreToolUse&lt;/code&gt; on Bash intercepts &lt;code&gt;npm install&lt;/code&gt;, &lt;code&gt;pip install&lt;/code&gt;, etc., and runs the package through a vulnerability scanner before allowing the install. &lt;code&gt;PreToolUse&lt;/code&gt; on Write runs the staged change through JFrog Advanced Security and refuses the write if SAST finds an issue.&lt;/p&gt;
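&lt;p&gt;The interception half is mostly pattern matching on the Bash tool's command string. A sketch with illustrative patterns; the scanner call itself (Xray, or whatever you run) goes wherever a package spec comes back:&lt;/p&gt;

```python
import re

# Illustrative patterns for package-install commands an agent might run.
INSTALL_PATTERNS = [
    re.compile(r"\bnpm\s+(?:install|i)\s+([\w@./-]+)"),
    re.compile(r"\bpip3?\s+install\s+([\w.\[\]=-]+)"),
]

def extract_install(command):
    """Return the package spec if the shell command looks like an install."""
    for pattern in INSTALL_PATTERNS:
        match = pattern.search(command)
        if match:
            return match.group(1)  # hand this to your scanner before allowing
    return None
```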

&lt;p&gt;The conversation that keeps coming up on our side: whether agent-initiated package installs count as a developer action or a CI action under existing supply-chain policy. We don't have a clean answer yet. A &lt;code&gt;PreToolUse&lt;/code&gt; hook that routes installs through Xray would collapse the distinction; the same policy applies wherever the install happens.&lt;/p&gt;

&lt;p&gt;Swap in your scanner of choice. Supply-chain controls don't have to live in CI anymore. They can live at the agent's tool-call boundary, closer to where the risk is introduced.&lt;/p&gt;

&lt;p&gt;If you're a platform team standing up Claude Code at scale, you're filling that gap yourself, which is fine but probably not what you signed up for.&lt;/p&gt;

&lt;h2&gt;
  
  
  The footguns
&lt;/h2&gt;

&lt;p&gt;Hooks are powerful because they run as the user. That's also the danger. Four real failure modes I've watched people hit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Infinite loop.&lt;/strong&gt; A hook on &lt;code&gt;PostToolUse&lt;/code&gt; triggers another tool call that triggers the same hook. Fix: add a sentinel (env var or marker file) and short-circuit on repeat invocation. This bites the first time you write a &lt;code&gt;PostToolUse&lt;/code&gt; that does anything substantive.&lt;/p&gt;
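&lt;p&gt;The sentinel fix is small. A sketch using an environment variable (the name is illustrative): the hook's side effect runs with the sentinel set, so if that side effect re-triggers the same hook, the nested invocation sees it and bails instead of looping.&lt;/p&gt;

```python
import os
import subprocess

SENTINEL = "MY_POSTTOOL_HOOK_ACTIVE"  # illustrative env var name

def run_guarded(action_cmd):
    """Run the hook's side effect once; short-circuit on re-entrant invocation."""
    if os.environ.get(SENTINEL) == "1":
        return 0  # nested invocation: sentinel is set, do nothing
    env = dict(os.environ)
    env[SENTINEL] = "1"  # anything this command spawns inherits the sentinel
    return subprocess.run(action_cmd, env=env).returncode
```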

&lt;p&gt;&lt;strong&gt;The hook breaks the session and Claude has no idea why.&lt;/strong&gt; From the same thread: "if a hook fails the model has no idea why and can't self-correct." Mitigation: return useful text on stderr with exit code 2 so the model gets context, even when blocking. The default behavior of a silent block is the worst possible UX.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Permission deadlock.&lt;/strong&gt; Also from the same thread: "I've already seen it deadlock a session when the hook permissions were set too tight." Always test with &lt;code&gt;--debug&lt;/code&gt; first. Always.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shell startup pollution.&lt;/strong&gt; Anthropic's docs explicitly warn: shell profiles printing text on startup interfere with JSON parsing. A single &lt;code&gt;echo&lt;/code&gt; line in your &lt;code&gt;.bashrc&lt;/code&gt; will silently break every JSON-output hook on your machine. This one is hilarious until it happens to you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is a Claude Code hook?
&lt;/h3&gt;

&lt;p&gt;A user-defined command, HTTP endpoint, prompt, or agent invocation that fires at a specific lifecycle point: a tool call, session start, prompt submission, or one of 20+ other events. Defined in &lt;code&gt;settings.json&lt;/code&gt; or bundled with a plugin or skill. Returns control via exit codes and structured JSON. Hooks can block tool calls, modify their input, inject context, or just observe. (&lt;a href="https://code.claude.com/docs/en/hooks" rel="noopener noreferrer"&gt;Anthropic docs&lt;/a&gt;, 2026)&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the difference between Claude Code hooks and skills?
&lt;/h3&gt;

&lt;p&gt;Skills are markdown workflows the model decides whether to use. Hooks are deterministic execution-boundary callbacks: they fire every time the matched event happens, regardless of what the model decides. Use skills to teach a workflow; use hooks when the precondition is non-negotiable. They pair well together, since the skill describes the procedure and the hook makes sure the agent actually followed it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can Claude Code hooks reduce token usage?
&lt;/h3&gt;

&lt;p&gt;Yes, significantly. The most-cited example: a &lt;code&gt;PreToolUse&lt;/code&gt; hook that swaps Grep for LSP-based code navigation reduces per-call tokens from ~6,500 to ~580 (a 91% drop) for definition lookups, with a documented 73% real-world weekly aggregate reduction (&lt;a href="https://github.com/nesaminua/claude-code-lsp-enforcement-kit" rel="noopener noreferrer"&gt;nesaminua/claude-code-lsp-enforcement-kit&lt;/a&gt;, MIT). Other patterns (sandboxed tool output, knowledge-graph compilation) report 71.5x to 98% reductions in their respective scopes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where I landed
&lt;/h2&gt;

&lt;p&gt;I came into this thinking I'd find a few clever hooks worth sharing. I came out wanting to spend a weekend writing the org-wide policy template that doesn't exist.&lt;/p&gt;

&lt;p&gt;If you've been shipping skills and ignoring hooks, you've taken the easier half. Skills are the part Claude can ignore. Hooks are the part it can't. For a solo dev that distinction is small. For a team it's most of the value.&lt;/p&gt;

&lt;p&gt;Practical first move, if you're still reading: write the &lt;code&gt;UserPromptSubmit&lt;/code&gt; hook that scrubs your most likely credentials before they leave the machine. Maybe 30 lines of Python or bash. It will catch a real incident inside a month. I'd bet on it.&lt;/p&gt;
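&lt;p&gt;The detection core of that scrubber really is small. A sketch: the two patterns below are the publicly documented shapes for AWS access keys and GitHub personal access tokens; add your org's own token formats, and wire the stdin/stdout plumbing per the hooks docs.&lt;/p&gt;

```python
import re

# Publicly documented credential shapes; extend with your org's token formats.
PATTERNS = [
    ("AWS access key", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("GitHub PAT", re.compile(r"\bghp_[A-Za-z0-9]{36}\b")),
]

def check_prompt(prompt):
    """Return a block decision dict if the prompt contains a credential."""
    for label, pattern in PATTERNS:
        if pattern.search(prompt):
            return {
                "decision": "block",
                "reason": f"Prompt appears to contain a {label}; remove it and retry.",
            }
    return None  # clean prompt, let it through
```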

&lt;p&gt;After that, audit your own &lt;code&gt;.claude/settings.json&lt;/code&gt;. If the count is zero or one, you've got a lot of room. (And if you also want to see what your sessions are actually doing while you sort out which hooks to write, that's what I built &lt;a href="https://dev.to/"&gt;Claudoscope&lt;/a&gt; for.)&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://liranbaba.dev/blog/claude-code-hooks-the-half-nobody-uses/" rel="noopener noreferrer"&gt;liranbaba.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devtools</category>
      <category>ai</category>
      <category>claudecode</category>
    </item>
    <item>
      <title>ForgeCode vs Claude Code: which AI coding agent actually wins?</title>
      <dc:creator>Liran Baba</dc:creator>
      <pubDate>Thu, 09 Apr 2026 11:24:56 +0000</pubDate>
      <link>https://forem.com/liran_baba/forgecode-vs-claude-code-which-ai-coding-agent-actually-wins-36c</link>
      <guid>https://forem.com/liran_baba/forgecode-vs-claude-code-which-ai-coding-agent-actually-wins-36c</guid>
      <description>&lt;p&gt;I've been using Claude Code for months. I like it. I genuinely don't get the Twitter hate. But there's one thing that's been driving me crazy: speed. I'll ask it to rename a variable across three files and it sits there thinking for 40 seconds. A simple test fix on a small repo, and I'm watching a spinner for two minutes. It's not a deal-breaker, but it's the kind of friction that builds up over a day.&lt;/p&gt;

&lt;p&gt;We recently rolled out Claude Code across our entire engineering org. We're not ditching Cursor, just giving devs the option to pick whatever tool works for them. And the feedback I kept hearing from people, unprompted: it's slow. Not everyone, not every task. But enough devs brought it up that it clearly wasn't just me being impatient.&lt;/p&gt;

&lt;p&gt;So I started looking at alternatives. OpenAI has &lt;a href="https://github.com/openai/codex" rel="noopener noreferrer"&gt;Codex CLI&lt;/a&gt; but I haven't tried the harness yet, just the models. The &lt;a href="https://www.tbench.ai/leaderboard/terminal-bench/2.0" rel="noopener noreferrer"&gt;TermBench 2.0 leaderboard&lt;/a&gt; is what caught my eye. ForgeCode at #1 with 81.8%. Claude Code at 58%, ranked #39. I installed ForgeCode that same day.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ForgeCode with Opus 4.6 was noticeably faster than Claude Code on the same tasks. Not marginal, real.&lt;/li&gt;
&lt;li&gt;ForgeCode topped &lt;a href="https://www.tbench.ai/" rel="noopener noreferrer"&gt;TermBench 2.0&lt;/a&gt; at 81.8%, but that's its own benchmark. On the independent &lt;a href="https://www.swebench.com" rel="noopener noreferrer"&gt;SWE-bench&lt;/a&gt;, the gap shrinks to 2.4 points.&lt;/li&gt;
&lt;li&gt;GPT 5.4 through ForgeCode was unstable for me. A research task on a small repo took 15 minutes.&lt;/li&gt;
&lt;li&gt;I'm double-dipping now. Claude Code is still primary, but the latency gains on ForgeCode are too real to ignore.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What is ForgeCode (and why the benchmark confusion exists)?
&lt;/h2&gt;

&lt;p&gt;ForgeCode is not an AI model. It's a model-agnostic agent harness, open source under Apache 2.0, written in Rust, that wraps any LLM through OpenRouter or direct API keys. It launched in late January 2025 and hit &lt;a href="https://github.com/antinomyhq/forgecode" rel="noopener noreferrer"&gt;v2.8.0 on GitHub&lt;/a&gt; by April 2026 with over 6,000 stars.&lt;/p&gt;

&lt;p&gt;ForgeCode ships three built-in agents. &lt;code&gt;forge&lt;/code&gt; writes and edits code. &lt;code&gt;sage&lt;/code&gt; does read-only research and can't modify files. &lt;code&gt;muse&lt;/code&gt; generates plans and writes them to a &lt;code&gt;plans/&lt;/code&gt; directory. It's Zsh-native, using a &lt;code&gt;:&lt;/code&gt; prefix so you never leave your shell.&lt;/p&gt;

&lt;p&gt;Here's the thing that matters for evaluating the benchmark: TermBench 2.0 is ForgeCode's own benchmark, hosted at tbench.ai. The organization submitting entries is ForgeCode itself. That doesn't make the results wrong. But it's not a neutral third party.&lt;/p&gt;

&lt;h2&gt;
  
  
  Does the benchmark actually hold up?
&lt;/h2&gt;

&lt;p&gt;On &lt;a href="https://www.swebench.com" rel="noopener noreferrer"&gt;SWE-bench Verified&lt;/a&gt;, an independent benchmark from Princeton and UChicago, ForgeCode + Claude 4 scored 72.7% compared to Claude 3.7 Sonnet's 70.3%. A 2.4-point gap, not the 24-point gap TermBench implies. That context changes the whole picture.&lt;/p&gt;

&lt;p&gt;The TermBench 2.0 numbers, self-reported by ForgeCode on tbench.ai:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ForgeCode + GPT 5.4: 81.8%&lt;/li&gt;
&lt;li&gt;ForgeCode + Claude Opus 4.6: 81.8%&lt;/li&gt;
&lt;li&gt;Claude Code + Claude Opus 4.6: 58.0% (rank #39)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The SWE-bench Verified numbers, independent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ForgeCode + Claude 4: 72.7%&lt;/li&gt;
&lt;li&gt;Claude 3.7 Sonnet (extended thinking): 70.3%&lt;/li&gt;
&lt;li&gt;Claude 4.5 Opus: 76.8%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So how did ForgeCode reach 81.8%? Their blog documents four specific harness changes. They reordered JSON schema fields, putting &lt;code&gt;required&lt;/code&gt; before &lt;code&gt;properties&lt;/code&gt; to reduce GPT 5.4 tool-call errors. They flattened nested schemas. They added explicit truncation reminders when files are partially read. And they added a mandatory verification pass where a reviewer skill checks task completion before the agent can stop.&lt;/p&gt;

&lt;p&gt;These are real engineering improvements. They're also benchmark-specific optimizations. The r/ClaudeCode community called it "benchmaxxed," which is both funny and kind of fair.&lt;/p&gt;

&lt;p&gt;I've been eyeing this leaderboard for a while. The numbers are what pushed me to actually try ForgeCode. With Opus 4.6, it was noticeably faster than Claude Code. That part wasn't hype.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.swebench.com" rel="noopener noreferrer"&gt;SWE-bench scores&lt;/a&gt; went from 1.96% in late 2023 to 76.8% by early 2026. Everything's getting better fast. The question is whether a 2-point edge on an independent benchmark justifies switching your entire workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it's actually like to use ForgeCode
&lt;/h2&gt;

&lt;p&gt;Install is a one-liner: &lt;code&gt;curl -fsSL https://forgecode.dev/cli | sh&lt;/code&gt;. Then &lt;code&gt;forge provider login&lt;/code&gt; to set up your API keys and you're in. About the same friction as Claude Code. The Zsh plugin is a nice touch, you type &lt;code&gt;:&lt;/code&gt; followed by your prompt and it runs inline without switching contexts.&lt;/p&gt;

&lt;p&gt;First thing I tried: pointed it at my portfolio repo (Astro 6, maybe 30 files) with Opus 4.6 as the model. I asked it to add a post counter to the blog index page and wire it into the nav component. Claude Code takes about 90 seconds on that kind of task on this repo. ForgeCode did it in under 30. Correct output, clean diff, no hallucinated imports. The speed difference was immediately obvious.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0smwwo4a6qc8ihow0i7q.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0smwwo4a6qc8ihow0i7q.webp" alt=" " width="800" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I ran the same kind of test a few more times. A multi-file rename, adding an external link tooltip component, restructuring a layout. ForgeCode with Opus 4.6 was consistently faster. Not by a little. I could feel it in my workflow.&lt;/p&gt;

&lt;p&gt;Plan mode was the other thing that stood out. ForgeCode's &lt;code&gt;muse&lt;/code&gt; agent writes plans to a &lt;code&gt;plans/&lt;/code&gt; directory, and the output felt more detailed and verbose than Claude Code's plan mode. Whether that's good or bad depends on what you want. I kind of liked having the longer breakdown.&lt;/p&gt;

&lt;p&gt;Then I tried GPT 5.4 through ForgeCode, and it fell apart. I asked it to research the architecture of a small repo. Fifteen minutes. Kept going unstable, tool calls failing, the agent retrying and spinning. I killed it. So "ForgeCode is fast" needs a qualifier: ForgeCode with Opus 4.6 is fast. ForgeCode with GPT 5.4 was borderline unusable for me.&lt;/p&gt;

&lt;p&gt;But I'll give them this: the ForgeCode team explicitly says they've hired zero paid influencers. The low social media presence is intentional. Kind of respect that. In an industry where half the "honest reviews" have affiliate links in the description, that's almost suspiciously refreshing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why ForgeCode is actually faster
&lt;/h2&gt;

&lt;p&gt;Part of it is just the Rust binary (Claude Code is TypeScript, so startup and memory are heavier). But that's not the whole story.&lt;/p&gt;

&lt;p&gt;ForgeCode has a context engine that indexes function signatures and module boundaries instead of dumping raw files into the context window. The agent pulls only what it needs. Some estimates say this cuts context size by about 90%, which means faster responses and cheaper models that don't lose the plot halfway through a task. That's the real reason the same model (Opus 4.6) responds faster through ForgeCode than through Claude Code.&lt;/p&gt;

&lt;p&gt;There's also a &lt;code&gt;--sandbox&lt;/code&gt; flag that creates an isolated git worktree and branch, so you can try something risky without touching your main tree and only merge back what works.&lt;/p&gt;

&lt;p&gt;What Claude Code has built &lt;em&gt;around&lt;/em&gt; the core loop (parallel agent execution, hooks, scheduled cloud tasks, auto-memory) doesn't exist in ForgeCode yet. The harness is fast. Everything around it is thin. ForgeCode is a Lambo with no cup holder. Fast as hell, but you're holding your coffee between your knees.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I missed when I wasn't using Claude Code
&lt;/h2&gt;

&lt;p&gt;I didn't appreciate this until I spent a few days away from Claude Code: the stuff around the agent matters more than the agent itself.&lt;/p&gt;

&lt;p&gt;With Claude Code, I have a CLAUDE.md in every project. My team shares the same project instructions. I have hooks that fire on file changes, so I can run secret scanning, linting, whatever I want on every edit. Auto-memory means I don't re-explain my codebase every session. And checkpoints mean every file edit gets snapshotted, so if the agent breaks something three steps back, I hit &lt;code&gt;/rewind&lt;/code&gt; and roll back without touching git.&lt;/p&gt;

&lt;p&gt;ForgeCode has AGENTS.md (similar idea to CLAUDE.md) and MCP support, so the basics are covered. But no hooks, no checkpoints, no auto-memory, no IDE extensions, no JetBrains plugin. The model-agnostic part is great. The ecosystem is still thin.&lt;/p&gt;

&lt;p&gt;For reference, here's the head-to-head:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;ForgeCode&lt;/th&gt;
&lt;th&gt;Claude Code&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Model choice&lt;/td&gt;
&lt;td&gt;Any (300+)&lt;/td&gt;
&lt;td&gt;Claude only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Open source&lt;/td&gt;
&lt;td&gt;Yes (Apache 2.0)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Language&lt;/td&gt;
&lt;td&gt;Rust&lt;/td&gt;
&lt;td&gt;TypeScript&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Project config&lt;/td&gt;
&lt;td&gt;AGENTS.md&lt;/td&gt;
&lt;td&gt;CLAUDE.md (hierarchical)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP support&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes (extensive)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hooks&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (6 types)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scheduled tasks&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (cloud + local)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sub-agents&lt;/td&gt;
&lt;td&gt;Yes (forge/sage/muse)&lt;/td&gt;
&lt;td&gt;Yes (parallel)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Plan mode&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes (Shift+Tab)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VS Code&lt;/td&gt;
&lt;td&gt;No extension&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JetBrains&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auto memory&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Checkpoints / rewind&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Where I landed
&lt;/h2&gt;

&lt;p&gt;I'm double-dipping. Claude Code is still my primary tool, but I keep ForgeCode open for tasks where the latency kills me. Sometimes I'll drop into Cursor for something visual. Three tools is kind of ridiculous, but the latency gains on ForgeCode are real enough that I can't just ignore them.&lt;/p&gt;

&lt;p&gt;Claude Code is where my project config lives, where my hooks fire, where my MCP connections run. That's my home base and it's not changing. But when I need something fast and self-contained, a quick refactor, a file rename across a module, something where I don't need the full ecosystem, I'll run it through ForgeCode with Opus 4.6 and it's done before Claude Code would've finished reading the context.&lt;/p&gt;

&lt;p&gt;As of April 2026, ForgeCode is faster than Claude Code when running the same model (Opus 4.6), but Claude Code has the deeper ecosystem with hooks, MCP, auto-memory, and IDE integrations. Neither wins across the board. Pick the one that matches how you work and be ready to use both.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is ForgeCode's TermBench #1 score legitimate?
&lt;/h3&gt;

&lt;p&gt;TermBench is ForgeCode's own benchmark. On &lt;a href="https://www.swebench.com" rel="noopener noreferrer"&gt;SWE-bench Verified&lt;/a&gt;, an independent benchmark from Princeton, ForgeCode + Claude 4 scored 72.7% compared to Claude 3.7 Sonnet's 70.3%. Solid, but not the 24-point gap TermBench suggests.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can ForgeCode use my existing Claude or ChatGPT subscription?
&lt;/h3&gt;

&lt;p&gt;No. You need API keys, not a subscription login. Separate billing from whatever you pay for Claude Pro or ChatGPT Plus.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does ForgeCode burn more tokens than Claude Code?
&lt;/h3&gt;

&lt;p&gt;Nobody's published hard numbers. ForgeCode's multi-agent setup (forge/sage/muse spawning sub-agents) almost certainly burns more tokens per session. I noticed it anecdotally but didn't measure. Track your own spend if you try it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is ForgeCode safe for proprietary code?
&lt;/h3&gt;

&lt;p&gt;The harness is open source, but default telemetry collects git user emails, scans SSH directories, and sends conversation data externally. &lt;a href="https://github.com/antinomyhq/forgecode/issues/1318" rel="noopener noreferrer"&gt;GitHub issue #1318&lt;/a&gt; raised data transparency concerns. The team addressed it in March 2025: set &lt;code&gt;FORGE_TRACKER=false&lt;/code&gt; to disable all tracking.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is ForgeCode free?
&lt;/h3&gt;

&lt;p&gt;The code is free and open source (Apache 2.0). The hosted service was &lt;a href="https://reddit.com/r/cursor/comments/1maq1ex" rel="noopener noreferrer"&gt;originally unlimited&lt;/a&gt;, but switched to a tiered model in mid-2025 with daily request caps on the free tier.&lt;/p&gt;




&lt;p&gt;ForgeCode's benchmark lead exists on a test it runs itself. On independent benchmarks, it's comparable. The speed with Opus 4.6 is real. The GPT 5.4 experience was rough.&lt;/p&gt;

&lt;p&gt;I didn't expect to end up running two coding agents. But here I am. If ForgeCode ships hooks and the ecosystem catches up, that could change. For now, I'm using both, and it's working.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/antinomyhq/forgecode" rel="noopener noreferrer"&gt;ForgeCode GitHub Repository&lt;/a&gt; - GitHub, April 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.tbench.ai/leaderboard/terminal-bench/2.0" rel="noopener noreferrer"&gt;TermBench 2.0 Leaderboard&lt;/a&gt; - tbench.ai, 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.swebench.com" rel="noopener noreferrer"&gt;SWE-bench Verified Leaderboard&lt;/a&gt; - Princeton/UChicago, 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://code.claude.com/docs/en/" rel="noopener noreferrer"&gt;Claude Code Documentation&lt;/a&gt; - Anthropic, 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.anthropic.com/news/claude-3-7-sonnet" rel="noopener noreferrer"&gt;Anthropic Claude 3.7 Sonnet Announcement&lt;/a&gt; - Anthropic, February 2025&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://liranbaba.dev/blog/forgecode-vs-claude-code/" rel="noopener noreferrer"&gt;liranbaba.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devtools</category>
      <category>ai</category>
      <category>claudecode</category>
      <category>forgecode</category>
    </item>
    <item>
      <title>Cursor 3 shipped parallel agents, but is any of it new?</title>
      <dc:creator>Liran Baba</dc:creator>
      <pubDate>Sun, 05 Apr 2026 15:06:27 +0000</pubDate>
      <link>https://forem.com/liran_baba/cursor-3-shipped-parallel-agents-but-is-any-of-it-new-2dd1</link>
      <guid>https://forem.com/liran_baba/cursor-3-shipped-parallel-agents-but-is-any-of-it-new-2dd1</guid>
      <description>&lt;p&gt;Cursor 3 shipped on April 2. The demos look great: eight AI agents running in parallel, each in its own Git worktree, building different parts of your project at the same time. The &lt;a href="https://news.ycombinator.com/item?id=47618084" rel="noopener noreferrer"&gt;Hacker News thread&lt;/a&gt; lit up. Product Hunt gave it the #3 spot for the day.&lt;/p&gt;

&lt;p&gt;Then I read the comments. One user reported spending $2,000 in two days on cloud agents. Another switched from $1,800/month on Cursor to roughly $200/month on Claude Code and Codex. A third said they had "zero interest" in forced agent swarms and were moving to VS Code with Claude Code instead.&lt;/p&gt;

&lt;p&gt;The coverage so far has been mostly feature recaps reprinting the press release. Nobody's asking the obvious questions: is parallel agent execution actually new? What does it really cost? And what happens when your agents need to share context?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Here's the Thing&lt;/strong&gt;&lt;br&gt;
Cursor 2 already supported parallel execution via worktree.json configuration. What Cursor 3 actually shipped is a UI layer (Agents Window sidebar, drag-drop tabs) on top of the same Git worktree primitives. The cost model is the real concern: early testers reported $2,000 bills in two days, and Cursor's pricing page doesn't explain why. The unsolved technical problem is context sharing between local and cloud agents, which the docs hand-wave as "summarized and reduced."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What Cursor 3 actually shipped
&lt;/h2&gt;

&lt;p&gt;Cursor 3 lets you run up to 8 AI agents in parallel across isolated Git worktrees (&lt;a href="https://cursor.com/blog/cursor-3" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt;, 2026). Agents run locally via Composer 2 or in cloud isolation VMs. You can watch them all from a new sidebar called the Agents Window.&lt;/p&gt;

&lt;p&gt;That's the pitch, anyway.&lt;/p&gt;

&lt;p&gt;Cursor 2 already supported parallel agent execution through worktree.json configuration. The &lt;code&gt;/worktree&lt;/code&gt; command isn't new functionality. It's new UI. The Agents Window gives you visibility into what your agents are doing, and that part is genuinely useful. But calling this an architectural pivot is a stretch.&lt;/p&gt;

&lt;p&gt;The other additions: &lt;code&gt;/best-of-n&lt;/code&gt; runs the same prompt across multiple models side by side (Composer 2 vs. Claude vs. GPT). Design Mode lets you annotate UI elements and describe changes in plain English. The MCP Marketplace adds plugin support for hundreds of tools.&lt;/p&gt;

&lt;p&gt;Under the hood, &lt;code&gt;/worktree&lt;/code&gt; runs &lt;code&gt;git worktree add&lt;/code&gt; to create an isolated working directory on a new branch, then spawns an agent process scoped to that directory. Each agent gets its own filesystem view, so file edits don't collide mid-run. When the agent finishes, you review the diff and merge. This is the same thing you'd do manually with &lt;code&gt;git worktree add&lt;/code&gt; and a second terminal. Cursor 3 wraps it in a sidebar.&lt;/p&gt;
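&lt;p&gt;For reference, the manual version is a handful of commands (paths and branch names here are illustrative):&lt;/p&gt;

```shell
# Manual equivalent of Cursor's /worktree: isolate each agent on its own
# branch and directory. Paths and branch names are illustrative.
set -e
work=$(mktemp -d)
git -C "$work" init -q repo
cd "$work/repo"
git -c user.email=dev@example.com -c user.name=dev commit -q --allow-empty -m "base"

# One isolated working directory per agent, each on its own branch:
git worktree add -q "$work/agent-a" -b feature/agent-a
git worktree add -q "$work/agent-b" -b feature/agent-b

# Point each agent at its own directory; review the diffs and merge when done.
git worktree list
```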

&lt;h2&gt;
  
  
  The cost problem nobody is talking about
&lt;/h2&gt;

&lt;p&gt;Early adopters reported spending $2,000+ in two days running Cursor 3's cloud agents (&lt;a href="https://news.ycombinator.com/item?id=47618084" rel="noopener noreferrer"&gt;Hacker News&lt;/a&gt;, 2026). That's not a typo. Two thousand dollars. Two days.&lt;/p&gt;

&lt;p&gt;Cursor's pricing page lists four tiers: Free, Pro at $20, Pro+ at $60, and Ultra at $200 per month (&lt;a href="https://cursor.com/pricing" rel="noopener noreferrer"&gt;cursor.com/pricing&lt;/a&gt;, 2026). Those numbers look reasonable until you start running cloud agents, whose resource costs are absent from the page entirely: no per-minute VM charges listed, no explanation of how usage is metered.&lt;/p&gt;

&lt;p&gt;HN user dirtbag__dad reported spending "$2k a week with premium models" before switching to Claude Code Max at "1/10th the price." Another commenter, verelo, switched from $1,800/month on Cursor to roughly $200/month on Claude and Codex, calling it "WAY better value for money."&lt;/p&gt;

&lt;p&gt;Same story every time. Listed price and actual spend have almost nothing in common. When your pricing page says $200/month but users regularly spend ten times that, the issue isn't pricing. It's that nobody can predict what anything costs before the bill shows up.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude Code isn't immune either
&lt;/h3&gt;

&lt;p&gt;I should be fair here. Anthropic's flat-rate plans sound predictable, but they have their own version of this.&lt;/p&gt;

&lt;p&gt;In late March 2026, Claude Code Max plan users reported exhausting their quotas in under an hour, down from the eight hours the same quota had previously lasted (&lt;a href="https://www.theregister.com/2026/03/31/anthropic_claude_code_limits/" rel="noopener noreferrer"&gt;The Register&lt;/a&gt;, 2026). The story pulled 324 points on Hacker News. BBC covered it a day later.&lt;/p&gt;

&lt;p&gt;Anthropic acknowledged the problem on Reddit: "people are hitting usage limits in Claude Code way faster than expected." A March promotion that doubled limits ended on March 28. There were reports of prompt cache bugs inflating token usage by 10-20x. And Anthropic doesn't publicly specify exact usage caps for any plan.&lt;/p&gt;

&lt;p&gt;So people started building tools just to figure out their own limits. API proxy interceptors. One developer &lt;a href="https://www.claudecodecamp.com/p/i-tried-to-reverse-engineer-claude-code-s-usage-limits" rel="noopener noreferrer"&gt;tried to reverse-engineer the utilization headers&lt;/a&gt; that Anthropic sends on every API response, because Claude Code doesn't surface them to you.&lt;/p&gt;

&lt;p&gt;I &lt;a href="https://liranbaba.dev/blog/found-database-password-in-claude-code-session/" rel="noopener noreferrer"&gt;built Claudoscope&lt;/a&gt; partly for this reason. If the tool won't tell you what it costs, build something that will.&lt;/p&gt;

&lt;p&gt;Both tools have cost transparency problems. They're just structured differently. Cursor's is per-token opacity: you don't know what cloud agents will cost until the bill arrives. Anthropic's is undisclosed caps on plans marketed as generous. Neither side has figured this out yet, which is kind of remarkable given how much both charge.&lt;/p&gt;

&lt;h2&gt;
  
  
  The context sharing problem
&lt;/h2&gt;

&lt;p&gt;This is the technical gap that nobody's writing about, and it's the one that actually matters for how well parallel agents work in practice.&lt;/p&gt;

&lt;p&gt;Each worktree agent runs in its own isolated branch. That's the point: isolation prevents file conflicts. But it also means Agent A doesn't know what Agent B is doing. If you're building an API endpoint in one worktree and the frontend that calls it in another, those agents are working from the same base commit. Neither sees the other's in-progress changes.&lt;/p&gt;

&lt;p&gt;Cursor's docs say local and cloud agent contexts are "summarized and reduced" before sharing. That's doing a lot of work as a sentence. How much of a 100k-line codebase survives summarization? What's the token budget for the summary? Is it a full AST-aware summary or just file path lists? The docs don't say.&lt;/p&gt;

&lt;p&gt;There's also the committed-vs-dirty question. Are cloud agents working from the latest committed state on the branch, or from your local uncommitted edits? If committed: you have to commit before spawning cloud agents, which means half-finished code landing in your Git history. If uncommitted: they need filesystem sync between local and cloud, which introduces latency and consistency issues. The docs are silent on this too.&lt;/p&gt;

&lt;p&gt;I've hit a version of this problem with Claude Code's worktree parallelism. Two agents building against the same API contract will sometimes diverge on field names or response shapes because neither agent sees the other's work until merge time. The fix is manual: define the contract first, commit it, then parallelize. That works, but it means true parallelism requires upfront planning that eats into the time savings.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://liranbaba.dev/blog/claude-code-source-leak/" rel="noopener noreferrer"&gt;The Claude Code source leak&lt;/a&gt; exposed how their agent orchestration handles this internally: spawning sub-agents, tool call cascading through orchestration layers, sessions that retry failed operations in loops. Context sharing between agents is an unsolved problem across the entire category, not just Cursor.&lt;/p&gt;

&lt;h2&gt;
  
  
  What parallel agents actually solve (and when they don't)
&lt;/h2&gt;

&lt;p&gt;Parallel agents deliver real speedups for the right kind of work. Building a full-stack feature with decoupled components? Four agents in parallel (UI, API, database, tests) can cut wall-clock time from eight hours to two (&lt;a href="https://cursor.com/docs/configuration/worktrees" rel="noopener noreferrer"&gt;Cursor docs&lt;/a&gt;, 2026). That's a genuine 4x on paper.&lt;/p&gt;

&lt;p&gt;I use Claude Code's worktree-based parallelism for similar workflows. Spin up multiple agents, each in an isolated branch, merge when they're done. The UX is rougher: no Agents Window, no drag-drop tabs, no visual status at a glance. But the core capability is the same, and the cost is flat.&lt;/p&gt;

&lt;p&gt;Here's where it falls apart. When Agent B depends on Agent A's output, you can't parallelize. That's most real work. For tasks under 30 minutes, the orchestration overhead eats the speedup. Solo devs on small projects get almost nothing from running eight agents simultaneously. And the context sharing gap I described above means agents working on related components will diverge unless you've done the upfront contract work.&lt;/p&gt;




&lt;p&gt;Cursor 3 is a polished UI layer on existing capabilities, positioned as an architectural breakthrough. The parallel agents are real but not new. The cost model is real but not transparent.&lt;/p&gt;

&lt;p&gt;If you're already in Claude Code, I don't see a reason to switch. If you're evaluating for the first time, try both. Run each for a week on real work, not demos. Track what you actually spend. Then decide.&lt;/p&gt;

&lt;p&gt;Or skip both and try &lt;a href="https://forgecode.dev/" rel="noopener noreferrer"&gt;ForgeCode&lt;/a&gt;. It's open source, terminal-based, and topped TermBench 2.0 at 81.8%. You bring your own API keys and pick your model. I haven't used it yet, but I'm giving it a weekend. Their blog post about hitting #1 is titled "benchmarks don't matter," which I kind of respect.&lt;/p&gt;

&lt;p&gt;That's really all I've got. Track your costs. The rest will sort itself out.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  How much does Cursor 3 actually cost per month?
&lt;/h3&gt;

&lt;p&gt;Plans start at $20/month but real-world spend with cloud agents ranges from $200 to $1,800+ per month based on Hacker News community reports (&lt;a href="https://news.ycombinator.com/item?id=47618084" rel="noopener noreferrer"&gt;HN&lt;/a&gt;, 2026). Cloud agent resource costs aren't disclosed on the pricing page. Track your actual spend for a full week before committing to a plan.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can you run Cursor 3 agents locally without cloud costs?
&lt;/h3&gt;

&lt;p&gt;Yes, local agents run Composer 2 on-device with no per-use charges. Cloud agents are where the parallel execution actually matters, though, and those costs aren't disclosed anywhere.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Cursor 3 better than Claude Code for parallel tasks?
&lt;/h3&gt;

&lt;p&gt;Claude Code supports parallel execution via worktrees at a flat $100-$200/month rate. Cursor 3 offers better visual orchestration through the Agents Window but with unpredictable costs. Pick based on what matters more to you: UI visibility or cost predictability.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://cursor.com/blog/cursor-3" rel="noopener noreferrer"&gt;Cursor 3 Announcement&lt;/a&gt; - Cursor, April 2, 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cursor.com/pricing" rel="noopener noreferrer"&gt;Cursor Pricing&lt;/a&gt; - cursor.com, April 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cursor.com/docs/configuration/worktrees" rel="noopener noreferrer"&gt;Cursor Parallel Agents Docs&lt;/a&gt; - Cursor docs&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://news.ycombinator.com/item?id=47618084" rel="noopener noreferrer"&gt;HN: Cursor 3 Discussion&lt;/a&gt; - Hacker News, April 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.theregister.com/2026/03/31/anthropic_claude_code_limits/" rel="noopener noreferrer"&gt;Claude Code users hitting usage limits&lt;/a&gt; - The Register, March 31, 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.claudecodecamp.com/p/i-tried-to-reverse-engineer-claude-code-s-usage-limits" rel="noopener noreferrer"&gt;Reverse Engineering Claude Code Limits&lt;/a&gt; - Claude Code Camp, April 1, 2026&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://liranbaba.dev/blog/cursor-3-parallel-agents/" rel="noopener noreferrer"&gt;liranbaba.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devtools</category>
      <category>ai</category>
    </item>
    <item>
      <title>Undercover mode, decoy tools, and a 3,167-line function: inside Claude Code's leaked source</title>
      <dc:creator>Liran Baba</dc:creator>
      <pubDate>Thu, 02 Apr 2026 20:34:48 +0000</pubDate>
      <link>https://forem.com/liran_baba/undercover-mode-decoy-tools-and-a-3167-line-function-inside-claude-codes-leaked-source-2159</link>
      <guid>https://forem.com/liran_baba/undercover-mode-decoy-tools-and-a-3167-line-function-inside-claude-codes-leaked-source-2159</guid>
      <description>&lt;p&gt;On March 31, a single &lt;code&gt;.map&lt;/code&gt; file shipped inside an npm package and exposed the complete internals of Claude Code. The &lt;a href="https://news.ycombinator.com/item?id=47584540" rel="noopener noreferrer"&gt;Hacker News thread&lt;/a&gt; hit 2,060 points. Anthropic filed DMCA takedowns against 8,100+ GitHub repos. And I spent most of the afternoon reading TypeScript I wasn't supposed to see.&lt;/p&gt;

&lt;p&gt;I use Claude Code every day. I built &lt;a href="https://claudoscope.com/" rel="noopener noreferrer"&gt;Claudoscope&lt;/a&gt; because I wanted to understand what it was actually doing in my terminal. So when the source dropped, I went through it. Some of it confirmed things I'd suspected. Some of it genuinely surprised me.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key Takeaways&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A JavaScript source map in Claude Code v2.1.88 exposed ~1,700 TypeScript source files (&lt;a href="https://alex000kim.com/posts/2026-03-31-claude-code-source-leak/" rel="noopener noreferrer"&gt;alex000kim&lt;/a&gt;, 2026)&lt;/li&gt;
&lt;li&gt;Unreleased features include KAIROS autonomous mode, anti-distillation decoy tools, and "undercover mode" that hides AI authorship&lt;/li&gt;
&lt;li&gt;Anthropic's DMCA takedown hit 8,100+ repos, many containing no leaked code&lt;/li&gt;
&lt;li&gt;A clean-room rewrite called Claw Code gained 146,000 GitHub stars in under 48 hours&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What happened
&lt;/h2&gt;

&lt;p&gt;Security researcher Chaofan Shou &lt;a href="https://x.com/shoucccc/status/2038894956459290963" rel="noopener noreferrer"&gt;disclosed on X&lt;/a&gt; that Anthropic had shipped a JavaScript source map file inside Claude Code version 2.1.88 on npm. Source maps are debugging artifacts. They contain the original, readable TypeScript source before minification. They're not supposed to ship to production. This one did.&lt;/p&gt;

&lt;p&gt;Early speculation blamed a known Bun bug (&lt;a href="https://github.com/oven-sh/bun/issues/28001" rel="noopener noreferrer"&gt;oven-sh/bun#28001&lt;/a&gt;) where &lt;code&gt;bun serve&lt;/code&gt; sometimes exposes source maps in production. But that bug affects web apps hosted by Bun, not packages bundled with Bun and run locally. Claude Code uses Bun as a bundler and local runtime, not as a web server. Jarred Sumner, Bun's creator and now an Anthropic employee, confirmed Claude Code doesn't use &lt;code&gt;bun serve&lt;/code&gt;, ruling this out. His comment was, as far as anyone can tell, the only public response from an Anthropic employee about the leak. The actual cause of the source map shipping in the npm package remains unexplained.&lt;/p&gt;

&lt;p&gt;About 1,700 source files were exposed, spread across utils (564 files), components (389), commands (189), tools (184), services (130), hooks (104), ink (96), and bridge (31) directories. The &lt;code&gt;.map&lt;/code&gt; file sat on the npm CDN for anyone to download. When Anthropic responded, they deprecated the package version rather than unpublishing it, so the file stayed somewhat accessible even after the response.&lt;/p&gt;

&lt;p&gt;The HN thread generated 1,013 comments. Two follow-up analysis posts scored 1,354 and 1,078 points. People were interested.&lt;/p&gt;

&lt;h2&gt;
  
  
  What was inside the code?
&lt;/h2&gt;

&lt;p&gt;The exposed code contains 35+ tools across six categories, 73+ slash commands, and over 200 server-side feature gates (&lt;a href="https://ccunpacked.dev/" rel="noopener noreferrer"&gt;ccunpacked.dev&lt;/a&gt;, 2026). The community built a &lt;a href="https://ccunpacked.dev/" rel="noopener noreferrer"&gt;visual guide&lt;/a&gt; mapping out an 11-step agent loop from keypress to response.&lt;/p&gt;

&lt;p&gt;The main &lt;code&gt;print.ts&lt;/code&gt; file is 5,594 lines long. Inside it, a single function spans 3,167 lines at 12 levels of nesting (&lt;a href="https://alex000kim.com/posts/2026-03-31-claude-code-source-leak/" rel="noopener noreferrer"&gt;alex000kim&lt;/a&gt;, 2026). Not great.&lt;/p&gt;

&lt;p&gt;There's an operational bug affecting 1,279 sessions that hit 50+ consecutive failures, wasting roughly 250,000 API calls per day globally. HN commenters said it was fixable with three lines.&lt;/p&gt;

&lt;p&gt;The tool taxonomy is more interesting than the code quality issues. File operations, bash execution, web browsing, agent orchestration, task management, cron jobs, worktree isolation. What looks like a coding assistant in the terminal is actually a full agent framework. Daemon mode. Unix domain socket communication between sessions. Remote control via mobile and browser.&lt;/p&gt;

&lt;p&gt;I've been watching Claude Code's behavior through Claudoscope session logs for months. The leaked architecture confirms patterns I'd noticed in the wild: tool calls cascading through orchestration layers, sessions spawning sub-agents, loops where it burns through tokens retrying failed operations over and over. Reading the source was like finally seeing the schematic for a machine I'd only heard running.&lt;/p&gt;

&lt;h2&gt;
  
  
  The features nobody was supposed to see
&lt;/h2&gt;

&lt;p&gt;The most discussed findings weren't about code quality. They were about where Anthropic is heading.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;KAIROS&lt;/strong&gt; is a persistent autonomous agent mode. It runs on periodic &lt;code&gt;&amp;lt;tick&amp;gt;&lt;/code&gt; prompts, maintains daily append-only logs, subscribes to GitHub webhooks, and spawns background daemon workers. The source states it "becomes more autonomous when terminal unfocused." It includes a &lt;code&gt;/dream&lt;/code&gt; skill and five-minute cron refreshes. Claude Code that doesn't wait for you to type. That's what this is.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Undercover mode&lt;/strong&gt; drew the sharpest reaction. The file &lt;code&gt;undercover.ts&lt;/code&gt; suppresses all signs of AI authorship when contributing to public or open-source repos. The instructions are blunt: "NEVER include the phrase 'Claude Code' or any mention that you are an AI" and remove "Co-Authored-By lines or any other attribution." It only runs for Anthropic employees (&lt;code&gt;USER_TYPE === 'ant'&lt;/code&gt;). The code says: "There is NO force-OFF."&lt;/p&gt;

&lt;p&gt;I keep coming back to this one. A company that's built its identity on AI safety and transparency had a mode specifically designed to hide AI involvement in open-source contributions. The file also prevents mention of internal model codenames like "Capybara" and "Tengu," which suggests unreleased models Anthropic hasn't publicly acknowledged.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Anti-distillation&lt;/strong&gt; sends decoy tool definitions to poison training data if competitors scrape API traffic. A secondary mechanism uses server-side text summarization with cryptographic signatures between tool calls to obscure reasoning chains. As multiple HN commenters pointed out, the strategic value of this system "evaporated the moment the .map file hit the CDN."&lt;/p&gt;

&lt;p&gt;Other exposed systems: native client attestation (DRM-like cryptographic verification of legitimate Claude Code binaries), frustration detection via regex (pattern-matching profanity like "wtf" and "dumbass" instead of using the LLM itself, which is kind of funny), and Buddy, a virtual terminal pet that turned out to be the 2026 April Fools' feature.&lt;/p&gt;

&lt;h2&gt;
  
  
  The DMCA overreaction
&lt;/h2&gt;

&lt;p&gt;Anthropic's response to the leak may end up being the bigger story. On March 31 they filed DMCA takedown notices targeting an entire fork network of &lt;a href="https://github.com/github/dmca/blob/master/2026/03/2026-03-31-anthropic.md" rel="noopener noreferrer"&gt;8,100+ repositories&lt;/a&gt; on GitHub. The notice said: "The entire repository is infringing."&lt;/p&gt;

&lt;p&gt;Many of those repos had nothing to do with the leak. One developer &lt;a href="https://news.ycombinator.com/item?id=47584540" rel="noopener noreferrer"&gt;noted on HN&lt;/a&gt; that their fork "had not been modified since May" and "did not contain a copy of the leaked code." Others called it "misguided" and "ridiculous." I mean, yeah.&lt;/p&gt;

&lt;p&gt;The legal questions get weird fast. If Claude Code was partly written by Claude itself (Anthropic says they use their own tools internally), does the AI-generated portion qualify for copyright protection? One commenter raised a sharper point: &lt;code&gt;undercover.ts&lt;/code&gt; explicitly hides AI authorship, which could undermine Anthropic's own copyright claims. And a knowingly false DMCA claim can expose the filer to perjury liability.&lt;/p&gt;

&lt;p&gt;Anthropic executives later said the mass takedowns were accidental and retracted most of the notices (&lt;a href="https://techcrunch.com/2026/04/01/anthropic-took-down-thousands-of-github-repos-trying-to-yank-its-leaked-source-code-a-move-the-company-says-was-an-accident/" rel="noopener noreferrer"&gt;TechCrunch&lt;/a&gt;, 2026). But by then the Streisand effect had done its work. Every takedown drew more attention to the code they were trying to hide.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are the actual security risks?
&lt;/h2&gt;

&lt;p&gt;No user data was exposed. But the leak did expose systems Anthropic relies on to protect its product.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;System exposed&lt;/th&gt;
&lt;th&gt;Risk&lt;/th&gt;
&lt;th&gt;Severity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Anti-distillation decoy tools&lt;/td&gt;
&lt;td&gt;Anyone scraping API traffic can now filter for fakes&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Native client attestation&lt;/td&gt;
&lt;td&gt;Cryptographic hash mechanism publicly documented&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security header feature flags&lt;/td&gt;
&lt;td&gt;Remote disabling of security headers revealed&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unreleased product roadmap&lt;/td&gt;
&lt;td&gt;KAIROS, UltraPlan, Coordinator Mode visible to competitors&lt;/td&gt;
&lt;td&gt;Medium-High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Internal model codenames&lt;/td&gt;
&lt;td&gt;"Capybara," "Tengu" disclosed&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operational bugs&lt;/td&gt;
&lt;td&gt;250K wasted API calls/day, trivially fixable&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The anti-distillation system is the clearest loss. Its entire value depended on competitors not knowing it existed.&lt;/p&gt;

&lt;p&gt;This connects to something I've written about before. When I &lt;a href="https://dev.to/blog/found-database-password-in-claude-code-session"&gt;found my database password sitting in a Claude Code session file&lt;/a&gt;, the issue wasn't that Claude Code was doing something malicious. The issue was that it operates with deep filesystem access and stores everything in unencrypted JSONL files that nobody checks. The source leak confirms what I suspected: there's limited internal safeguarding around what gets stored and transmitted.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claw Code: 146K stars in 48 hours
&lt;/h2&gt;

&lt;p&gt;Within hours of the leak, a developer ported Claude Code's core architecture to Python and Rust from scratch. &lt;a href="https://github.com/ultraworkers/claw-code" rel="noopener noreferrer"&gt;Claw Code&lt;/a&gt; hit 146,000 GitHub stars and 101,000 forks in under 48 hours.&lt;/p&gt;

&lt;p&gt;It's a clean-room rewrite, not a fork of the leaked code. The repo disclaims any affiliation with Anthropic and says the exposed snapshot "is no longer part of the tracked repository state." The developer was later featured in a Wall Street Journal article as a power user who consumed "25 billion tokens" of AI coding tools per year.&lt;/p&gt;

&lt;p&gt;The project includes an interactive CLI, plugin system, MCP orchestration, streaming API support, and LSP integration. Rust (92.9%), Python (7.1%).&lt;/p&gt;

&lt;p&gt;We've seen this before. When Meta's LLaMA model weights leaked in 2023, they chased takedowns for a while, then gave up and went open. The community built derivatives no matter what legal said. 146K stars on Claw Code tells you what developers actually want. Whether Anthropic decides to offer an open alternative is almost beside the point now.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bigger picture
&lt;/h2&gt;

&lt;p&gt;This didn't happen in isolation. It capped a rough month for Anthropic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Feb 16: Pentagon threatened Anthropic with punitive action&lt;/li&gt;
&lt;li&gt;Mar 5: Pentagon formally labeled Anthropic a "supply chain risk" (&lt;a href="https://www.wsj.com/politics/national-security/pentagon-formally-labels-anthropic-supply-chain-risk-escalating-conflict-ebdf0523" rel="noopener noreferrer"&gt;WSJ&lt;/a&gt;, 2026)&lt;/li&gt;
&lt;li&gt;Mar 9: Anthropic sued the Pentagon (&lt;a href="https://www.axios.com/2026/03/09/anthropic-sues-pentagon-supply-chain-risk-label" rel="noopener noreferrer"&gt;Axios&lt;/a&gt;, 2026)&lt;/li&gt;
&lt;li&gt;Mar 26: Federal judge blocked the Pentagon's effort (&lt;a href="https://www.cnn.com/2026/03/26/business/anthropic-pentagon-injunction-supply-chain-risk" rel="noopener noreferrer"&gt;CNN&lt;/a&gt;, 2026)&lt;/li&gt;
&lt;li&gt;Mar 31: Source code leaked via npm. DMCA takedowns hit 8,100+ repos&lt;/li&gt;
&lt;li&gt;Apr 1: TechCrunch runs &lt;a href="https://techcrunch.com/2026/03/31/anthropic-is-having-a-month/" rel="noopener noreferrer"&gt;"Anthropic is having a month"&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Anthropic built its brand on responsible development and safety-first engineering. Then a source map shipped in an npm package and nobody caught it. The DMCA response hit thousands of uninvolved developers. And &lt;code&gt;undercover.ts&lt;/code&gt; was hiding AI authorship while the company publicly advocated for transparency.&lt;/p&gt;

&lt;p&gt;I still use Claude Code. I don't think it's a bad product. But the gap between the safety messaging and the operational reality is now documented in 1,700 TypeScript files. Anyone can read them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to do now
&lt;/h2&gt;

&lt;p&gt;If you use Claude Code, there's nothing you need to patch or update. The leak was Anthropic's source code, not your data.&lt;/p&gt;

&lt;p&gt;What's worth paying attention to is how Anthropic responds. As of this writing, there's been no official statement on their newsroom, blog, or developer channels. The only Anthropic employee who commented publicly was Jarred Sumner, and only to clarify that the Bun bug wasn't the cause. Whether they address undercover mode, the DMCA overreach, or the anti-distillation system will say a lot about how they handle things going forward.&lt;/p&gt;

&lt;p&gt;And if you're eyeing Claw Code as an alternative, know what you're getting into. It's a clean-room rewrite with different internals, not a fork.&lt;/p&gt;

&lt;p&gt;Or maybe this is the push to try something else entirely. &lt;a href="https://forgecode.dev/" rel="noopener noreferrer"&gt;ForgeCode&lt;/a&gt; currently tops TermBench 2.0 and has been getting a lot of attention. I haven't switched yet, but I'd be lying if I said I wasn't curious.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What exactly was leaked in the Claude Code source code?
&lt;/h3&gt;

&lt;p&gt;The full TypeScript source, exposed via a JavaScript source map in npm package v2.1.88. It included 35+ tools, 73+ slash commands, 200+ feature gates, and unreleased features like KAIROS autonomous mode and undercover mode (&lt;a href="https://ccunpacked.dev/" rel="noopener noreferrer"&gt;ccunpacked.dev&lt;/a&gt;, 2026).&lt;/p&gt;

&lt;h3&gt;
  
  
  Why did Anthropic take down 8,100 GitHub repositories?
&lt;/h3&gt;

&lt;p&gt;They filed DMCA takedown notices targeting the entire fork network of the repo hosting the leaked code. Many repos contained no leaked material. Anthropic later called the mass takedown accidental and retracted most notices (&lt;a href="https://techcrunch.com/2026/04/01/anthropic-took-down-thousands-of-github-repos-trying-to-yank-its-leaked-source-code-a-move-the-company-says-was-an-accident/" rel="noopener noreferrer"&gt;TechCrunch&lt;/a&gt;, 2026).&lt;/p&gt;

&lt;h3&gt;
  
  
  Is my data at risk from the Claude Code leak?
&lt;/h3&gt;

&lt;p&gt;No. This was source code, not user data. That said, the source did reveal how session data is handled and that feature flags exist to disable security headers remotely.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Claw Code?
&lt;/h3&gt;

&lt;p&gt;Someone ported Claude Code's core architecture to Python and Rust from scratch within hours of the leak. It's a clean-room rewrite, not a fork. 146,000 stars and 101,000 forks in under 48 hours. Not affiliated with Anthropic (&lt;a href="https://github.com/ultraworkers/claw-code" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;).&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://alex000kim.com/posts/2026-03-31-claude-code-source-leak/" rel="noopener noreferrer"&gt;Claude Code Source Leak Analysis&lt;/a&gt; - alex000kim, March 31, 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ccunpacked.dev/" rel="noopener noreferrer"&gt;Claude Code Unpacked Visual Guide&lt;/a&gt; - ccunpacked.dev, April 1, 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/github/dmca/blob/master/2026/03/2026-03-31-anthropic.md" rel="noopener noreferrer"&gt;Anthropic DMCA Notice&lt;/a&gt; - GitHub DMCA Archive, March 31, 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://news.ycombinator.com/item?id=47584540" rel="noopener noreferrer"&gt;HN Thread: Source Leak Disclosure&lt;/a&gt; - Hacker News, March 31, 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://techcrunch.com/2026/04/01/anthropic-took-down-thousands-of-github-repos-trying-to-yank-its-leaked-source-code-a-move-the-company-says-was-an-accident/" rel="noopener noreferrer"&gt;Anthropic took down thousands of GitHub repos&lt;/a&gt; - TechCrunch, April 1, 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://techcrunch.com/2026/03/31/anthropic-is-having-a-month/" rel="noopener noreferrer"&gt;Anthropic is having a month&lt;/a&gt; - TechCrunch, March 31, 2026&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ultraworkers/claw-code" rel="noopener noreferrer"&gt;Claw Code Repository&lt;/a&gt; - GitHub&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>security</category>
    </item>
    <item>
      <title>I found my database password in a Claude Code session file</title>
      <dc:creator>Liran Baba</dc:creator>
      <pubDate>Tue, 31 Mar 2026 13:00:00 +0000</pubDate>
      <link>https://forem.com/liran_baba/i-found-my-database-password-in-a-claude-code-session-file-2fe8</link>
      <guid>https://forem.com/liran_baba/i-found-my-database-password-in-a-claude-code-session-file-2fe8</guid>
      <description>&lt;p&gt;I use Claude Code for most of my programming work, and I have very little idea what it's actually doing under the hood.&lt;/p&gt;

&lt;p&gt;A few months ago I was poking around &lt;code&gt;~/.claude/projects/&lt;/code&gt; and opened a session JSONL file. Buried in the conversation, Claude Code had read a &lt;code&gt;.env&lt;/code&gt; file and echoed its contents back as a tool result. My database password, sitting in plaintext, in a file I never look at.&lt;/p&gt;

&lt;p&gt;That was the afternoon I stopped what I was working on and started building Claudoscope.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem isn't Claude Code. It's visibility.
&lt;/h2&gt;

&lt;p&gt;Claude Code doesn't have a cost breakdown per session. The Enterprise API doesn't surface spend data at all; only the admin dashboard does, and it's not granular enough. When we rolled it out across the org, nobody could answer basic questions: which sessions are expensive? Is the agent stuck in a loop somewhere? Is our CLAUDE.md actually doing anything useful or just eating context window?&lt;/p&gt;

&lt;p&gt;And the security angle was worse. Session files contain the full conversation, including anything the agent reads from disk. If it touches a file with credentials, those credentials now live in an unencrypted JSONL file indefinitely. Nobody was checking for that.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdhspjp6q2j12sjtbycae.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdhspjp6q2j12sjtbycae.png" alt="Claudoscope menu bar widget" width="542" height="1064"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So I built a flashlight.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claudoscope&lt;/strong&gt; is a native macOS menu bar app. It watches your Claude Code session files locally, parses them, and gives you a dashboard. Nothing leaves your machine.&lt;/p&gt;

&lt;p&gt;The menu bar widget gives you a glance: today's sessions, tokens, cost, and any sessions that are currently running with a live cost number next to them. Click through to the full dashboard when you want the details.&lt;/p&gt;

&lt;h3&gt;
  
  
  "Why did Tuesday cost $47?"
&lt;/h3&gt;

&lt;p&gt;That was the question I kept asking and couldn't answer. The analytics view breaks it down: cost by project, cost by model, daily trends. The cache tab shows whether your prompt cache is stable or busting on every request (cache busting is expensive and invisible without tracking). There's a what-if calculator that shows what your bill would look like if you moved Opus sessions to Sonnet.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fteqwnxt8mhsh3dc1oyut.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fteqwnxt8mhsh3dc1oyut.png" alt="Claudoscope analytics dashboard" width="800" height="468"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  "Is my CLAUDE.md any good?"
&lt;/h3&gt;

&lt;p&gt;I didn't plan on building a config linter. It started as a quick check for obvious problems in my own setup. Then I ran it on a colleague's CLAUDE.md and found it was over 4,000 tokens, roughly 2% of a 200k-token context window eaten by instructions before the agent even started working. So I made it a rule.&lt;/p&gt;
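&lt;p&gt;The token budget check doesn't need anything fancy. Here's a crude version of the idea; the ~4-characters-per-token ratio is a rule of thumb for English text, not the linter's actual estimator or a real tokenizer.&lt;/p&gt;

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English prose.

    A rule-of-thumb heuristic, good for order-of-magnitude checks
    on config size, not an exact tokenizer.
    """
    return max(1, len(text) // 4)


# Every message in a session pays for these instructions again.
claude_md = "# Project rules\n" + "- Run the linter before committing.\n" * 300
print(f"~{estimate_tokens(claude_md)} tokens loaded per message")
```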

&lt;p&gt;The linter now has 19 rules. It checks CLAUDE.md structure, skill metadata, deprecated commands, token budget estimates. It groups findings by rule rather than by file, so you see patterns. One rule (subprocess env scrub) has a one-click auto-fix.&lt;/p&gt;

&lt;p&gt;The first time I ran it on our team's configs, it flagged raw XML brackets in a skill's frontmatter that would break the system prompt parser. Nobody had noticed because the failure was silent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1hxx616iula51fb2lpbm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1hxx616iula51fb2lpbm.png" alt="Claudoscope health linter" width="800" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Secret scanning
&lt;/h3&gt;

&lt;p&gt;This is probably the most useful feature and also the hardest one to get people excited about. Did the agent just leak your credentials? You'd never know unless something was watching.&lt;/p&gt;

&lt;p&gt;Claudoscope scans session files for leaked credentials: private keys, AWS access keys, auth headers, API tokens, passwords in connection strings. It uses regex matching, Shannon entropy analysis, and allowlists for placeholder values. The entropy check matters because without it you get a wall of false positives from example code and docs.&lt;/p&gt;
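&lt;p&gt;A minimal sketch of how regex-plus-entropy scanning works. This is not Claudoscope's actual rule set; the two patterns, the allowlist, and the 3.5-bit threshold are illustrative assumptions.&lt;/p&gt;

```python
import math
import re

# Illustrative patterns only: real scanners ship many more rules
# (these are not Claudoscope's actual patterns).
PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "connection_string_password": re.compile(r"://[^:/\s]+:([^@\s]+)@"),
}

# Placeholder values that should never fire an alert.
ALLOWLIST = {"changeme", "password", "example", "hunter2"}

def shannon_entropy(s: str) -> float:
    """Bits per character; random tokens score high, dictionary words low."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((s.count(c) / n) * math.log2(s.count(c) / n) for c in set(s))

def scan(text: str, threshold: float = 3.5):
    """Yield (rule_name, candidate) pairs that look like real secrets."""
    for rule, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            candidate = m.group(1) if m.groups() else m.group(0)
            if candidate.lower() in ALLOWLIST:
                continue  # known placeholder from docs or examples
            if shannon_entropy(candidate) >= threshold:
                yield rule, candidate
```

&lt;p&gt;The entropy gate is what kills the false positives: &lt;code&gt;password&lt;/code&gt; in an example connection string scores low and gets skipped, while a real random credential scores high.&lt;/p&gt;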

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2For6c3105nkk61j2h1q72.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2For6c3105nkk61j2h1q72.png" alt="Claudoscope realtime secret scanning" width="720" height="430"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When it finds something, a panel pops up on screen. Doesn't matter if the dashboard is open. It watches the tail of active session files and alerts you immediately.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I learned from my own data
&lt;/h2&gt;

&lt;p&gt;Building this meant spending a lot of time inside Claude Code's JSONL format. A few things I didn't expect:&lt;/p&gt;

&lt;p&gt;Prompt cache reads are cheap ($0.30/MTok on Sonnet vs $3.00 uncached), so I assumed most of my input was cached. On some projects, 30-40% wasn't. The cache busts when session context shifts after compaction, and before I had a hit rate chart staring me in the face, I had no idea.&lt;/p&gt;

&lt;p&gt;I also figured my expensive sessions would be the big multi-hour ones. They weren't. The cost was in dozens of short sessions where Claude Code loaded context, did one thing, and exited. Each one paid full input with no cache. Fifty quick questions cost me more than the three-hour refactor.&lt;/p&gt;
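&lt;p&gt;The arithmetic behind that surprise, using the Sonnet prices above. Back-of-envelope only: it ignores cache-write surcharges and output tokens, and the 30k-token context figure is an assumption for illustration.&lt;/p&gt;

```python
# Sonnet input prices from above: $3.00/MTok uncached, $0.30/MTok cache read.
UNCACHED = 3.00 / 1_000_000    # dollars per uncached input token
CACHE_READ = 0.30 / 1_000_000  # dollars per cached input token

CONTEXT = 30_000  # assumed tokens of project context loaded per session

# Fifty quick questions: each session loads the context cold, once.
short_sessions = 50 * CONTEXT * UNCACHED

# One long refactor: context paid cold once, then 100 turns
# re-read it from cache.
long_session = CONTEXT * UNCACHED + 100 * CONTEXT * CACHE_READ

print(f"50 short sessions: ${short_sessions:.2f}")  # $4.50
print(f"1 long session:    ${long_session:.2f}")    # $0.99
```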

&lt;p&gt;Most CLAUDE.md files across our team were 2,000-5,000 tokens. That's context window you pay for on every message. A few people trimmed theirs after seeing the linter's token estimate.&lt;/p&gt;
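&lt;p&gt;If you want a quick ballpark yourself, the usual chars-divided-by-four heuristic is close enough for a linter-style estimate. This is a rule of thumb for English prose, not how any real tokenizer works:&lt;/p&gt;

```python
def estimate_tokens(text):
    """Rough rule of thumb: about 4 characters per token for English prose."""
    return max(1, len(text) // 4)

# A 12,000-character CLAUDE.md is roughly 3,000 tokens of overhead,
# paid again on every message in every session.
overhead = estimate_tokens("x" * 12_000)
```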

&lt;p&gt;And one gotcha for anyone parsing these files themselves: the JSONL contains intermediate records with a null &lt;code&gt;stop_reason&lt;/code&gt; (in-progress streaming responses). Sum all records naively and you double-count tokens. I shipped this bug and didn't catch it until my cost estimates came out 1.5-2x the actual Vertex bill. As far as I can tell, this isn't documented anywhere.&lt;/p&gt;
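&lt;p&gt;The fix is a one-line filter. The field names here (&lt;code&gt;stop_reason&lt;/code&gt;, &lt;code&gt;usage.output_tokens&lt;/code&gt;) are illustrative; verify them against your own session files before relying on them:&lt;/p&gt;

```python
import json

def sum_output_tokens(jsonl_text):
    """Sum output tokens, skipping in-progress streaming records."""
    total = 0
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        if rec.get("stop_reason") is None:
            continue  # partial streaming record; counting it double-counts
        total += rec.get("usage", {}).get("output_tokens", 0)
    return total

sample = "\n".join([
    '{"stop_reason": null, "usage": {"output_tokens": 120}}',
    '{"stop_reason": "end_turn", "usage": {"output_tokens": 120}}',
])
# Without the null check this would report 240; the real answer is 120.
```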

&lt;h2&gt;
  
  
  Under the hood
&lt;/h2&gt;

&lt;p&gt;It watches &lt;code&gt;~/.claude/projects/&lt;/code&gt; with macOS FSEvents (not polling). Session parsing runs on a Swift actor for thread safety. Cost estimation runs per-message, not per-session, because different messages in the same session can use different models. There's an LRU cache (20 sessions) so navigating between recent sessions feels instant.&lt;/p&gt;
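&lt;p&gt;The app's cache is written in Swift, but the LRU idea is the same in any language. A minimal Python sketch of the same policy with the 20-session capacity:&lt;/p&gt;

```python
from collections import OrderedDict

class SessionCache:
    """Keep the N most recently viewed parsed sessions, evicting LRU."""

    def __init__(self, capacity=20):
        self.capacity = capacity
        self._store = OrderedDict()

    def get(self, session_id):
        if session_id not in self._store:
            return None
        self._store.move_to_end(session_id)  # mark as most recently used
        return self._store[session_id]

    def put(self, session_id, parsed):
        if session_id in self._store:
            self._store.move_to_end(session_id)
        elif len(self._store) == self.capacity:
            self._store.popitem(last=False)  # evict least recently used
        self._store[session_id] = parsed
```

&lt;p&gt;Reads bump a session to the back of the queue, so the sessions you're flipping between stay parsed and navigation feels instant.&lt;/p&gt;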

&lt;p&gt;I built it in SwiftUI, macOS 14+, Apple Silicon only. I wanted it to feel like a Mac app. That means no Linux or Windows, and I'm fine with that tradeoff.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;

&lt;p&gt;Free, open source, macOS only (Apple Silicon). Homebrew:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew tap cordwainersmith/claudoscope
brew &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--cask&lt;/span&gt; claudoscope
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or grab the DMG from &lt;a href="https://github.com/cordwainersmith/Claudoscope" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. It auto-updates. The cost estimation is most useful on Enterprise plans where per-session data isn't available, but session analytics and config linting work regardless of your plan.&lt;/p&gt;

&lt;p&gt;Go check your session files. You might not like what you find.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>claudecode</category>
      <category>security</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
