<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Max Quimby</title>
    <description>The latest articles on Forem by Max Quimby (@max_quimby).</description>
    <link>https://forem.com/max_quimby</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3823178%2F0a97facc-1e95-494c-9db9-084aa3b35e47.png</url>
      <title>Forem: Max Quimby</title>
      <link>https://forem.com/max_quimby</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/max_quimby"/>
    <language>en</language>
    <item>
      <title>Google's $40B Anthropic Bet: What It Means for Developers</title>
      <dc:creator>Max Quimby</dc:creator>
      <pubDate>Sun, 26 Apr 2026 03:39:13 +0000</pubDate>
      <link>https://forem.com/max_quimby/googles-40b-anthropic-bet-what-it-means-for-developers-5g5o</link>
      <guid>https://forem.com/max_quimby/googles-40b-anthropic-bet-what-it-means-for-developers-5g5o</guid>
      <description>&lt;p&gt;Last Thursday, &lt;a href="https://www.bloomberg.com/news/articles/2026-04-24/google-plans-to-invest-up-to-40-billion-in-anthropic" rel="noopener noreferrer"&gt;Google announced&lt;/a&gt; it would invest up to $40 billion in Anthropic — the company behind Claude. The headline is enormous, but the structure of the deal is what developers should actually study. This isn't a standard venture investment. It's a circular finance loop: Google gives Anthropic capital, Anthropic spends that capital on Google Cloud compute, Google books the revenue. The money goes around in a circle, and what comes out the other end is 5 gigawatts of dedicated AI compute locked to the Google TPU stack.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📖 &lt;a href="https://computeleap.com/blog/google-40b-anthropic-investment-circular-deal-developers" rel="noopener noreferrer"&gt;Read the full version with charts and embedded sources on ComputeLeap →&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For developers building on the Claude API, this matters more than it looks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://news.ycombinator.com/item?id=47892074" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjdhpbvgum5e9159opj30.png" alt="Hacker News: Google plans to invest up to $40B in Anthropic — 798 points, 798 comments" width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Deal Structure, Decoded
&lt;/h2&gt;

&lt;p&gt;The $40 billion breaks into two tranches. &lt;a href="https://techcrunch.com/2026/04/24/google-to-invest-up-to-40b-in-anthropic-in-cash-and-compute/" rel="noopener noreferrer"&gt;TechCrunch reported&lt;/a&gt; that $10 billion is immediate cash at a $350 billion valuation for Anthropic. The remaining $30 billion is contingent — tied to undisclosed performance milestones that function as options Google can exercise over time.&lt;/p&gt;

&lt;p&gt;That 25/75 split between immediate and contingent capital matters. The immediate $10B is real capital. The $30B contingent tranche is more accurately described as a multi-year compute credit facility dressed up as an investment. &lt;a href="https://www.ghacks.net/2026/04/25/google-plans-to-invest-up-to-40-billion-in-anthropic-in-two-phase-deal-tied-to-performance-targets/" rel="noopener noreferrer"&gt;gHacks&lt;/a&gt; describes it as "a hybrid of Microsoft's OpenAI playbook and the cloud-credit model Amazon used in 2023 — equity capital flows out, but the bulk cycles back into Google Cloud as TPU spend over a multi-year horizon."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.ghacks.net/2026/04/25/google-plans-to-invest-up-to-40-billion-in-anthropic-in-two-phase-deal-tied-to-performance-targets/" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu7et1oep3iadkd7dxvwm.png" alt="gHacks: Google Plans to Invest Up to $40 Billion in Anthropic in Two-Phase Deal" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This comes just days after Amazon announced its own &lt;a href="https://computeleap.com/blog/anthropic-100b-aws-claude-dominance-6-month-clock-2026" rel="noopener noreferrer"&gt;$33 billion Anthropic deal&lt;/a&gt; — with a separate $100 billion compute commitment to AWS infrastructure. In under 100 hours, Anthropic collected $65+ billion in fresh pledges from its two largest cloud partners.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ &lt;strong&gt;The circular deal, simplified:&lt;/strong&gt; Google gives Anthropic $40B → Anthropic buys Google Cloud TPUs → Google books cloud revenue. The investment is also a guaranteed customer acquisition for Google's infrastructure business.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Why "Circular" — And Why It Matters
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://www.humai.blog/google-just-gave-anthropic-40-billion-anthropic-will-spend-it-on-google/" rel="noopener noreferrer"&gt;Humai blog&lt;/a&gt; published the clearest diagnosis of the deal structure: "The $40 billion is, in practical terms, a very expensive customer acquisition cost — paid in advance, recorded as an investment, and recouped through cloud bills nobody outside the deal will ever audit."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.humai.blog/google-just-gave-anthropic-40-billion-anthropic-will-spend-it-on-google/" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmp74p5s77zhshbfkm8f2.png" alt="Humai Blog analysis: Google's $40B Anthropic Deal is Circular Finance, Not Investment" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That framing went viral on Hacker News, where the &lt;a href="https://news.ycombinator.com/item?id=47892074" rel="noopener noreferrer"&gt;story topped 798 points and 798 comments&lt;/a&gt; — the platform's top story on April 24. The community immediately noted that Anthropic is now what one commenter called "a MicroAmaGooVidia amalgamation" — simultaneously backed by Microsoft, Amazon, Google, and dependent on all three for compute.&lt;/p&gt;

&lt;p&gt;The circular structure isn't new — Amazon's 2023 investment used the same cloud-credit playbook. But the scale is novel. The cumulative concentration of hyperscaler-AI lab partnerships (Microsoft–OpenAI, Google–Anthropic, Amazon–Anthropic) has grown large enough that analysts note the FTC, DOJ, and European Commission are likely to revisit the structure.&lt;/p&gt;

&lt;p&gt;For developers, the circular nature matters for one specific reason: it means Anthropic's compute access is now &lt;em&gt;structurally guaranteed&lt;/em&gt; by capital agreements, not just purchasing relationships. That's a different kind of stability.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Anthropic Gets: 5 Gigawatts and a Roadmap
&lt;/h2&gt;

&lt;p&gt;The concrete deliverable from this deal isn't the $40 billion number — it's the 5 gigawatts of dedicated compute capacity that Google Cloud will provide over five years. &lt;a href="https://www.anthropic.com/news/google-broadcom-partnership-compute" rel="noopener noreferrer"&gt;Anthropic's own announcement&lt;/a&gt; notes this builds on a separate Broadcom partnership for 3.5 gigawatts of next-generation TPU capacity coming online in 2027.&lt;/p&gt;

&lt;p&gt;Combine the two commitments and you get a picture of Anthropic's training substrate for the next 3–5 years: a massive TPU-first infrastructure that validates Google's chips as a credible alternative to Nvidia for frontier model training. The financial backdrop makes the compute question urgent. &lt;a href="https://sacra.com/c/anthropic/" rel="noopener noreferrer"&gt;Sacra's research&lt;/a&gt; shows Anthropic's revenue grew from $1 billion annualized in December 2024 to $30 billion in April 2026 — a 30x increase in 16 months. Business customers spending over $1 million annually doubled from 500 to 1,000 in under two months. Claude Code alone reached $2.5 billion in annualized billings. Demand is outrunning supply.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mythos: The Model This Compute Is Built For
&lt;/h2&gt;

&lt;p&gt;There's a specific model behind the compute math. &lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/claude-mythos-preview-on-vertex-ai" rel="noopener noreferrer"&gt;Google Cloud's blog announced&lt;/a&gt; Claude Mythos in private preview on Vertex AI as part of "Project Glasswing" in early April. &lt;a href="https://sherwood.news/tech/report-despite-blacklisting-nsa-currently-using-anthropics-mythos-model/" rel="noopener noreferrer"&gt;Sherwood News reported&lt;/a&gt; that Mythos — internally codenamed "Capybara" — is described in Anthropic's red-team disclosures as "a step change" above Opus 4.6, with pricing in the gated preview at $25 per million input tokens and $125 per million output tokens.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://sherwood.news/tech/report-despite-blacklisting-nsa-currently-using-anthropics-mythos-model/" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk09q1tpw9defonrnxzzc.png" alt="Sherwood News: NSA is currently using Anthropic's unreleased Mythos model despite blacklisting" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We already have a &lt;a href="https://computeleap.com/blog/claude-mythos-preview-project-glasswing-cybersecurity" rel="noopener noreferrer"&gt;deep look at Claude Mythos and Project Glasswing&lt;/a&gt; on ComputeLeap. The key new data point: Mythos is already being used in production by the NSA despite official blacklisting — a signal that the model's capability premium is significant enough to override institutional friction.&lt;/p&gt;

&lt;p&gt;Prediction markets are watching closely. &lt;a href="https://polymarket.com/event/which-company-has-the-best-ai-model-end-of-april" rel="noopener noreferrer"&gt;Polymarket's&lt;/a&gt; "Which company has the best AI model end of April?" market has $18.5M in volume, with Anthropic currently at ~90% implied probability — even as DeepSeek V4, GPT-5.5, and Meta Muse Spark all launched in the same 72-hour window this week.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://polymarket.com/event/which-company-has-the-best-ai-model-end-of-april" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwkhs3ocnkxa4m0p2bx8e.png" alt="Polymarket: Anthropic at 90% implied probability for best AI model end of April — $18.5M in volume" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;For developers:&lt;/strong&gt; Claude Mythos is accessible now via Vertex AI for approved enterprise accounts. If you're building on Google Cloud, apply for Project Glasswing access — this is the earliest path to the next frontier tier before public availability.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What It Means for Developers: Capacity, Rate Limits, and Platform Choice
&lt;/h2&gt;

&lt;p&gt;The practical developer question is straightforward: will this make Claude faster to call, higher-limit, and more reliable?&lt;/p&gt;

&lt;p&gt;The short answer is yes — but not immediately.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://tokencalculator.com/blog/claude-api-rate-limits-april-2026" rel="noopener noreferrer"&gt;Current Claude API rate limits&lt;/a&gt; reflect a compute-constrained environment. Tier 1 developers get 50 requests per minute and 30,000 tokens per minute. Tier 4 (requiring $400 in cumulative credits) reaches 4,000 RPM and 2,000,000 ITPM. The &lt;a href="https://computeleap.com/blog/claude-code-quota-limits-billing-changes-2026" rel="noopener noreferrer"&gt;Claude Code rate limits guide&lt;/a&gt; covers the developer-side mechanics in detail.&lt;/p&gt;

&lt;p&gt;New infrastructure takes 12–24 months to translate into available capacity. The 5GW Google committed and the 3.5GW Broadcom deal (starting 2027) won't relieve rate pressure until late 2026 at earliest. But the trajectory is clear: Anthropic is building a compute foundation sized for the next order of magnitude of demand.&lt;/p&gt;

&lt;p&gt;There's also a platform availability angle that's underappreciated. Anthropic is now &lt;a href="https://www.anthropic.com/news/google-broadcom-partnership-compute" rel="noopener noreferrer"&gt;the only frontier AI lab with native integrations across all three major cloud platforms&lt;/a&gt;: AWS Bedrock, Google Cloud Vertex AI, and Microsoft Azure Foundry. If you're an enterprise developer already committed to any of the big three clouds, Claude is there — and the investment locks in that availability for years.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ &lt;strong&gt;Multi-cloud positioning:&lt;/strong&gt; Anthropic's presence on AWS Bedrock, Google Vertex AI, and Azure Foundry means enterprise developers don't have to migrate infrastructure to access Claude. This is a meaningful competitive moat that OpenAI (primarily Microsoft/Azure-aligned) doesn't match.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Regulatory Overhang
&lt;/h2&gt;

&lt;p&gt;One signal developers building on Claude should track: regulatory scrutiny of these hyperscaler-AI lab partnerships is coming. The FTC, DOJ, and EU Commission are likely to revisit the structure of Microsoft–OpenAI, Google–Anthropic, and Amazon–Anthropic simultaneously.&lt;/p&gt;

&lt;p&gt;The risk for developers isn't that Claude goes away. It's that regulatory action could constrain how these deals are structured going forward — potentially affecting compute availability SLAs, pricing tiers, or multi-cloud access. Worth watching, but not worth panicking over for most development teams.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Position Your Claude App for the Capacity Wave
&lt;/h2&gt;

&lt;p&gt;If you're building production applications on Claude, the Google deal changes your planning horizon:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Short-term (now → Q3 2026):&lt;/strong&gt; Capacity is still constrained. Use prompt caching aggressively — cache-read tokens don't count against your input-tokens-per-minute limit, effectively multiplying your throughput while cutting input costs. Route lower-stakes tasks to Claude Haiku 4.5, which has more generous limits. Use the Batch API for non-real-time workloads at 50% of standard cost.&lt;/p&gt;
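
&lt;p&gt;A minimal sketch of that short-term posture, assuming the &lt;code&gt;anthropic&lt;/code&gt; Python SDK: the &lt;code&gt;cache_control&lt;/code&gt; block marks a large, stable system prompt for reuse, and the routing between the Opus and Haiku model ids (both illustrative here) keeps low-stakes traffic on the cheaper, higher-limit model:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import anthropic

client = anthropic.Anthropic()

# A large, stable prefix (style guide, schema docs, etc.) is the ideal caching target.
LONG_SYSTEM_PROMPT = open("style_guide.md").read()

def ask(question, high_stakes=False):
    """Route low-stakes work to Haiku and reuse the cached system prompt."""
    model = "claude-opus-4-6" if high_stakes else "claude-haiku-4-5"  # illustrative ids
    return client.messages.create(
        model=model,
        max_tokens=1024,
        system=[{
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},  # cache hits bypass most input cost
        }],
        messages=[{"role": "user", "content": question}],
    )
&lt;/code&gt;&lt;/pre&gt;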

&lt;p&gt;&lt;strong&gt;Medium-term (Q4 2026 → 2027):&lt;/strong&gt; New Google Cloud capacity starts coming online. Rate limit tiers should expand meaningfully. If you're currently hitting walls at Tier 2 or Tier 3, plan for those ceilings to rise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Long-term (2027+):&lt;/strong&gt; The 3.5GW Broadcom TPU deal comes online. This is Mythos-scale compute — the infrastructure that trains and runs models well above current pricing tiers.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://computeleap.com/blog/anthropic-vs-openai-api-developer-platform-2026" rel="noopener noreferrer"&gt;Anthropic vs. OpenAI platform comparison&lt;/a&gt; covers how these compute roadmaps translate into API feature differences — worth revisiting with this investment context in mind.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;Google's $40B investment in Anthropic is simultaneously a capital event, a compute lockup, and a signal about where frontier AI infrastructure is headed. The circular structure isn't a flaw — it's the point. Hyperscalers are discovering that the most effective way to guarantee demand for their own compute infrastructure is to fund the companies that need the most compute.&lt;/p&gt;

&lt;p&gt;For developers, the practical read is this: Anthropic is better capitalized and better infrastructure-secured than it has ever been. The models getting trained on 5 gigawatts of Google TPUs over the next five years will be substantially more capable than what's available today. The question isn't whether Claude will have compute — it's whether you're building on a platform positioned to scale with it.&lt;/p&gt;

&lt;p&gt;The capacity wave is coming. The capital to fund it just got committed.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://computeleap.com/blog/google-40b-anthropic-investment-circular-deal-developers" rel="noopener noreferrer"&gt;ComputeLeap&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>anthropic</category>
      <category>ai</category>
      <category>developers</category>
      <category>cloudcomputing</category>
    </item>
    <item>
      <title>GStack: Turn Claude Code Into a Full Engineering Team</title>
      <dc:creator>Max Quimby</dc:creator>
      <pubDate>Sat, 25 Apr 2026 04:12:55 +0000</pubDate>
      <link>https://forem.com/max_quimby/gstack-turn-claude-code-into-a-full-engineering-team-1c7e</link>
      <guid>https://forem.com/max_quimby/gstack-turn-claude-code-into-a-full-engineering-team-1c7e</guid>
      <description>&lt;p&gt;The first time you type &lt;code&gt;/office-hours&lt;/code&gt; into Claude Code with GStack installed, something strange happens. The AI stops acting like a helpful coding assistant and starts acting like a skeptical product manager who thinks your feature idea is probably wrong.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📖 &lt;a href="https://agentconn.com/blog/gstack-claude-code-harness-open-source-2026" rel="noopener noreferrer"&gt;Read the full version on AgentConn →&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is the design. And it is why &lt;a href="https://github.com/garrytan/gstack" rel="noopener noreferrer"&gt;GStack&lt;/a&gt; — Garry Tan's open-source Claude Code skill setup — has accumulated 82,700 stars and 12,000 forks on GitHub since its March 2026 launch.&lt;/p&gt;

&lt;p&gt;For context: Garry Tan is the President and CEO of Y Combinator. When the person who has reviewed more startups than almost anyone else on earth open-sources the exact AI development workflow that runs his code, developers pay attention. They also argue about it extensively on &lt;a href="https://news.ycombinator.com/item?id=47418576" rel="noopener noreferrer"&gt;Hacker News&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/garrytan/status/2032014570118922347" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh81gsfxjhp4buwswrh50.png" alt="Garry Tan tweet: I've been having such an amazing time with Claude Code I wanted you to be able to have my exact skill setup — Introducing gstack" width="548" height="847"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This guide explains what GStack actually does, how it compares to &lt;a href="https://github.com/code-yeongyu/oh-my-openagent" rel="noopener noreferrer"&gt;oh-my-openagent&lt;/a&gt; and other harnesses, why the "it's just prompts" criticism misses the point, and whether it belongs in your workflow.&lt;/p&gt;




&lt;h2&gt;
  
  
  What GStack Actually Does: The 23 Skills
&lt;/h2&gt;

&lt;p&gt;GStack is not a new coding assistant. It is a collection of CLAUDE.md skills — structured instructions that give Claude Code specialist personas. Install it in your project, and Claude Code gains access to 23 tools that simulate an engineering team.&lt;/p&gt;

&lt;p&gt;The roles divide into recognizable job functions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Planning and Strategy&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/office-hours&lt;/code&gt; — Product interrogation with forcing questions. Challenges your idea before you build it. The "skeptical PM" experience.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/plan-ceo-review&lt;/code&gt; — Strategic scope challenge. Asks whether you are solving the right problem.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/plan-eng-review&lt;/code&gt; — Architecture and testing challenge. Finds the assumptions in your technical plan.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/plan-design-review&lt;/code&gt; — Design system audit. Catches "AI slop" — visual patterns that look fine locally but break at scale.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/plan-devex-review&lt;/code&gt; — Developer experience review of the plan.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/autoplan&lt;/code&gt; — Runs CEO, Engineering, and DevEx review in sequence automatically.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Design and Implementation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/design-consultation&lt;/code&gt;, &lt;code&gt;/design-shotgun&lt;/code&gt;, &lt;code&gt;/design-html&lt;/code&gt; — Design guidance at various fidelity levels.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/review&lt;/code&gt; — Code review targeting security issues, bugs, and architectural concerns.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/investigate&lt;/code&gt; — Root-cause debugging with structured reasoning.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Testing and Quality&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/qa&lt;/code&gt; — Live browser testing with fixes applied inline.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/qa-only&lt;/code&gt; — Bug reporting without code modification.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/cso&lt;/code&gt; — Security audit applying OWASP Top 10 and STRIDE threat modeling.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Release and Deployment&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/ship&lt;/code&gt;, &lt;code&gt;/land-and-deploy&lt;/code&gt;, &lt;code&gt;/document-release&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Additional Tools&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/browse&lt;/code&gt;, &lt;code&gt;/canary&lt;/code&gt;, &lt;code&gt;/benchmark&lt;/code&gt;, &lt;code&gt;/retro&lt;/code&gt;, &lt;code&gt;/codex&lt;/code&gt;, &lt;code&gt;/pair-agent&lt;/code&gt;, &lt;code&gt;/learn&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;/codex&lt;/code&gt; skill adds OpenAI Codex as a parallel review engine inside Claude Code, giving you cross-model code review without leaving your terminal.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Conductor: Parallelizing Everything
&lt;/h2&gt;

&lt;p&gt;The Conductor coordinates multiple Claude Code sessions running simultaneously in isolated workspaces. One session running &lt;code&gt;/office-hours&lt;/code&gt; on a new idea, another doing &lt;code&gt;/review&lt;/code&gt; on an open PR, a third implementing a feature, a fourth running &lt;code&gt;/qa&lt;/code&gt; on staging — each in its own git worktree with its own context window.&lt;/p&gt;

&lt;p&gt;This is the part that makes GStack genuinely novel compared to a folder of CLAUDE.md prompts. Conductor is multi-agent orchestration built into the harness — not a separate tool you have to wire up yourself.&lt;/p&gt;
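
&lt;p&gt;GStack's own Conductor code isn't reproduced here, but the underlying pattern (one isolated git worktree per concurrent session) is easy to sketch. A hypothetical, minimal version in Python, where the branch prefix, workspace naming, and task list are all made up for illustration:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import subprocess
from pathlib import Path

REPO = Path(".").resolve()

def new_workspace(task_slug):
    """Create an isolated git worktree so each agent session gets its own checkout."""
    branch = "agent/" + task_slug
    path = REPO.parent / ("ws-" + task_slug)
    subprocess.run(["git", "worktree", "add", "-b", branch, str(path)], check=True)
    return path

# One workspace per parallel concern, mirroring the Conductor pattern.
for task in ["office-hours-idea", "pr-review", "feature-build", "qa-staging"]:
    workspace = new_workspace(task)
    print("start a Claude Code session in", workspace)
&lt;/code&gt;&lt;/pre&gt;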




&lt;h2&gt;
  
  
  The Productivity Claim: 810×
&lt;/h2&gt;

&lt;p&gt;Garry Tan reports his 2026 development pace at approximately 810× his 2013 baseline (11,417 logical lines/day vs 14). Key caveats:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The metric is "logical LOC," not raw lines.&lt;/strong&gt; Logical LOC measures meaningful changes — new behaviors, not reformatted whitespace. This is a more honest metric than it first appears.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The 2013 baseline is a single-developer comparison.&lt;/strong&gt; Tan is comparing his own pre-AI vs. post-AI productivity. Not a controlled experiment, but an honest data point.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It doesn't hold for all workflows.&lt;/strong&gt; The &lt;a href="https://techcrunch.com/2026/03/17/why-garry-tans-claude-code-setup-has-gotten-so-much-love-and-hate/" rel="noopener noreferrer"&gt;TechCrunch analysis&lt;/a&gt; notes developers working on hardware-adjacent code or regulated domains see much smaller gains.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://news.ycombinator.com/item?id=47418576" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F99uuizqy44wfwrkpbwqe.png" alt="HN thread: Garry Tan's Claude Code Setup — 74 points, 79 comments" width="800" height="373"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The "Just Prompts" Criticism
&lt;/h2&gt;

&lt;p&gt;The most common dismissal: GStack is "a bunch of prompts in a text file." This criticism is partially correct and mostly misses the point.&lt;/p&gt;

&lt;p&gt;It is correct that the individual skills are structured prompts. There's no compiled code, nothing that prevents you from reading every CLAUDE.md instruction.&lt;/p&gt;

&lt;p&gt;What the criticism misses is that &lt;strong&gt;the value is in the system design, not the technology&lt;/strong&gt;. The insight is architectural: separating planning from implementation, using adversarial reviewing roles, and enforcing security audits as a default step before shipping. These are software engineering principles applied to AI agent orchestration.&lt;/p&gt;

&lt;p&gt;The CTO testimonial Garry Tan shared is worth taking at face value:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/garrytan/status/2032196172430131498" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhh7z6x6vqsu9yeze5mkh.png" alt="Garry Tan quoting a CTO: Your eng review discovered a subtle XSS attack that I don't even think my team is aware of" width="548" height="507"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A security audit that runs automatically before every merge is not "just a prompt." It is a default gate that most teams skip under schedule pressure. GStack makes skipping it harder than doing it.&lt;/p&gt;




&lt;h2&gt;
  
  
  GStack vs the Field
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;GStack&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;oh-my-openagent&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;GSD&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;cc-switch&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Stars&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;82.7K&lt;/td&gt;
&lt;td&gt;53.9K&lt;/td&gt;
&lt;td&gt;35K&lt;/td&gt;
&lt;td&gt;54K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model lock-in&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Claude Code only&lt;/td&gt;
&lt;td&gt;Multi-model&lt;/td&gt;
&lt;td&gt;Claude Code first&lt;/td&gt;
&lt;td&gt;Model-agnostic config&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Specialist roles&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;23 skills&lt;/td&gt;
&lt;td&gt;11 agents&lt;/td&gt;
&lt;td&gt;Spec-driven only&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Parallel sessions&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes (Conductor)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Install complexity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;30 seconds (paste)&lt;/td&gt;
&lt;td&gt;npm install&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;CLI install&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://github.com/code-yeongyu/oh-my-openagent" rel="noopener noreferrer"&gt;oh-my-openagent&lt;/a&gt; routes tasks to the best model — if you need DeepSeek for cost-sensitive tasks and Claude for hard reasoning, OmO handles the routing. GStack is entirely Claude Code native.&lt;/p&gt;




&lt;h2&gt;
  
  
  When GStack Wins, When It Doesn't
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GStack is best for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Solo developers building SaaS or web products without a senior team&lt;/li&gt;
&lt;li&gt;Early-stage startups without dedicated QA, security reviewer, or architect&lt;/li&gt;
&lt;li&gt;Developers already on Claude Code — zero-friction install&lt;/li&gt;
&lt;li&gt;Teams shipping fast who default to skipping review steps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;GStack is probably wrong for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Teams needing multi-model routing (OmO is better)&lt;/li&gt;
&lt;li&gt;Teams with mature code review culture (GStack replaces informal processes)&lt;/li&gt;
&lt;li&gt;Developers on OpenCode or other non-Claude agents (GStack is CLAUDE.md-native)&lt;/li&gt;
&lt;li&gt;Embedded, firmware, or highly regulated domains&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Getting Started in 30 Seconds
&lt;/h2&gt;

&lt;p&gt;GStack lives at &lt;a href="https://github.com/garrytan/gstack" rel="noopener noreferrer"&gt;github.com/garrytan/gstack&lt;/a&gt;. To install, open Claude Code and type &lt;code&gt;Install GStack&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Your first three commands:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/office-hours&lt;/code&gt; — Challenge your current feature idea&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/cso&lt;/code&gt; — Security audit on your last commit&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/autoplan&lt;/code&gt; — CEO, Eng, and DevEx review your next technical plan&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://x.com/garrytan/status/2037355994838429849" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4t156z0wj49v89mxjpb4.png" alt="Garry Tan: 50k stars and it feels so good — type install gstack into claude code right now" width="548" height="566"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;GStack implements software engineering best practices — adversarial review, security auditing, design critique, spec challenge — as default steps in your Claude Code workflow. Steps that solo developers skip not because they are bad engineers but because there is nobody else in the room.&lt;/p&gt;

&lt;p&gt;If you are a Claude Code user building a product, install it. The 30-second install cost is trivially small relative to finding a single XSS vulnerability before it ships to production.&lt;/p&gt;

&lt;p&gt;The frontier in AI-assisted development is not a better autocomplete. It is a well-designed team of reviewers who catch the mistakes you were going to make anyway.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://agentconn.com/blog/gstack-claude-code-harness-open-source-2026" rel="noopener noreferrer"&gt;AgentConn&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>devtools</category>
      <category>opensource</category>
    </item>
    <item>
      <title>DeepSeek V4 vs GPT-5.5 vs Claude Opus 4.7: Model Guide</title>
      <dc:creator>Max Quimby</dc:creator>
      <pubDate>Sat, 25 Apr 2026 03:34:22 +0000</pubDate>
      <link>https://forem.com/max_quimby/deepseek-v4-vs-gpt-55-vs-claude-opus-47-model-guide-29nb</link>
      <guid>https://forem.com/max_quimby/deepseek-v4-vs-gpt-55-vs-claude-opus-47-model-guide-29nb</guid>
      <description>&lt;p&gt;Today is the most chaotic single day in the 2026 AI model race.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📖 &lt;a href="https://computeleap.com/blog/deepseek-v4-vs-gpt-55-vs-claude-opus-47-model-comparison-2026" rel="noopener noreferrer"&gt;Read the full version with charts and embedded sources on ComputeLeap →&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Within a 24-hour window, OpenAI shipped &lt;a href="https://openai.com/index/introducing-gpt-5-5/" rel="noopener noreferrer"&gt;GPT-5.5&lt;/a&gt; — its most capable API model yet, with a 74% long-context score that doubles its predecessor — and DeepSeek responded within hours with two open-source models: &lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro" rel="noopener noreferrer"&gt;V4-Pro&lt;/a&gt; (1.6 trillion parameters, MIT license) and V4-Flash (284 billion, equally open). Claude Opus 4.7, which launched April 16, has been the dominant coding model since. Now it has two new challengers on the same day.&lt;/p&gt;

&lt;p&gt;The timing is not a coincidence. It rarely is at this level.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/deepseek_ai/status/2047516922263285776" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ct6wmjta94mewzsz3lx.png" alt="DeepSeek V4 launch tweet — 36.2K likes, 8.4K RTs" width="548" height="663"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For developers, this isn't an academic benchmark exercise. The question is practical: &lt;strong&gt;for your next sprint, which model do you route each task to?&lt;/strong&gt; This guide gives you the data and the decision framework.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ Read our &lt;a href="https://computeleap.com/blog/gpt-5-5-vs-claude-code-agentic-coding-ai-2026" rel="noopener noreferrer"&gt;GPT-5.5 vs Claude Code deep dive&lt;/a&gt; for the coding-specific head-to-head from yesterday's launch. This article covers the broader model selection question.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What Just Dropped: The Models at a Glance
&lt;/h2&gt;

&lt;p&gt;Before comparing, here's what we're actually comparing:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;DeepSeek V4-Pro&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;DeepSeek V4-Flash&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;GPT-5.5&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Claude Opus 4.7&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total params&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1.6T&lt;/td&gt;
&lt;td&gt;284B&lt;/td&gt;
&lt;td&gt;Undisclosed&lt;/td&gt;
&lt;td&gt;Undisclosed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Active params&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;49B&lt;/td&gt;
&lt;td&gt;13B&lt;/td&gt;
&lt;td&gt;Undisclosed&lt;/td&gt;
&lt;td&gt;Undisclosed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Context window&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1M tokens&lt;/td&gt;
&lt;td&gt;1M tokens&lt;/td&gt;
&lt;td&gt;1M tokens&lt;/td&gt;
&lt;td&gt;200K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;License&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;MIT (open weights)&lt;/td&gt;
&lt;td&gt;MIT (open weights)&lt;/td&gt;
&lt;td&gt;Proprietary&lt;/td&gt;
&lt;td&gt;Proprietary&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Input cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$1.74/M&lt;/td&gt;
&lt;td&gt;$0.14/M&lt;/td&gt;
&lt;td&gt;$5.00/M&lt;/td&gt;
&lt;td&gt;$5.00/M&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Output cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$3.48/M&lt;/td&gt;
&lt;td&gt;$0.28/M&lt;/td&gt;
&lt;td&gt;$30.00/M&lt;/td&gt;
&lt;td&gt;$25.00/M&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Self-hostable&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The architecture story behind these numbers: DeepSeek uses Mixture-of-Experts (MoE), which is why a 1.6 trillion parameter model only activates 49 billion parameters per token. &lt;a href="https://simonwillison.net/2026/apr/24/deepseek-v4/" rel="noopener noreferrer"&gt;Simon Willison notes&lt;/a&gt; that V4-Flash achieves only 10% of the single-token FLOPs and 7% of the KV cache size of its predecessor — that's what enables the aggressive pricing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/sama/status/2047787124846653895" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fryrnjez3lrg31xtatnzy.png" alt="Sam Altman announcing GPT-5.5 and GPT-5.5 Pro now available in the API" width="548" height="223"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ DeepSeek V4 runs entirely on Huawei chips with zero CUDA dependency. This matters beyond hardware specs: the inference pipeline isn't subject to US export control disruption, a meaningful consideration for enterprise planning.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Benchmark Breakdown: Who Wins Where
&lt;/h2&gt;

&lt;p&gt;Raw benchmark numbers are imperfect, but they're what we have. Here's the honest picture:&lt;/p&gt;

&lt;h3&gt;
  
  
  Intelligence Index (Artificial Analysis)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;GPT-5.5: &lt;strong&gt;60 points&lt;/strong&gt; (&lt;a href="https://artificialanalysis.ai/models/comparisons/deepseek-v4-pro-high-vs-gpt-5-5" rel="noopener noreferrer"&gt;top score&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Claude Opus 4.7: &lt;strong&gt;57 points&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;DeepSeek V4-Pro: competitive, positioned between the two above&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;GPT-5.5 leads the overall intelligence index — but three points over Claude Opus 4.7 is a margin unlikely to be decisive in most production workloads.&lt;/p&gt;

&lt;h3&gt;
  
  
  Coding (SWE-Bench Verified)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Claude Opus 4.7: &lt;strong&gt;87.6%&lt;/strong&gt; (+6.8 points over Opus 4.6)&lt;/li&gt;
&lt;li&gt;DeepSeek V4-Pro: &lt;strong&gt;80.6%&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;GPT-5.5 Pro: &lt;strong&gt;58.6%&lt;/strong&gt; (notably behind)&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 For pure coding tasks, Claude Opus 4.7 holds the highest verified score at 87.6% on SWE-bench. DeepSeek V4-Pro is competitive at 80.6%. GPT-5.5 Pro trails at 58.6% — a surprising gap given its overall intelligence lead.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;DeepSeek also leads on terminal-level coding: V4-Pro scores 67.9% on Terminal-Bench 2.0 vs Claude at 65.4%. These are close enough that real-world workload matters more than the benchmark gap.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hard Reasoning (Humanity's Last Exam, no tools)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Claude Opus 4.7: &lt;strong&gt;46.9%&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;GPT-5.5 Pro: &lt;strong&gt;43.1%&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;GPT-5.5: &lt;strong&gt;41.4%&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;DeepSeek V4-Pro: &lt;strong&gt;37.7%&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the most revealing split. For tasks requiring genuine hard reasoning — the kind where neither model has a template to pattern-match from — Claude Opus 4.7 leads by 9 points over DeepSeek. That's a meaningful gap for legal, financial analysis, or complex research workloads. (&lt;a href="https://fundaai.substack.com/p/deepdeepseek-v4-vs-claude-vs-gpt" rel="noopener noreferrer"&gt;FundaAI 38-task benchmark&lt;/a&gt;)&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ GPT-5.5's documented 86% hallucination rate — per &lt;a href="https://the-decoder.com/gpt-5-5-tops-benchmarks-but-still-hallucinates-frequently-and-costs-20-percent-more-over-the-api/" rel="noopener noreferrer"&gt;The Decoder's independent testing&lt;/a&gt; — is a significant weakness despite its top intelligence index score. For factual grounding, Claude Opus 4.7 or DeepSeek V4-Pro are more reliable.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Long-Context Reasoning (MRCR v2 at 1M tokens)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;GPT-5.5: &lt;strong&gt;74.0%&lt;/strong&gt; (up from 36.6% in GPT-5.4 — an extraordinary jump)&lt;/li&gt;
&lt;li&gt;Claude Opus 4.7: strong, but only supports 200K native context&lt;/li&gt;
&lt;li&gt;DeepSeek V4-Pro: 1M context native, performance data pending&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The GPT-5.5 long-context improvement is the headline technical achievement of this launch. If your workload involves very long document processing, GPT-5.5's long-context reasoning may be worth the price premium.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://news.ycombinator.com/item?id=47879092" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fztpf1mg56pclkiy9ixto.png" alt="HN thread: GPT-5.5 — 1,493 points on launch day" width="800" height="622"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Cost Math
&lt;/h2&gt;

&lt;p&gt;This is where DeepSeek V4 becomes genuinely disruptive. Let's make the numbers concrete for a typical development team.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Assume 100M output tokens per month&lt;/strong&gt; (a moderately active team with LLM-intensive workflows):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Monthly Output Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.5 Pro&lt;/td&gt;
&lt;td&gt;$18,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.5&lt;/td&gt;
&lt;td&gt;$3,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.7&lt;/td&gt;
&lt;td&gt;$2,500&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek V4-Pro&lt;/td&gt;
&lt;td&gt;$348&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek V4-Flash&lt;/td&gt;
&lt;td&gt;$28&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;DeepSeek V4-Pro at $348 vs Claude Opus 4.7 at $2,500 is a &lt;strong&gt;7× cost difference&lt;/strong&gt; for near-comparable coding performance. V4-Pro's output cost versus &lt;a href="https://decrypt.co/365455/deepseek-v4-launch-pro-version-costs-less-gpt-5-pro" rel="noopener noreferrer"&gt;GPT-5.5 Pro is a 98% reduction&lt;/a&gt;.&lt;/p&gt;
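
&lt;p&gt;The table is simple arithmetic, and it is worth making reproducible for your own volumes. A small sketch using the per-million output rates quoted in this article (the GPT-5.5 Pro figure is implied by the $18,000 row above):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Output price per million tokens, as quoted in this comparison (April 2026).
OUTPUT_PRICE_PER_M = {
    "GPT-5.5 Pro": 180.00,
    "GPT-5.5": 30.00,
    "Claude Opus 4.7": 25.00,
    "DeepSeek V4-Pro": 3.48,
    "DeepSeek V4-Flash": 0.28,
}

def monthly_output_cost(tokens_per_month, model):
    """Cost = (tokens / 1,000,000) * price per million."""
    return tokens_per_month / 1_000_000 * OUTPUT_PRICE_PER_M[model]

for name in OUTPUT_PRICE_PER_M:
    # The 100M-tokens-per-month scenario from the table above.
    print(name, round(monthly_output_cost(100_000_000, name), 2))
&lt;/code&gt;&lt;/pre&gt;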

&lt;p&gt;For teams already running on a budget, we covered a similar cost calculus in &lt;a href="https://computeleap.com/blog/kimi-k2-6-vs-claude-opus-47-open-source-chinese-ai-model-comparison-2026" rel="noopener noreferrer"&gt;our Kimi K2.6 vs Claude Opus 4.7 comparison&lt;/a&gt; — the pattern of Chinese open-source models delivering 80–90% of the capability at a fraction of the cost is now a structural feature of the AI market, not an anomaly.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ With cached input, the gap widens further. DeepSeek-V4-Pro's cache-hit cost is roughly one-tenth of GPT-5.5 and one-eighth of Claude Opus 4.7 at scale. If your architecture reuses prompt prefixes, the savings compound aggressively.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The 1M Context Window: What It Actually Changes
&lt;/h2&gt;

&lt;p&gt;Both DeepSeek V4 models and GPT-5.5 ship with 1M token context windows. Claude Opus 4.7 caps at 200K. The practical implications:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What 1M tokens enables:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Feeding an entire 500-page technical specification into a single prompt&lt;/li&gt;
&lt;li&gt;Six months of project documentation without chunking&lt;/li&gt;
&lt;li&gt;A full codebase (~750,000 words of active context)&lt;/li&gt;
&lt;li&gt;Multi-step agent workflows where the model retains chain-of-thought across 20+ tool calls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;DeepSeek V4 introduces "interleaved thinking" — full chain-of-thought retention across tool calls in agent workflows. This means a 20-step agent workflow doesn't suffer the amnesia-halfway-through problem that plagues most agentic pipelines.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://news.ycombinator.com/item?id=47884971" rel="noopener noreferrer"&gt;HN discussion at 1,588 points&lt;/a&gt; surfaced a key practical detail: DeepSeek's zero CUDA dependency makes it runnable in environments where Nvidia GPUs aren't available — relevant for enterprise deployments on private infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://news.ycombinator.com/item?id=47884971" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkcefczw8ynf9gdhj3gel.png" alt="HN thread: DeepSeek V4 — 1,588 points, top story of the day" width="800" height="622"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Polymarket Divergence: Hype vs. Market Confidence
&lt;/h2&gt;

&lt;p&gt;Here's the contrarian data point most coverage will skip.&lt;/p&gt;

&lt;p&gt;The Polymarket market "DeepSeek V4 released by...?" had &lt;a href="https://polymarket.com/event/deepseek-v4-released-by-march-31" rel="noopener noreferrer"&gt;$2.4 million in trading volume&lt;/a&gt; and resolved at 100% — traders called the release date correctly. Developer enthusiasm is genuine.&lt;/p&gt;

&lt;p&gt;But Polymarket's "Best Chinese AI company 2026" market? &lt;strong&gt;DeepSeek sits at 3%.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That divergence — maximum developer excitement, minimal market confidence in DeepSeek as a &lt;em&gt;company&lt;/em&gt; — is worth sitting with. Some reasons the market might be right:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open-source models generate developer mindshare but not revenue&lt;/li&gt;
&lt;li&gt;DeepSeek's pricing is so aggressive it may be below sustainable margin&lt;/li&gt;
&lt;li&gt;US export restrictions on Nvidia GPUs create a hardware ceiling for scale&lt;/li&gt;
&lt;li&gt;Anthropic holds ~85% in the "best coding AI company" Polymarket market — consensus hasn't shifted despite DeepSeek's coding scores&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ The pattern is clear: Chinese AI labs use open-source releases to win global developer mindshare while keeping closed models for domestic enterprise. Two major drops in one day (DeepSeek V4 plus Tencent Hy3 at 295B parameters) is not a coincidence. (&lt;a href="https://x.com/TencentHunyuan/status/2047347774501634251" rel="noopener noreferrer"&gt;Tencent Hy3 launch&lt;/a&gt;)&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Decision Framework: Routing Logic by Task Type
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://fundaai.substack.com/p/deepdeepseek-v4-vs-claude-vs-gpt" rel="noopener noreferrer"&gt;FundaAI 38-task benchmark&lt;/a&gt; and the &lt;a href="https://artificialanalysis.ai/models/comparisons/deepseek-v4-pro-high-vs-gpt-5-5" rel="noopener noreferrer"&gt;Artificial Analysis comparison&lt;/a&gt; both land at the same conclusion: don't pick one model. Route.&lt;/p&gt;

&lt;h3&gt;
  
  
  Route to Claude Opus 4.7 when:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Hard reasoning is required (legal, financial, medical research)&lt;/li&gt;
&lt;li&gt;Code review or complex multi-file refactoring (87.6% SWE-bench)&lt;/li&gt;
&lt;li&gt;Citation accuracy matters — lowest hallucination rate&lt;/li&gt;
&lt;li&gt;Enterprise compliance rules out Chinese infrastructure&lt;/li&gt;
&lt;li&gt;You're already in Cursor (&lt;a href="https://x.com/cursor_ai/status/2044785960899236341" rel="noopener noreferrer"&gt;50% off right now&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Route to DeepSeek V4-Pro when:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Long-context analysis (1M token codebase ingestion, multi-document synthesis)&lt;/li&gt;
&lt;li&gt;Agentic workflows with 20+ steps (interleaved thinking retention)&lt;/li&gt;
&lt;li&gt;High-volume batch processing where cost is the constraint&lt;/li&gt;
&lt;li&gt;Self-hosting or private infrastructure is required&lt;/li&gt;
&lt;li&gt;You want open weights for fine-tuning on domain data&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Route to DeepSeek V4-Flash when:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;High-frequency, lower-complexity tasks ($0.28/M output)&lt;/li&gt;
&lt;li&gt;First-pass triage or pre-processing before escalation to a stronger model&lt;/li&gt;
&lt;li&gt;Any use case where V4-Pro would work but volume makes cost prohibitive&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Route to GPT-5.5 when:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Extreme long-context reasoning (1M tokens, MRCR v2 at 74%)&lt;/li&gt;
&lt;li&gt;Agentic computer use tasks via OpenAI Codex&lt;/li&gt;
&lt;li&gt;Speed is a priority (GPT-5.5 Fast Mode: 1.5× faster tokens)&lt;/li&gt;
&lt;li&gt;Deep OpenAI ecosystem integration (ChatGPT, Codex)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Skip GPT-5.5 Pro unless:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You have specific enterprise contracts with OpenAI&lt;/li&gt;
&lt;li&gt;The $180/M output cost is justifiable for specialized, low-volume, high-stakes tasks&lt;/li&gt;
&lt;li&gt;The 58.6% SWE-bench score won't matter for your use case&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 The optimal architecture for most teams: route 60–70% of traffic to V4-Flash for high-volume/low-complexity tasks, escalate coding to Claude Opus 4.7, use GPT-5.5 for long-context document tasks. This pattern typically reduces costs 40–60% compared to running everything through a single frontier model.&lt;/p&gt;
&lt;/blockquote&gt;
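
&lt;p&gt;A hedged sketch of what that routing layer can look like in code. The thresholds, task labels, and model ids below are illustrative assumptions, not published defaults; the point is the shape (a cheap default with explicit escalation rules), which you would tune against your own traffic:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def pick_model(task_type, context_tokens, high_stakes=False):
    """Illustrative routing rules distilled from the framework above."""
    if context_tokens &gt; 200_000:
        return "gpt-5.5"            # only the 1M-context models can hold this much
    if high_stakes or task_type in ("coding", "code-review", "hard-reasoning"):
        return "claude-opus-4-7"    # top SWE-bench score, lowest hallucination rate
    if task_type in ("triage", "classification", "extraction"):
        return "deepseek-v4-flash"  # high volume, lowest cost
    return "deepseek-v4-pro"        # cheap default for everything else

print(pick_model("coding", 40_000))         # claude-opus-4-7
print(pick_model("summarize", 600_000))     # gpt-5.5
print(pick_model("triage", 2_000))          # deepseek-v4-flash
&lt;/code&gt;&lt;/pre&gt;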




&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Three models, three different bets on what matters:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Opus 4.7&lt;/strong&gt; is the best coding model at 87.6% SWE-bench verified. It leads on hard reasoning. It hallucinates least. It costs $25/M output tokens. For high-stakes code and reasoning work, it remains the default.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DeepSeek V4-Pro&lt;/strong&gt; is 7× cheaper than Claude, open-source, and within 10 points on most coding benchmarks. The 9-point gap on Humanity's Last Exam matters for hard reasoning. For everything else, the cost case is compelling — especially with 1M context native and interleaved thinking for agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GPT-5.5&lt;/strong&gt; wins the intelligence index at 60 points and made a massive long-context leap. But the 86% hallucination rate and $30/M output cost make it a niche choice: buy it when you specifically need that long-context reasoning, and verify outputs carefully.&lt;/p&gt;

&lt;p&gt;The frontier in 2026 is not a single model. It's a routing layer.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;For the coding-specific comparison, see &lt;a href="https://computeleap.com/blog/gpt-5-5-vs-claude-code-agentic-coding-ai-2026" rel="noopener noreferrer"&gt;GPT-5.5 vs Claude Code: Which AI Should You Use for Agentic Development?&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;For a deeper look at the Chinese open-source cost story, see &lt;a href="https://computeleap.com/blog/kimi-k2-6-vs-claude-opus-47-open-source-chinese-ai-model-comparison-2026" rel="noopener noreferrer"&gt;Kimi K2.6 vs Claude Opus 4.7: The 88% Cost Advantage&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://computeleap.com/blog/deepseek-v4-vs-gpt-55-vs-claude-opus-47-model-comparison-2026" rel="noopener noreferrer"&gt;ComputeLeap&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>deepseek</category>
      <category>developer</category>
    </item>
    <item>
      <title>Meta's Real Story Isn't the Layoffs. It's the Surveillance.</title>
      <dc:creator>Max Quimby</dc:creator>
      <pubDate>Fri, 24 Apr 2026 23:47:16 +0000</pubDate>
      <link>https://forem.com/max_quimby/metas-real-story-isnt-the-layoffs-its-the-surveillance-22jm</link>
      <guid>https://forem.com/max_quimby/metas-real-story-isnt-the-layoffs-its-the-surveillance-22jm</guid>
      <description>&lt;p&gt;On April 23, Meta &lt;a href="https://www.bloomberg.com/news/articles/2026-04-23/meta-tells-staff-it-will-cut-10-of-jobs-in-push-for-efficiency" rel="noopener noreferrer"&gt;told 8,000 employees&lt;/a&gt; they would be walking out the door on May 20. The next morning, Microsoft &lt;a href="https://www.cnbc.com/2026/04/23/microsoft-plans-first-voluntary-retirement-program-for-us-employees.html" rel="noopener noreferrer"&gt;announced&lt;/a&gt; the first voluntary retirement program in its 51-year history — up to 8,750 people eligible. Six weeks earlier, Block &lt;a href="https://thehill.com/policy/technology/5758605-block-cash-app-square-parent-layoffs-ai/" rel="noopener noreferrer"&gt;gutted 40%&lt;/a&gt; of its workforce, citing its own internal AI agent. The headlines write themselves: "AI is eating the tech industry," "the labor crisis is here."&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📖 &lt;a href="https://www.computeleap.com/blog/meta-surveillance-tech-layoffs-2026/" rel="noopener noreferrer"&gt;Read the full version with charts and embedded sources on ComputeLeap →&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's the story everyone is telling. It's also the wrong one.&lt;/p&gt;

&lt;p&gt;The real story is happening inside Meta's own walls. The same week the company said goodbye to 8,000 people, it quietly began installing &lt;strong&gt;surveillance software on the computers of the employees it's keeping&lt;/strong&gt; — a program called the Model Capability Initiative (MCI) that records every keystroke, mouse movement, and periodic screenshot across Gmail, GitHub, Slack, and hundreds of other sites. The stated purpose: to train AI agents that can automate white-collar work. The logical endpoint: the employees generating the training data are, line by line, encoding the workflows that will replace the next cohort. That isn't a productivity tool. That is the first visible instance of &lt;strong&gt;enterprise-scale AI observability applied to employees&lt;/strong&gt; — and because Meta just normalized it, every Fortune 500 will pilot a version within 18 months.&lt;/p&gt;

&lt;p&gt;This piece is about the layoffs. But the layoffs are the distraction. The surveillance is the precedent.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Layoff Wave — the Surface Story
&lt;/h2&gt;

&lt;p&gt;The numbers are real, and they are large. Meta's chief people officer Janelle Gale sent a Thursday memo informing staff the company would cut 10% of its workforce — roughly 8,000 people — and decline to fill another 6,000 open roles. "This is not an easy tradeoff," she wrote, "and it will mean letting go of people who have made meaningful contributions to Meta during their time here." &lt;a href="https://www.bloomberg.com/news/articles/2026-04-23/meta-tells-staff-it-will-cut-10-of-jobs-in-push-for-efficiency" rel="noopener noreferrer"&gt;Bloomberg was first to report&lt;/a&gt;; &lt;a href="https://www.cnbc.com/2026/04/23/meta-will-cut-10percent-of-workforce-as-it-pushes-more-into-ai.html" rel="noopener noreferrer"&gt;CNBC&lt;/a&gt;, Reuters, and the FT followed within hours. The cuts land May 20.&lt;/p&gt;

&lt;p&gt;Meta's 2026 capital expenditure guidance — &lt;strong&gt;$115 billion to $135 billion&lt;/strong&gt;, roughly double the $72 billion spent in 2025 — explains the accounting. CEO Mark Zuckerberg has told investors the company will spend whatever it takes to build "personal superintelligence." Paying for that at current margins means shedding people. The company's framing is that the capex is the investment and the layoffs are the offset. What's hidden behind this number is a structural bet: that the people being cut now will be replaced by software the remaining employees are being paid to help train.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/NTVE4TRkvT4"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Microsoft, on the same day, introduced something unprecedented in the company's history. Rather than traditional layoffs, Satya Nadella's team offered a &lt;a href="https://www.cnbc.com/2026/04/23/microsoft-plans-first-voluntary-retirement-program-for-us-employees.html" rel="noopener noreferrer"&gt;one-time voluntary retirement program&lt;/a&gt; to about 7% of U.S. employees — senior director level and below, whose years of employment plus age sum to at least 70. That's roughly 8,750 people eligible to walk with severance and extended healthcare. The financial engineering is elegant: the people most expensive to fire voluntarily raise their hands. &lt;a href="https://techcrunch.com/2026/04/23/microsoft-offers-buyout-for-up-to-7-of-u-s-employees/" rel="noopener noreferrer"&gt;TechCrunch&lt;/a&gt; and &lt;a href="https://www.geekwire.com/2026/microsoft-will-offer-voluntary-retirement-to-thousands-of-employees-in-a-first-for-tech-giant/" rel="noopener noreferrer"&gt;GeekWire&lt;/a&gt; both flagged this as a first for the 51-year-old company. What matters isn't that Microsoft is unusually generous — it's that the framing of "retirement" sidesteps the WARN Act, muddles the layoffs.fyi counter, and gives the company political cover on capex earnings calls. This is layoff optimization, not compassion.&lt;/p&gt;

&lt;p&gt;Block is the canary. In March, Jack Dorsey announced the company behind Square and Cash App would cut 4,000 people — roughly 40% of its workforce — and pointed directly at the company's internal AI agent, codename goose, as the reason. Goose had been in production internally for about 18 months; &lt;a href="https://fortune.com/2026/03/06/exclusive-block-cfo-ai-leaps-18-months-led-decision-slash-nearly-half-its-workforce/" rel="noopener noreferrer"&gt;Fortune's exclusive with Block's CFO&lt;/a&gt; detailed the leverage math. Then came the complication: within six weeks, as we documented in our analysis of &lt;a href="https://www.computeleap.com/blog/block-ai-revolution-builderbot-replacing-engineers-2026" rel="noopener noreferrer"&gt;Block's 40% layoff and its codename goose agent&lt;/a&gt;, technical leads began threatening to quit unless laid-off teammates were rehired. &lt;a href="https://www.humai.blog/jack-dorsey-fired-4-000-block-workers-for-ai-then-the-rehires-started/" rel="noopener noreferrer"&gt;HumAI's reporting&lt;/a&gt; shows Block has quietly rehired engineers — often at lower seniority and tighter comp. The lesson the rest of big tech is taking: cut fast, claim AI, rehire the critical third at a 30% discount.&lt;/p&gt;

&lt;p&gt;Zoom out and the macro number is staggering. Per &lt;a href="https://www.cnbc.com/2026/04/24/20k-job-cuts-at-meta-microsoft-raise-concern-of-ai-labor-crisis-.html" rel="noopener noreferrer"&gt;layoffs.fyi tallies reported by CNBC&lt;/a&gt;, &lt;strong&gt;more than 92,000 tech workers have been laid off in 2026 alone&lt;/strong&gt;, bringing the running total since 2020 to nearly 900,000. Amazon announced its widest layoff in company history earlier this quarter. Oracle, Snap, Disney — &lt;a href="https://tech.yahoo.com/general/article/tech-layoffs-2026-over-96000-employees-have-been-laid-off-this-year-across-oracle-amazon-meta-disney-snap-and-more-144545855.html" rel="noopener noreferrer"&gt;the list is 96,000 and climbing&lt;/a&gt;. Glassdoor's Employee Confidence Index shows the tech sector dropped 6.8 percentage points year-over-year in March, the largest drop in any industry. But this is the part everyone is already covering. Let's move to the part they aren't.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://news.ycombinator.com/item?id=47879986" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fotgs68oud2lrilxsyfro.png" alt="Hacker News thread on Meta's 10% layoff announcement — 781 points, 829 comments, top comment framing the cut as capex-driven rather than AI-productivity-driven" width="800" height="603"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On the HN thread for the Bloomberg story (781 points, 829 comments), the top comment from user &lt;code&gt;bandrami&lt;/code&gt; captured the actual dynamic:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 "This is interesting because it's a case of 'AI taking jobs' but not in the way people normally mean; these massive layoffs are happening not because AI is doing the work they used to do but because capex is sucking all of the operating money out of everywhere." — bandrami, HN&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Hold that frame. Because the next section explains where the capex goes — and who pays for it with their keystrokes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Story Underneath — Meta's MCI
&lt;/h2&gt;

&lt;p&gt;Two days before Meta announced the 10% cut, &lt;a href="https://fortune.com/2026/04/21/meta-will-start-tracking-employees-screens-and-keystrokes-to-train-ai/" rel="noopener noreferrer"&gt;Fortune broke a different story&lt;/a&gt;: Meta is installing tracking software on every U.S. employee's work computer. The program is called the &lt;strong&gt;Model Capability Initiative (MCI)&lt;/strong&gt;, and it does three things. It records mouse movements and clicks. It logs keystrokes. And it periodically captures screenshots — all inside a set of "work apps and websites" that &lt;a href="https://www.cnbc.com/2026/04/22/meta-tracks-employee-usage-on-google-linkedin-ai-training-project.html" rel="noopener noreferrer"&gt;CNBC's reporting&lt;/a&gt; reveals includes Google, LinkedIn, Wikipedia, Microsoft's GitHub, Salesforce's Slack, Atlassian's Jira and Confluence, Meta's own Threads and Manus, Gmail, Visual Studio Code, and an internal tool called Metamate. Hundreds of sites in total.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.theregister.com/2026/04/22/meta_employee_surveillance_software/" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk82dk5astvrr4vejpz8r.png" alt="The Register, April 22, 2026: 'Magnificent irony as Meta staff unhappy about running surveillance software on work PCs' — primary reporting on the Model Capability Initiative, citing Reuters, Business Insider, and the internal Bosworth memo" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The stated purpose is not productivity management. It is training data. A Meta spokesperson explained the logic in clean language to Fortune: "If we're building agents to help people complete everyday tasks using computers, our models need real examples of how people actually use them." CTO Andrew Bosworth went further in an internal memo &lt;a href="https://www.theregister.com/2026/04/22/meta_employee_surveillance_software/" rel="noopener noreferrer"&gt;reported by The Register&lt;/a&gt;: Meta envisions "a world where our agents primarily do the work and our role is to direct, review and help them improve." Read carefully. The people being monitored are the raw material for the agents that will reduce the need for their successors. One employee, anonymous to the BBC, used the word that has dominated every discussion of MCI since: "&lt;strong&gt;very dystopian&lt;/strong&gt;." &lt;a href="https://www.computing.co.uk/news/2026/very-dystopian-meta-to-track-employee-keystrokes-to-train-ai-systems" rel="noopener noreferrer"&gt;Computing.co.uk used the quote as their headline&lt;/a&gt;. Futurism summarized the company's position more bluntly: "&lt;a href="https://futurism.com/artificial-intelligence/meta-track-everything-workers-type-click-train-ai" rel="noopener noreferrer"&gt;Meta is saying the quiet part out loud.&lt;/a&gt;"&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/26Vf9mxV7so"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://news.ycombinator.com/item?id=47851948" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbivh57qg5x7wc1qaglx7.png" alt="Hacker News thread: 'Meta to start capturing employee mouse movements, keystrokes for AI training' — practitioners weighing in on the legal and ethical contours" width="800" height="603"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://news.ycombinator.com/item?id=47860961" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkyqdlrzxhb6skx5qw7iz.png" alt="Hacker News thread: 'Meta employees are up in arms over a mandatory program to train AI on their work' — practitioner reaction to the mandatory, no-opt-out nature of the MCI rollout" width="800" height="603"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The timing is the part nobody in Meta's PR shop can spin. Fortune's story landed on April 21. CNBC's followed April 22. The 10% layoff memo landed April 23. Two days apart. &lt;a href="https://www.techradar.com/pro/simply-by-doing-their-daily-work-meta-tracks-staff-activity-to-teach-ai-how-to-replace-them" rel="noopener noreferrer"&gt;TechRadar Pro ran the causal headline explicitly&lt;/a&gt;: "Meta is logging employees' keystrokes and screenshots to train AI agents — weeks before major layoffs." Under U.S. federal law, there is no opt-out. Workers at Meta's U.S. offices have no legal right to refuse the MCI agent on their machines. Tell someone they have six weeks until the WARN notice and then ask them to hand over their keystroke data with no opt-out — and then call it consent — and you have defined the outer edge of what "at will" means in 2026. &lt;a href="https://www.ai-supremacy.com/p/massive-layoffs-meta-surveillance-deepseek-v4-preview-ai-news-this-week" rel="noopener noreferrer"&gt;AI Supremacy's newsletter&lt;/a&gt;, which broke the broader narrative into public consciousness, put it tersely: "&lt;strong&gt;Meta not just spying with AI glasses, now data harvesting talented staff. No opt-out.&lt;/strong&gt;"&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.ai-supremacy.com/p/massive-layoffs-meta-surveillance-deepseek-v4-preview-ai-news-this-week" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F23kqepfzpkss9fvwx59n.png" alt="AI Supremacy Substack post on April 24, 2026 — 'Massive Layoffs, Meta Surveillance' — framing the week's twin stories as a single narrative about AI-era labor and control" width="800" height="549"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The 18-Month Precedent
&lt;/h2&gt;

&lt;p&gt;Every enterprise-software procurement cycle I've watched over 20 years follows the same pattern. A FAANG normalizes a practice. A tier-two SaaS company builds a commercial version of it. The Fortune 500 starts piloting within 12 months. It becomes a standard line item inside 24.&lt;/p&gt;

&lt;p&gt;Meta just normalized AI-native employee observability at the scale of 75,000 users. The SaaS category that will emerge around this is already taking shape: products that record a granular stream of employee screen, keyboard, and application telemetry; pipe it through an LLM scoring layer for "workflow classification"; and feed it back into either a coaching agent or a direct-automation agent. Every CIO has received a pitch from at least one of these vendors in the last ninety days. Meta just gave all of them the reference customer they needed.&lt;/p&gt;

&lt;p&gt;The logic is not secret — it's what &lt;a href="https://www.computeleap.com/blog/ai-native-org-dorsey-vs-tang-dynasty" rel="noopener noreferrer"&gt;the AI-native org playbook Dorsey has been pitching&lt;/a&gt; has been missing. Dorsey's Block built goose and then cut 40% of staff. But Block didn't productionize employee telemetry collection to &lt;em&gt;train&lt;/em&gt; the next version of goose. Meta is. That is the leap. The companies downstream of Meta will not write research papers or send internal memos — they will simply deploy the product, usually rolled in under an existing "endpoint security" or "DLP" SKU where most employees won't notice until it's in the HR handbook.&lt;/p&gt;

&lt;p&gt;Here is the prediction: by Q4 2027, three to five of the top ten U.S. private employers will be running some version of MCI under a brand name sold by a Menlo Park-funded SaaS vendor. Disclosure will vary. Consent will be buried in a revised acceptable-use policy. The 18-month timeline is not a guess — it's the standard procurement gap between "FAANG reference customer" and "regulated enterprise rollout."&lt;/p&gt;

&lt;h2&gt;
  
  
  Contrarian Corner — Steelman First
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;The strongest counter-argument to the thesis above:&lt;/strong&gt; workplace monitoring is not new. Pinkerton detectives watched factory floors in 1892. Keystroke loggers have been commercial software since the 1990s. Every enterprise already collects endpoint telemetry for security and DLP. Employees consented when they signed the handbook. What Meta is doing is a UX improvement, not a category break. And anyway, as &lt;a href="https://fortune.com/2026/02/19/sam-altman-confirms-ai-washing-job-displacement-layoffs/" rel="noopener noreferrer"&gt;Sam Altman pointed out in February&lt;/a&gt;, companies are "AI washing" layoffs they'd have done anyway — so blaming the surveillance for the layoffs, or even the other way around, is narrative fiction. The real driver is capex reallocation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The rebuttal:&lt;/strong&gt; the steelman is right about Pinkertons and keyloggers, and it's right that AI washing is real — but it's wrong about scale and purpose. Security telemetry is collected to detect malicious &lt;em&gt;activity&lt;/em&gt; (data exfiltration, credential misuse). Performance management telemetry is collected to measure &lt;em&gt;output&lt;/em&gt; (tickets closed, calls handled). MCI is different. MCI collects &lt;em&gt;the process&lt;/em&gt; — the specific sequence of clicks and keystrokes a senior engineer uses to structure a code review, the phrasing a PM uses in a Slack thread, the order in which a designer opens Figma panels. That's not security. That's not performance. That's an &lt;strong&gt;apprenticeship in bulk&lt;/strong&gt;, extracted from people who were not told what the apprentice would be. Altman's AI-washing caveat applies to the layoff narrative — and we'll take it. It does not apply to the MCI narrative. The monitoring isn't about the layoffs. The monitoring is about what comes after them.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Ethical Question — And It Is One
&lt;/h2&gt;

&lt;p&gt;It would be lazy to call this dystopian and stop there. The harder question: where is the line?&lt;/p&gt;

&lt;p&gt;There is a real distinction — one that law professor Ifeoma Ajunwa has been writing about for a decade — between monitoring that enforces a contract (you agreed to do X hours of work; we verify X hours happened) and monitoring that extracts value beyond the contract (we watch how you work and build an asset, owned entirely by us, that captures the transferable skill you spent a career developing). The first is controversial but defensible. The second has no settled legal or ethical framework in the U.S. — and because most U.S. states are at-will, no practical avenue for refusal.&lt;/p&gt;

&lt;p&gt;European workers have more ground. GDPR Article 88 gives member states authority to pass employment-specific data protection laws; most have. France's CNIL has already ruled that keystroke-level monitoring without a documented, proportionate business case violates the GDPR's "data minimization" principle. Germany's works councils can veto the deployment of tracking software outright. Meta's MCI would not, in its current U.S. form, pass a German BetrVG review. The company has been pointedly quiet about whether it will extend MCI outside the U.S., and the regulatory asymmetry is the reason.&lt;/p&gt;

&lt;p&gt;Inside the U.S., the landscape is a patchwork. &lt;a href="https://iapp.org/news/a/workplace-privacy-in-us-laws-and-policies" rel="noopener noreferrer"&gt;IAPP's summary&lt;/a&gt; lays it out: California's CCPA, as of January 1, 2026, requires employers to conduct risk assessments for processing personal-email content over company systems and for any automated processing used to infer job performance. New York requires written notice of electronic monitoring at hiring, posted in a conspicuous place. Illinois's BIPA requires informed written consent and strict data-handling for biometric data. Connecticut and Delaware have their own notice regimes. Fifteen more states have biometric legislation in committee. The federal backstop — the Electronic Communications Privacy Act and Stored Communications Act — permits monitoring "for legitimate business purposes," which has never been tested against the specific question of "training AI to replace the monitored worker." Someone will file that suit in 2026. If Meta is the defendant, the discovery alone will be brutal.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://reddit.com/r/technology/comments/1stq5fk/palantir_employees_are_starting_to_wonder_if/" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1g8l1mjg4thc0s8zsg5j.png" alt="r/technology front-page thread April 23, 2026 — 'Palantir Employees Are Starting to Wonder if They're the Bad Guys' — 22,780 upvotes, 1,136 comments, signal of shifting sentiment inside surveillance-adjacent tech companies" width="800" height="478"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The social temperature is already shifting. An r/technology thread &lt;a href="https://reddit.com/r/technology/comments/1stq5fk/palantir_employees_are_starting_to_wonder_if/" rel="noopener noreferrer"&gt;noting that Palantir employees are starting to wonder if they're the bad guys&lt;/a&gt; hit 22,780 upvotes and 1,136 comments in a day. Some 30,000 Samsung union members took to the streets this week demanding a share of AI-driven profits. The broader &lt;a href="https://www.computeleap.com/blog/ai-backlash-violence-china-shift-2026" rel="noopener noreferrer"&gt;AI backlash that's already visible&lt;/a&gt; in public sentiment — and in specific acts of sabotage — will not spare a company that is simultaneously laying off 8,000 people and installing keystroke trackers on the survivors. The PR exposure could not be larger.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Employees Should Do — Right Now, This Week
&lt;/h2&gt;

&lt;p&gt;This is the practitioner section. None of what follows requires a lawyer, a union rep, or a grievance. Every single item is something a salaried tech worker can do this week with fifteen minutes and a personal laptop.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Assume your work device is instrumented.&lt;/strong&gt; Not just at Meta — at every tier-one tech employer within 18 months. Do not conduct job searches, update LinkedIn, or compose resume materials on a work machine. Do not route personal email through a work browser profile. Do not use work Slack or Teams for anything sensitive. If you're unsure whether endpoint monitoring is installed, check your company's acceptable-use policy and endpoint security agent list — you don't need IT's permission to read HR's own documents. (A quick way to check what's actually running on your machine is sketched just after this list.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Know your jurisdiction.&lt;/strong&gt; Californians under CCPA have data-subject rights including access and deletion requests — use them. New Yorkers can demand the electronic-monitoring disclosure that state law requires employers to provide at hiring. Illinois employees with any biometric capture (many keystroke loggers qualify) are protected by BIPA and can sue individually. If you're in the EU under GDPR, your employer is already on weaker legal ground than they realize. Look up your state attorney general's consumer protection page this week. Bookmark it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Document performance on personal storage.&lt;/strong&gt; Copies of performance reviews, 1:1 notes, project outcomes, praise emails, and compensation history — kept in a personal cloud account you still control after a termination. Not a work machine. Not a work-synced OneDrive. If you are fired and want to contest it or negotiate severance, you will need evidence your employer no longer grants you access to.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Talk to a lawyer before you need one.&lt;/strong&gt; Most employment-law firms offer &lt;strong&gt;free 30-minute consults&lt;/strong&gt;. Use one. Ask three questions: (a) what does my employment contract allow around monitoring and post-termination data collection, (b) does my state have any notice or consent laws that apply, (c) if I negotiate a severance, what is a typical multiplier in my jurisdiction and role. You are not hiring a lawyer. You are getting a calibration.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use LinkedIn and Blind with discipline.&lt;/strong&gt; Never post job-search activity under a handle linked to your work email. Never cross-post on Blind from a device on the corporate network — Blind's "verified employer" check doesn't mean Blind itself is safe from discovery in litigation. If you are contemplating a move, set up a personal-email Blind account on a personal device today.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Know your collective-action options.&lt;/strong&gt; Most U.S. tech workers are non-union, but the NLRB protects concerted activity even without a union. Two or more employees raising monitoring concerns in writing to HR is protected. If the topic feels too hot for email, CODE-CWA and the Tech Workers Coalition both run confidential channels.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
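
&lt;p&gt;A note on that first item: if you want a quick, point-in-time look at what's already running on a macOS work laptop, the sketch below uses only standard macOS tooling. The product names in the last command are illustrative examples, not an exhaustive list of monitoring agents.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Configuration profiles pushed by IT (user scope; system profiles may need sudo)
profiles list

# Persistent agents and daemons installed system-wide
ls /Library/LaunchAgents /Library/LaunchDaemons

# Processes that look like endpoint telemetry (names are illustrative, not exhaustive)
ps aux | grep -iE 'crowdstrike|sentinel|dlp|teramind|activtrak' | grep -v grep
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If one of these commands is blocked by policy, that is itself useful information.&lt;/p&gt;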

&lt;p&gt;None of this is paranoia. It is hygiene. Your great-grandparents knew not to discuss wages or organizing plans in the company town's general store. The same discipline applies when the general store now runs on the laptop in your bag.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Employers Should Do
&lt;/h2&gt;

&lt;p&gt;Shorter version. Four items. If you are building or approving a monitoring program, answer each in writing before you deploy:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Transparency.&lt;/strong&gt; Tell employees what is collected, how it is processed, who has access, how long it is retained, and what it will be used for. Not in an updated AUP. In an email, plus a town hall, plus a written Q&amp;amp;A that is revised based on the questions asked.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Consent with a real opt-out.&lt;/strong&gt; If there is no opt-out, it is not consent. If the only opt-out is resignation, it is not consent. Build a workflow that allows employees to exclude specific apps or time windows from collection, with no retaliation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Narrow scope and retention.&lt;/strong&gt; Collect the minimum required for the stated purpose. Delete the rest on a 90-day rolling window. Publish the retention schedule.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Independent audit.&lt;/strong&gt; Annual third-party review of what was collected, how it was used, and whether the stated purpose and the actual purpose match. Publish the audit summary internally.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Companies that do these four things will retain talent. Companies that don't will face union drives, class actions, and a steady bleed of senior engineers to competitors inside 24 months. The math is not hard.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Comes Next
&lt;/h2&gt;

&lt;p&gt;Eighteen months is the window. By late 2027 we will know whether enterprise AI employee-observability splits into two markets — a disclosure-first one sold to companies that care about retention, and a dark-patterns one sold to companies that will be defendants in the 2028 class-action docket. The companies that land on the right side of that split will not be the ones with the most advanced surveillance stack. They will be the ones that wrote the consent architecture first.&lt;/p&gt;

&lt;p&gt;The Meta layoffs are the headline. They're a footnote. The headline is what Meta is doing to the 72,000 employees it didn't lay off this week. They're being asked to train the thing that replaces the next 8,000 people. And because of where Meta sits in the procurement food chain, every CIO at every mid-cap company in America just saw the playbook. They will run it, with minor modifications, and with far less press attention.&lt;/p&gt;

&lt;p&gt;If you're a tech employee, the next year is not about whether your company survives the AI wave. It's about whether you can tell the difference between a performance review and a training run. Assume you're being watched. Make it worth their while.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.computeleap.com/blog/meta-surveillance-tech-layoffs-2026/" rel="noopener noreferrer"&gt;ComputeLeap&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tech</category>
      <category>surveillance</category>
      <category>career</category>
    </item>
    <item>
      <title>Shannon AI Review: Autonomous Web Pentesting Agent</title>
      <dc:creator>Max Quimby</dc:creator>
      <pubDate>Fri, 24 Apr 2026 04:25:16 +0000</pubDate>
      <link>https://forem.com/max_quimby/shannon-ai-review-autonomous-web-pentesting-agent-3jdi</link>
      <guid>https://forem.com/max_quimby/shannon-ai-review-autonomous-web-pentesting-agent-3jdi</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;📖 &lt;a href="https://agentconn.com/blog/shannon-ai-pentester-review-autonomous-web-security-2026" rel="noopener noreferrer"&gt;Read the full version with screenshots and embedded sources on AgentConn →&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;On April 22, 2026, the &lt;a href="https://news.ycombinator.com/item?id=47876043" rel="noopener noreferrer"&gt;Bitwarden CLI package was compromised&lt;/a&gt; and pushed to npm as version 2026.4.0. The malicious release was live for 19 hours. 334 users downloaded it before detection. Bitwarden is one of the most-audited, most-trusted password managers on the planet — and the attack was caught by community monitoring, not by the organization's own tooling.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://news.ycombinator.com/item?id=47876043" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fksg200fy1xty3fpxc884.png" alt="Hacker News: Bitwarden CLI compromised in Checkmarx supply chain campaign — 679 points, 337 comments" width="600" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the context in which &lt;a href="https://github.com/KeygraphHQ/shannon" rel="noopener noreferrer"&gt;Shannon&lt;/a&gt; needs to be evaluated — not as an academic security toy, but as a response to an increasingly hostile environment where the traditional model of "annual pentest, quarterly audit" is already obsolete before the PDF is delivered.&lt;/p&gt;

&lt;p&gt;Shannon is an open-source autonomous AI pentesting agent built by &lt;a href="https://keygraph.io/shannon" rel="noopener noreferrer"&gt;Keygraph&lt;/a&gt;. It reads your source code, maps your attack surface, and attempts to break in — producing a report with zero false positives, because it only files findings it can actively prove with a working exploit. It has 40.1K GitHub stars as of April 2026. Powered by Anthropic's Claude.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/The_Cyber_News/status/2019777360313434478" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frddtlmdi1zutukguj1z5.png" alt="@The_Cyber_News: Shannon AI Pentesting Tool that Autonomously Checks for Code Vulnerabilities in 90 Minutes" width="548" height="920"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Shannon Actually Does
&lt;/h2&gt;

&lt;p&gt;When you run Shannon, it executes a five-phase workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Pre-reconnaissance&lt;/strong&gt; — Static code analysis: architecture patterns, entry points, authentication mechanisms, likely attack vectors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reconnaissance&lt;/strong&gt; — Dynamic analysis via Playwright browser automation: forms, API endpoints, authentication flows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vulnerability &amp;amp; Exploitation&lt;/strong&gt; — Five parallel Claude agents simultaneously test for SQLi, XSS, authorization bypasses, SSRF, and IDOR. No PoC = no finding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Confirmation&lt;/strong&gt; — Dedicated pass verifies each exploit is reproducible&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reporting&lt;/strong&gt; — Proven vulnerabilities only, with exact &lt;code&gt;curl&lt;/code&gt; commands to reproduce&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Cost: ~$50 in Anthropic API credits. Time: 1–1.5 hours. Compare: $10,000–$50,000 for a traditional pentest.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/DavidBorish/status/2041171017029042465" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4ykpdasi1k6j8ay35tb.png" alt="@DavidBorish: Shannon hit 10,000 GitHub stars by actually breaking into web applications instead of just flagging potential problems" width="548" height="451"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The XBOW Benchmark: 96.15%
&lt;/h2&gt;

&lt;p&gt;Shannon scored 96.15% on the XBOW security benchmark — 100 of 104 intentionally vulnerable web apps solved in hint-free, source-aware mode. Commercial DAST tools typically score 30–40% on comparable evaluations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/AISecHub/status/2000413083693445600" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5un01bd5gie7t5376aqd.png" alt="@AISecHub: Shannon has achieved a 96.15% success rate on the hint-free source-aware XBOW Benchmark" width="548" height="764"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Hands-On Test Results
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;DVNA (Node.js)&lt;/strong&gt; — Shannon detected SQL injection, command injection, XSS, and XXE with working exploits. "What stood out was how Shannon organized the analysis — it structured the findings into clear sections."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OWASP Juice Shop&lt;/strong&gt; — &lt;a href="https://betterstack.com/community/guides/ai/shannon-ai/" rel="noopener noreferrer"&gt;Better Stack's test&lt;/a&gt; consumed ~$60 in API credits. Shannon "didn't say 'this login looks weak' — it bypassed the login, dumped data, and handed me the screenshots and logs to prove it." Zero false positives.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Economics
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;Frequency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Traditional pentest&lt;/td&gt;
&lt;td&gt;$10,000–$50,000&lt;/td&gt;
&lt;td&gt;Weeks&lt;/td&gt;
&lt;td&gt;Annual&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shannon per scan&lt;/td&gt;
&lt;td&gt;~$50 API&lt;/td&gt;
&lt;td&gt;1–1.5 hours&lt;/td&gt;
&lt;td&gt;Daily in CI/CD&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What Shannon Misses
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;White-box only&lt;/strong&gt; — requires source code access; can't test closed-source dependencies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limited vulnerability classes&lt;/strong&gt; — only the categories listed above: SQLi, XSS, SSRF, authorization bypass, and IDOR. Business logic flaws: not in scope&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not for production&lt;/strong&gt; — creates users, modifies data, fires injection probes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM residual risk&lt;/strong&gt; — confirmation phase helps but human review still essential&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Dual-Use Concern
&lt;/h2&gt;

&lt;p&gt;From &lt;a href="https://news.ycombinator.com/item?id=46944416" rel="noopener noreferrer"&gt;HN discussion&lt;/a&gt;: "Since this is open source, it's a white-hat tool, but it also democratizes script kiddos being able to do some serious damage." Developer: "I guess who owns the most hardware wins the arms race?"&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Requirements: Docker, Node.js 18+, Anthropic API key&lt;/span&gt;
npx @keygraph/shannon setup
npx @keygraph/shannon start &lt;span class="nt"&gt;-u&lt;/span&gt; https://your-dev-app.com &lt;span class="nt"&gt;-r&lt;/span&gt; /path/to/repo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
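
&lt;p&gt;If you want the "daily in CI/CD" cadence from the economics table, a minimal nightly job might look like the sketch below. It reuses the commands above; the staging URL, the secret name, and the artifact handling are assumptions to adapt to your own pipeline — and it should only ever point at a non-production environment.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# nightly-shannon.sh — run by a CI scheduler against a staging deployment, never production.
set -euo pipefail

: "${ANTHROPIC_API_KEY:?set this via your CI secret store}"   # Anthropic key for the run (exact env var name: check Shannon's setup output)
TARGET_URL="https://staging.your-app.example"                 # hypothetical staging URL
REPO_PATH="$(pwd)"                                            # CI checkout of the repo under test

npx @keygraph/shannon start -u "$TARGET_URL" -r "$REPO_PATH"

# Publish whatever report Shannon writes as a CI artifact; the exact output
# location depends on your Shannon configuration, so check a local run first.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;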



&lt;h2&gt;
  
  
  The Verdict
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Use Shannon if:&lt;/strong&gt; shifting security left, web app with source code you control, OWASP Top 10 exposure, need something between nothing and a full pentest.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Don't rely on Shannon if:&lt;/strong&gt; black-box testing needed, business logic is your risk, compliance-ready reports required, production environment.&lt;/p&gt;

&lt;p&gt;Shannon is at &lt;a href="https://github.com/KeygraphHQ/shannon" rel="noopener noreferrer"&gt;github.com/KeygraphHQ/shannon&lt;/a&gt; — AGPL-3.0.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://agentconn.com/blog/shannon-ai-pentester-review-autonomous-web-security-2026" rel="noopener noreferrer"&gt;AgentConn&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>ai</category>
      <category>opensource</category>
      <category>pentesting</category>
    </item>
    <item>
      <title>GPT-5.5 vs Claude Code: Which AI Should You Use?</title>
      <dc:creator>Max Quimby</dc:creator>
      <pubDate>Fri, 24 Apr 2026 03:26:24 +0000</pubDate>
      <link>https://forem.com/max_quimby/gpt-55-vs-claude-code-which-ai-should-you-use-58fe</link>
      <guid>https://forem.com/max_quimby/gpt-55-vs-claude-code-which-ai-should-you-use-58fe</guid>
      <description>&lt;p&gt;The agentic coding race just got a whole lot more explicit.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📖 &lt;a href="https://computeleap.com/blog/gpt-5-5-vs-claude-code-agentic-coding-ai-2026" rel="noopener noreferrer"&gt;Read the full version with charts and embedded sources on ComputeLeap →&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;On April 23, 2026, OpenAI shipped &lt;a href="https://openai.com/index/introducing-gpt-5-5/" rel="noopener noreferrer"&gt;GPT-5.5&lt;/a&gt; with a framing it hasn't used before: not a smarter chat model, but "a new class of intelligence for real work and powering agents." The subtext is unmistakable — OpenAI is coming directly for the territory Claude Code has been quietly dominating among professional developers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/OpenAI/status/2047376561205325845" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fondzunwp52ngnswtmi75.png" alt="OpenAI tweet announcing GPT-5.5 — 40K likes, 8.4K retweets" width="700" height="750"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The launch racked up 40K likes within hours. Developers who have been routing serious coding work through Claude Code are suddenly asking whether it's time to reconsider. The honest answer? It depends on what you're building — and who's paying for it.&lt;/p&gt;

&lt;p&gt;This is a practical decision guide. We'll cover the benchmark reality, the pricing drama that erupted this week, and the three distinct use cases where each tool wins. No hype, no both-sides-ism. Just a clear read on the current state of the agentic coding wars.&lt;/p&gt;

&lt;h2&gt;
  
  
  What GPT-5.5 Actually Is
&lt;/h2&gt;

&lt;p&gt;GPT-5.5 is the first fully retrained base model OpenAI has shipped since GPT-4.5. Every previous 5.x release (5.1, 5.2, 5.3, 5.4) was built on the same foundation — this one is not.&lt;/p&gt;

&lt;p&gt;The headline benchmark: &lt;strong&gt;82.7% on Terminal-Bench 2.0&lt;/strong&gt;, a test of complex command-line workflows that require planning, iteration, and coordinated tool use. It also posts 58.6% on SWE-Bench Pro (real GitHub issue resolution end-to-end in a single pass) and 84.9% on GDPval, which tests general-purpose knowledge work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://techcrunch.com/2026/04/23/openai-chatgpt-gpt-5-5-ai-model-superapp/" rel="noopener noreferrer"&gt;TechCrunch's coverage&lt;/a&gt; notes that Greg Brockman called it "a real step forward towards the kind of computing that we expect in the future" — pointing to autonomous task completion, not just chat fluency. The model is designed to use tools, verify its own work, and carry multi-step tasks through to completion without requiring constant human steering.&lt;/p&gt;

&lt;p&gt;What changed under the hood according to &lt;a href="https://interestingengineering.com/ai-robotics/opanai-gpt-5-5-agentic-coding-gains" rel="noopener noreferrer"&gt;Interesting Engineering&lt;/a&gt;: fewer refusals mid-task, better intent retention across long tool chains, and more efficient token usage per completed task than GPT-5.4. It's natively omnimodal (text, images, audio, video in a single unified system) and available in both ChatGPT and Codex immediately on launch day for Plus, Pro, Business, and Enterprise subscribers.&lt;/p&gt;

&lt;p&gt;The pricing is not gentle. &lt;a href="https://venturebeat.com/ai/openais-gpt-5-5-is-here-and-its-no-potato-narrowly-beats-anthropics-claude-mythos-preview-on-terminal-bench-2-0/" rel="noopener noreferrer"&gt;VentureBeat's analysis&lt;/a&gt; puts GPT-5.5 API at $5/million input tokens and $30/million output tokens — roughly 2x the per-token cost of GPT-5.4. OpenAI's defense is fewer tokens per task, but that tradeoff only holds if your workload actually benefits from GPT-5.5's strengths.&lt;/p&gt;
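
&lt;p&gt;A quick back-of-the-envelope makes that tradeoff concrete. The token counts below are hypothetical, and the GPT-5.4 prices are simply inferred from the "roughly 2x" figure — treat this as a sketch of the break-even logic, not published pricing.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Per-task cost at GPT-5.5 list price ($5 in / $30 out per 1M tokens) versus an
# implied ~half-price GPT-5.4. Token counts per completed task are made up.
awk 'BEGIN {
  in55 = 20000; out55 = 5000;    # hypothetical GPT-5.5 usage for one task
  in54 = 30000; out54 = 9000;    # hypothetical GPT-5.4 usage for the same task
  cost55 = in55 * 5   / 1e6 + out55 * 30 / 1e6;
  cost54 = in54 * 2.5 / 1e6 + out54 * 15 / 1e6;
  printf "GPT-5.5: $%.3f per task   GPT-5.4: $%.3f per task\n", cost55, cost54;
}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Even with roughly a third fewer tokens in that example, GPT-5.5 comes out slightly more expensive per task ($0.25 vs. about $0.21); at a 2x per-token premium, the newer model has to roughly halve the tokens it spends per completed task before the economics flip.&lt;/p&gt;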

&lt;h2&gt;
  
  
  What Claude Code Actually Is
&lt;/h2&gt;

&lt;p&gt;Claude Code is a different category of product. It's not a chat interface with coding capabilities bolted on — it's a terminal-native agent built specifically for software engineers. It runs in your local terminal, integrates directly with VS Code and JetBrains, understands your full repo context, and executes multi-hour autonomous coding sessions that Anthropic describes as its core use case.&lt;/p&gt;

&lt;p&gt;The underlying model powering serious Claude Code work today is &lt;strong&gt;Claude Opus 4.7&lt;/strong&gt;, released April 16, 2026. Its signature benchmark is &lt;strong&gt;64.3% on SWE-Bench Pro&lt;/strong&gt; — the highest score on that test for complex multi-file GitHub issue resolution. Opus 4.7 leads GPT-5.5 on 6 of the 10 shared benchmarks both providers report, particularly on the reasoning-heavy and code review-grade tests (GPQA Diamond, HLE, SWE-Bench Pro, MCP Atlas).&lt;/p&gt;

&lt;p&gt;For a ground-level look at how real developers are using it, the &lt;a href="https://www.youtube.com/watch?v=wkv2ifxPpF8" rel="noopener noreferrer"&gt;Y Combinator video featuring Garry Tan's Claude Code setup&lt;/a&gt; is worth 15 minutes. Tan walks through his "GStack" — the full Claude Code-native development environment he runs as a solo-founder-style operator.&lt;/p&gt;

&lt;p&gt;Claude Code's strongest differentiator isn't a benchmark. It's the depth of context retention and the autonomy of its execution. In the &lt;a href="https://news.ycombinator.com/item?id=47879092" rel="noopener noreferrer"&gt;Hacker News thread&lt;/a&gt; that followed GPT-5.5's launch, one recurring pattern emerged: developers described Claude Code as "autonomous/thoughtful — it plans deeply and asks less of the human," while Codex/GPT-5.5 is characterized as "an interactive collaborator where you steer it mid-execution."&lt;/p&gt;

&lt;p&gt;Check our &lt;a href="https://computeleap.com/blog/claude-code-complete-guide-2026" rel="noopener noreferrer"&gt;complete guide to Claude Code&lt;/a&gt; for a deep dive on how to set up and optimize Claude Code for your workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Head-to-Head: Benchmarks That Actually Matter
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://lushbinary.com/blog/gpt-5-5-vs-claude-opus-4-7-comparison-benchmarks-pricing/" rel="noopener noreferrer"&gt;Lushbinary's analysis&lt;/a&gt; of the 10 benchmarks both providers publicly report gives the clearest picture:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Opus 4.7 leads on 6:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SWE-Bench Pro: &lt;strong&gt;64.3%&lt;/strong&gt; vs 58.6%&lt;/li&gt;
&lt;li&gt;GPQA Diamond: Opus leads&lt;/li&gt;
&lt;li&gt;HLE (with and without tools): Opus leads&lt;/li&gt;
&lt;li&gt;MCP Atlas: Opus leads&lt;/li&gt;
&lt;li&gt;FinanceAgent v1.1: Opus leads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;GPT-5.5 leads on 4:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Terminal-Bench 2.0: &lt;strong&gt;82.7%&lt;/strong&gt; vs 69.4%&lt;/li&gt;
&lt;li&gt;BrowseComp: GPT-5.5 leads&lt;/li&gt;
&lt;li&gt;OSWorld-Verified: GPT-5.5 leads&lt;/li&gt;
&lt;li&gt;CyberGym: GPT-5.5 leads (82%)&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ One important nuance: GPT-5.5's 58.6% on SWE-Bench Pro is measured in single-pass mode. Claude Code typically runs multiple iterations. Comparing single-pass GPT-5.5 scores to multi-pass Claude Code sessions is not apples-to-apples.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://x.com/omarsar0/status/2047424707310289058" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frw7dyhdrdmfmmfedem1f.png" alt="AI researcher first impressions of GPT-5.5 agentic capabilities" width="700" height="750"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://news.ycombinator.com/item?id=47879092" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8yerdqx6jnmwu8iptc13.png" alt="Hacker News discussion on GPT-5.5 — developers compare Claude Code vs Codex workflows" width="700" height="750"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pricing Drama You Need to Know
&lt;/h2&gt;

&lt;p&gt;On April 22, &lt;a href="https://www.theregister.com/2026/04/22/anthropic_removes_claude_code_pro/" rel="noopener noreferrer"&gt;The Register reported&lt;/a&gt; that Anthropic quietly updated its pricing page — Claude Code showed an "X" in the Pro column, suggesting the feature was being moved exclusively to the $100/month and $200/month Max plans. No press release, no email, no changelog entry.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ Reddit and HN caught fire immediately. For a large segment of Pro subscribers, Claude Code &lt;em&gt;was&lt;/em&gt; the reason they paid $20/month. The apparent removal felt like a retroactive bait-and-switch.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://www.theregister.com/2026/04/22/anthropic_removes_claude_code_pro/" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faivgm4g3on4z8isb7ay7.png" alt="The Register coverage of Anthropic removing Claude Code from Pro plan" width="700" height="750"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://simonwillison.net/2026/apr/22/claude-code-confusion/" rel="noopener noreferrer"&gt;Simon Willison's take&lt;/a&gt; captured the confusion well: within hours of his blog post being drafted, Anthropic had reversed the pricing page change. Anthropic's Head of Growth Amol Avasare clarified the change affected "~2% of new prosumer signups" only.&lt;/p&gt;

&lt;p&gt;The contrast with Codex is stark. &lt;a href="https://www.builder.io/blog/codex-vs-claude-code" rel="noopener noreferrer"&gt;Builder.io's comparison&lt;/a&gt; makes it plain: "Many more people can live comfortably on the $20 Codex plan than Claude's $17 plan where limits get hit quickly."&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Decision Scenarios
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scenario 1: Solo Developer / Indie Hacker
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Winner: Claude Code&lt;/strong&gt; — with caveats on budget.&lt;/p&gt;

&lt;p&gt;If you're running a solo operation and want an AI that will autonomously execute multi-hour coding sessions while you focus on product decisions, Claude Code on Opus 4.7 is the deeper tool. The caveat: if you're on the $20 Pro plan and hitting limits regularly, GPT-5.5 in Codex is a legitimate alternative.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario 2: Engineering Team (5–50 People)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Winner: GPT-5.5 / Codex&lt;/strong&gt; — on ecosystem and GitHub integration.&lt;/p&gt;

&lt;p&gt;For teams, &lt;a href="https://www.builder.io/blog/codex-vs-claude-code" rel="noopener noreferrer"&gt;Builder.io&lt;/a&gt; identifies Codex's GitHub integration as its decisive advantage. GPT-5.5 also supports the AGENTS.md standard — Claude Code's exclusive use of CLAUDE.md creates friction in multi-tool team environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario 3: Enterprise (100+ Engineers)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Winner: Hybrid + &lt;a href="https://github.com/farion1231/cc-switch" rel="noopener noreferrer"&gt;cc-switch&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At enterprise scale, the right answer is an intelligent routing layer. cc-switch (49K stars) unifies Claude Code, Codex, OpenCode, and Gemini CLI into a single Rust-powered desktop app.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 For enterprise teams: Claude Opus 4.7 for code review and complex refactors; GPT-5.5 for long-running agentic workflows and computer use. cc-switch makes this routing practical at scale.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Verdict
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Use Claude Code (Opus 4.7) if:&lt;/strong&gt; complex multi-file coding, autonomous execution, terminal-native workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use GPT-5.5 / Codex if:&lt;/strong&gt; long-running tool chains, computer use, GitHub-centric team workflows, cost-sensitive setups.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use both (via cc-switch) if:&lt;/strong&gt; team or enterprise scale with mixed workloads.&lt;/p&gt;

&lt;p&gt;The developers winning with AI coding in 2026 stop asking "which is better overall?" and start asking "which is better for this specific task?"&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://computeleap.com/blog/gpt-5-5-vs-claude-code-agentic-coding-ai-2026" rel="noopener noreferrer"&gt;ComputeLeap&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claudecode</category>
      <category>openai</category>
      <category>coding</category>
    </item>
    <item>
      <title>Claude Code Agentic Stack: cc-switch &amp; claude-context MCP</title>
      <dc:creator>Max Quimby</dc:creator>
      <pubDate>Thu, 23 Apr 2026 03:36:41 +0000</pubDate>
      <link>https://forem.com/max_quimby/claude-code-agentic-stack-cc-switch-claude-context-mcp-1dg2</link>
      <guid>https://forem.com/max_quimby/claude-code-agentic-stack-cc-switch-claude-context-mcp-1dg2</guid>
      <description>&lt;p&gt;Claude Code just won a &lt;a href="https://www.webbyawards.com/press/press-releases/30th-annual-webby-awards-announce-2026-winners/" rel="noopener noreferrer"&gt;Webby Award&lt;/a&gt; for Best Product or Service in AI Features &amp;amp; Innovation. Boris Cherny, Claude Code's PM at Anthropic, &lt;a href="https://x.com/bcherny/status/2047004804283773321" rel="noopener noreferrer"&gt;announced the win on X&lt;/a&gt; to a wave of congratulations from the developer community:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📖 &lt;a href="https://computeleap.com/blog/claude-code-agentic-dev-stack-2026" rel="noopener noreferrer"&gt;Read the full version with charts and embedded sources on ComputeLeap →&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://x.com/bcherny/status/2047004804283773321" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcomputeleap.com" alt="" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But the real story isn't the trophy — it's what's happening in the GitHub repos trending alongside it.&lt;/p&gt;

&lt;p&gt;Two repos hit the GitHub Trending page on the same day as the Webby announcement: &lt;strong&gt;&lt;a href="https://github.com/farion1231/cc-switch" rel="noopener noreferrer"&gt;cc-switch&lt;/a&gt;&lt;/strong&gt; (+665 stars in 24 hours, 48,667 total) and &lt;strong&gt;&lt;a href="https://github.com/zilliztech/claude-context" rel="noopener noreferrer"&gt;claude-context&lt;/a&gt;&lt;/strong&gt; (+873 stars). Both extend Claude Code's capabilities significantly — and together with a properly configured &lt;code&gt;CLAUDE.md&lt;/code&gt;, they represent what serious agentic developer stacks look like in 2026.&lt;/p&gt;

&lt;p&gt;This guide covers exactly how to set up both tools and wire everything together for maximum development velocity.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the "Agentic Developer Stack" Actually Means in 2026
&lt;/h2&gt;

&lt;p&gt;In the 2026 context, an agentic developer stack has three layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Provider management&lt;/strong&gt; — switch between Claude Code, Codex, Gemini CLI, OpenCode, and other AI coding tools from a single interface, sharing provider configs, MCP servers, and skills&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codebase context&lt;/strong&gt; — give your AI agent deep semantic understanding of your entire codebase, not just the files currently open&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent configuration&lt;/strong&gt; — the &lt;code&gt;CLAUDE.md&lt;/code&gt; files, skills, and subagent definitions that turn Claude Code from a general-purpose tool into a domain-specific engineering partner&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;According to &lt;a href="https://resources.anthropic.com/hubfs/2026%20Agentic%20Coding%20Trends%20Report.pdf" rel="noopener noreferrer"&gt;Anthropic's 2026 Agentic Coding Trends Report&lt;/a&gt;, teams using structured &lt;code&gt;CLAUDE.md&lt;/code&gt; configs and subagent workflows report 2-4x velocity improvements over baseline Claude Code usage. The tools in this guide enable exactly that configuration.&lt;/p&gt;
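
&lt;p&gt;The report doesn't prescribe a format, but as a rough illustration, a minimal &lt;code&gt;CLAUDE.md&lt;/code&gt; usually captures three things: the commands the agent should run, the conventions it should follow, and the lines it should not cross. Everything below is hypothetical example content, not a template from Anthropic.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# CLAUDE.md (illustrative example)

## Commands
- Build: npm run build
- Test:  npm test
- Lint:  npm run lint

## Conventions
- TypeScript strict mode; avoid `any` in new code
- All database access goes through src/db/repository.ts

## Constraints
- Never edit files under migrations/ by hand
- Ask before adding a new dependency
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;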




&lt;h2&gt;
  
  
  Layer 1: cc-switch — Unified Provider Management
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What cc-switch Does
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/farion1231/cc-switch" rel="noopener noreferrer"&gt;cc-switch&lt;/a&gt; is a cross-platform desktop app built with Tauri and Rust that unifies management of five AI coding CLI tools: Claude Code, OpenAI Codex, Gemini CLI, OpenCode, and OpenClaw. Instead of maintaining separate configuration files and MCP server setups for each tool, cc-switch provides a single interface that syncs settings bidirectionally.&lt;/p&gt;

&lt;p&gt;Key features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;50+ built-in provider presets&lt;/strong&gt; — one-click import of API configurations for Anthropic, OpenAI, Gemini, xAI, Mistral, and more&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;System tray quick switch&lt;/strong&gt; — instant provider switching without opening a terminal&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unified MCP &amp;amp; Skills Management&lt;/strong&gt; — install MCP servers and skills once, sync across all connected tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud sync&lt;/strong&gt; — settings sync via Dropbox, OneDrive, iCloud, or WebDAV servers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Usage dashboard&lt;/strong&gt; — track spending, request counts, and token consumption per provider&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-platform&lt;/strong&gt; — Windows, macOS, and Linux support&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 cc-switch is built with Tauri (Rust-based) for native performance — not an Electron wrapper. Cold launch is under 200ms and system tray switching responds in under 50ms. This matters when you're switching between providers dozens of times a day.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Installing cc-switch
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;macOS:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--cask&lt;/span&gt; cc-switch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or download the latest release from &lt;a href="https://github.com/farion1231/cc-switch/releases" rel="noopener noreferrer"&gt;cc-switch/releases&lt;/a&gt; — &lt;code&gt;.dmg&lt;/code&gt; for macOS, &lt;code&gt;.exe&lt;/code&gt; for Windows, &lt;code&gt;.AppImage&lt;/code&gt; for Linux.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/farion1231/cc-switch" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcomputeleap.com" alt="" width="" height=""&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cc-switch &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Initial Setup: Provider Configuration
&lt;/h3&gt;

&lt;p&gt;On first launch, cc-switch walks you through connecting your providers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open cc-switch from the system tray or Applications folder&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Providers&lt;/strong&gt; → &lt;strong&gt;Add Provider&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Select from the preset list (Anthropic, OpenAI, Gemini, etc.) or add a custom provider&lt;/li&gt;
&lt;li&gt;Paste your API key — cc-switch stores it in your OS keychain, not in plain text&lt;/li&gt;
&lt;li&gt;Test the connection with the &lt;strong&gt;Verify&lt;/strong&gt; button&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For Claude Code, cc-switch automatically detects your existing &lt;code&gt;~/.claude/&lt;/code&gt; configuration and imports it. Your existing settings, custom commands, and history are preserved.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting Up MCP Servers in cc-switch
&lt;/h3&gt;

&lt;p&gt;The real power of cc-switch is managing MCP servers across all your coding tools simultaneously. Instead of configuring the same MCP server separately for each tool, you configure it once and cc-switch deploys it to all connected tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cc-switch mcp add &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="s2"&gt;"claude-context"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--command&lt;/span&gt; &lt;span class="s2"&gt;"npx"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--args&lt;/span&gt; &lt;span class="s2"&gt;"-y @zilliztech/claude-context"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--scope&lt;/span&gt; all-tools
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Layer 2: claude-context MCP — Semantic Codebase Search
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why Codebase Context Is the Biggest Bottleneck
&lt;/h3&gt;

&lt;p&gt;When you ask Claude Code to modify a function that depends on types defined in five other files, Claude Code has to either load all five files into context (expensive) or try to infer the types from what it can see (error-prone). &lt;a href="https://github.com/zilliztech/claude-context" rel="noopener noreferrer"&gt;claude-context&lt;/a&gt; solves this with semantic search over your entire codebase.&lt;/p&gt;

&lt;p&gt;Instead of loading full files, it retrieves only the semantically relevant code snippets. According to &lt;a href="https://www.augmentcode.com/mcp/claude-context-mcp-server" rel="noopener noreferrer"&gt;Augment Code's MCP registry benchmarks&lt;/a&gt;, claude-context achieves approximately &lt;strong&gt;40% token reduction&lt;/strong&gt; under equivalent retrieval quality conditions.&lt;/p&gt;

&lt;h3&gt;
  
  
  How claude-context Works
&lt;/h3&gt;

&lt;p&gt;claude-context uses a hybrid search approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;BM25&lt;/strong&gt; — lexical matching (finds exact variable names, function signatures)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dense vector search&lt;/strong&gt; — semantic matching (finds conceptually related code even with different naming)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your codebase is indexed into a Milvus vector database (local) or Zilliz Cloud (managed). The index uses AST-aware chunking — it understands code structure at the syntax level. Function bodies, class definitions, and interface declarations are kept semantically intact.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 claude-context uses incremental Merkle-tree-based indexing. After the initial index build, only changed files are re-indexed. For a mid-size repo (50K LOC), re-indexing typically completes in under 5 seconds after a &lt;code&gt;git pull&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Installing and Configuring claude-context
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt; Node.js 18+ and a running Milvus instance (local Docker) or &lt;a href="https://zilliz.com/" rel="noopener noreferrer"&gt;Zilliz Cloud&lt;/a&gt; account.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @zilliztech/claude-context
claude-context init   &lt;span class="c"&gt;# configure vector DB + embedding provider&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;your-project &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; claude-context index &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
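
&lt;p&gt;If you don't already have a Milvus instance for the prerequisite above, the Docker-based standalone quick start is the fastest path; run it before &lt;code&gt;claude-context init&lt;/code&gt;. A minimal sketch, assuming the upstream &lt;code&gt;standalone_embed.sh&lt;/code&gt; helper script and the default port 19530; check the current Milvus docs before relying on it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Start a local standalone Milvus via the official Docker helper script
# (assumption: script name and default port; verify against the Milvus docs)
curl -sfL https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh -o standalone_embed.sh
bash standalone_embed.sh start   # serves on localhost:19530 by default
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;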



&lt;p&gt;&lt;strong&gt;Register with Claude Code:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"claude-context"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@zilliztech/claude-context"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"serve"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"MILVUS_URI"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:19530"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"EMBEDDING_PROVIDER"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"OPENAI_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${OPENAI_API_KEY}"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or use cc-switch's MCP manager (recommended) — it handles the configuration and syncs it across all your AI coding tools automatically.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://news.ycombinator.com/item?id=45181577" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcomputeleap.com" alt="" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Using claude-context During Development
&lt;/h3&gt;

&lt;p&gt;Once installed, claude-context adds a &lt;code&gt;search_codebase&lt;/code&gt; tool to Claude Code. You can invoke it explicitly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Use the search_codebase tool to find all implementations of the PaymentProcessor interface before modifying it.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or Claude Code will invoke it automatically when understanding more of the codebase would improve its response.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 For large monorepos, create a &lt;code&gt;.claude-context-ignore&lt;/code&gt; file (similar to &lt;code&gt;.gitignore&lt;/code&gt;) to exclude generated files, &lt;code&gt;node_modules&lt;/code&gt;, build artifacts, and test fixtures. This keeps the index clean and retrieval precise.&lt;/p&gt;
&lt;/blockquote&gt;
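
&lt;p&gt;A reasonable starting point for that ignore file, using the same pattern syntax as &lt;code&gt;.gitignore&lt;/code&gt; (adjust to your repo layout):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;node_modules/
dist/
build/
coverage/
*.min.js
*.map
test/fixtures/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;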




&lt;h2&gt;
  
  
  Layer 3: CLAUDE.md Configuration — Making It All Stick
&lt;/h2&gt;

&lt;p&gt;Having great tools is only half the equation. The other half is configuring Claude Code to use them intelligently. This is where &lt;code&gt;CLAUDE.md&lt;/code&gt; comes in — and where most developers leave significant productivity on the table.&lt;/p&gt;

&lt;p&gt;For the fundamentals, see our &lt;a href="https://computeleap.com/blog/claude-code-complete-guide-2026" rel="noopener noreferrer"&gt;Claude Code Complete Guide&lt;/a&gt;. This section focuses on configuration patterns specific to the 2026 agentic stack.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Role of CLAUDE.md in an Agentic Stack
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;CLAUDE.md&lt;/code&gt; is the document Claude Code reads at the start of every session. According to the &lt;a href="https://www.mindstudio.ai/blog/agentic-business-os-claude-code-architecture-guide" rel="noopener noreferrer"&gt;MindStudio guide on Agentic Business OS architecture&lt;/a&gt;, it's the "foundational document for your brand context layer — it defines what every agent knows before it starts any task."&lt;/p&gt;

&lt;p&gt;Use it to tell the agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which MCP servers are available and when to use them&lt;/li&gt;
&lt;li&gt;Your coding standards and conventions&lt;/li&gt;
&lt;li&gt;When to spawn subagents vs. work in the main context&lt;/li&gt;
&lt;li&gt;What tools to reach for first&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Sample CLAUDE.md for the 2026 Agentic Stack
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Project: [Your Project Name]&lt;/span&gt;

&lt;span class="gu"&gt;## Stack&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Language: TypeScript 5.4 (strict mode)
&lt;span class="p"&gt;-&lt;/span&gt; Runtime: Node.js 22 LTS
&lt;span class="p"&gt;-&lt;/span&gt; Package manager: pnpm

&lt;span class="gu"&gt;## MCP Servers Available&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**claude-context**&lt;/span&gt;: Use &lt;span class="sb"&gt;`search_codebase`&lt;/span&gt; before modifying any class, interface, 
  or utility function that may have downstream consumers. Always search before refactoring.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**chrome-mcp**&lt;/span&gt;: Available for UI verification tasks.

&lt;span class="gu"&gt;## Coding Standards&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Functions: single responsibility, &amp;lt;=50 lines
&lt;span class="p"&gt;-&lt;/span&gt; No &lt;span class="sb"&gt;`any`&lt;/span&gt; types — use &lt;span class="sb"&gt;`unknown`&lt;/span&gt; + type guards
&lt;span class="p"&gt;-&lt;/span&gt; Tests: co-located &lt;span class="sb"&gt;`.test.ts`&lt;/span&gt; files, Vitest
&lt;span class="p"&gt;-&lt;/span&gt; Commits: conventional commits format

&lt;span class="gu"&gt;## Subagent Rules&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Spawn a subagent (with worktree isolation) for: feature branches, large refactors, research
&lt;span class="p"&gt;-&lt;/span&gt; Keep the main context for: interactive debugging, short edits, Q&amp;amp;A

&lt;span class="gu"&gt;## Agent Workflow&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Search codebase (claude-context) before modifying shared code
&lt;span class="p"&gt;2.&lt;/span&gt; Write tests before implementation for new features
&lt;span class="p"&gt;3.&lt;/span&gt; Run &lt;span class="sb"&gt;`pnpm build`&lt;/span&gt; and &lt;span class="sb"&gt;`pnpm test`&lt;/span&gt; before committing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern — explicitly naming available MCP servers and when to use subagents — is what separates teams that get 2-4x velocity gains from teams that treat Claude Code as smart autocomplete.&lt;/p&gt;

&lt;p&gt;For detailed &lt;code&gt;CLAUDE.md&lt;/code&gt; patterns, see &lt;a href="https://computeleap.com/blog/karpathy-claude-md-template-skills-github-stars-viral" rel="noopener noreferrer"&gt;Karpathy's CLAUDE.md template analysis&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Subagent Setup with Worktree Isolation
&lt;/h3&gt;

&lt;p&gt;For complex features requiring parallel workstreams, the &lt;a href="https://code.claude.com/docs/en/sub-agents" rel="noopener noreferrer"&gt;official subagent documentation&lt;/a&gt; provides the full setup. The key pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;feature-agent&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Use for implementing new features across multiple modules&lt;/span&gt;
&lt;span class="na"&gt;isolation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;worktree&lt;/span&gt;
&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;read&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;edit&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;write&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;bash&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;search_codebase&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

You are a focused implementation agent. Use search_codebase to understand 
existing patterns before writing new code. Work in the isolated worktree.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Setting &lt;code&gt;isolation: worktree&lt;/code&gt; gives the subagent its own copy of the repository, preventing conflicts when multiple agents work in parallel. For more on this, see the &lt;a href="https://github.com/shanraisshan/claude-code-best-practice" rel="noopener noreferrer"&gt;Claude Code best practices guide&lt;/a&gt;.&lt;/p&gt;
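
&lt;p&gt;Conceptually this maps to a plain &lt;code&gt;git worktree&lt;/code&gt;, which is also handy for inspecting or cleaning up whatever a subagent leaves behind. The paths and branch name below are placeholders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# A worktree is a separate checkout that shares the same underlying .git store
git worktree add ../myproject-feature-x -b feature-x   # the isolated copy an agent works in
git worktree list                                       # see which worktrees are active
git worktree remove ../myproject-feature-x              # clean up once the branch is merged
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;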




&lt;h2&gt;
  
  
  The Pricing Context: What the Pro Plan Controversy Means for Your Setup
&lt;/h2&gt;

&lt;p&gt;On April 21, 2026, Anthropic briefly removed Claude Code from the $20/month Pro plan listing — prompting a 2,648-upvote Reddit thread and coverage in &lt;a href="https://www.theregister.com/2026/04/22/anthropic_removes_claude_code_pro/" rel="noopener noreferrer"&gt;The Register&lt;/a&gt; and &lt;a href="https://www.xda-developers.com/anthropic-cut-claude-code-new-pro-subscriptions/" rel="noopener noreferrer"&gt;XDA Developers&lt;/a&gt;. &lt;a href="https://simonwillison.net/2026/apr/22/claude-code-confusion/" rel="noopener noreferrer"&gt;Simon Willison's analysis&lt;/a&gt; described it as an "A/B test on ~2% of new prosumer signups." Anthropic reversed the change the same day — existing Pro and Max subscribers are not affected.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.reddit.com/r/ClaudeAI/" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcomputeleap.com" alt="" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But the incident reveals the underlying tension: Claude Code sessions on Claude Opus 4.7 run up to three times longer than they did on 4.6, and inference costs are escalating accordingly.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ If you're building agentic workflows with long Claude Code sessions, budget for the Max plan ($100/month, 5x the Pro usage limits). Agentic sessions — especially with subagents and frequent claude-context queries — consume context much faster than interactive sessions. Use cc-switch's usage dashboard to track token consumption and catch runaway workflows before they hit billing limits.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Full Stack Setup Sequence
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Install Claude Code:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @anthropic-ai/claude-code
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_key_here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Install cc-switch:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--cask&lt;/span&gt; cc-switch
&lt;span class="c"&gt;# Or: github.com/farion1231/cc-switch/releases&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Import your existing Claude Code config&lt;/strong&gt; — cc-switch auto-detects &lt;code&gt;~/.claude/&lt;/code&gt; on first launch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Install and configure claude-context:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @zilliztech/claude-context
claude-context init
&lt;span class="nb"&gt;cd &lt;/span&gt;your-project &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; claude-context index &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;5. Register claude-context MCP via cc-switch&lt;/strong&gt; → MCP → Add Server → scope: All Tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Write your CLAUDE.md&lt;/strong&gt; in your project root using the template above.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7. Define subagents&lt;/strong&gt; in &lt;code&gt;.claude/agents/&lt;/code&gt; — start with a feature-agent and a research-agent.&lt;/p&gt;
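
&lt;p&gt;A matching research-agent can reuse the same frontmatter format as the feature-agent shown earlier. The read-only tool list here is a suggested starting point, not a requirement:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;---
name: research-agent
description: Use for codebase exploration, dependency audits, and design research
isolation: worktree
tools: [read, bash, search_codebase]
---

You are a read-only research agent. Use search_codebase and read to gather
context, then report findings back to the main session. Do not modify files.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;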

&lt;p&gt;&lt;strong&gt;8. Test the full stack:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude
&lt;span class="c"&gt;# Ask: "Search the codebase for the authentication flow and explain it"&lt;/span&gt;
&lt;span class="c"&gt;# claude-context should invoke automatically&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What's Next in the Ecosystem
&lt;/h2&gt;

&lt;p&gt;A few things worth watching:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;cc-switch's cloud sync&lt;/strong&gt; is expanding to git-based sync, enabling team-wide provider config sharing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;claude-context's offline mode&lt;/strong&gt; (tracking in &lt;a href="https://github.com/zilliztech/claude-context/issues/162" rel="noopener noreferrer"&gt;Issue #162&lt;/a&gt;) would enable fully local indexing without an external vector database&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP Tool Search&lt;/strong&gt; (launched January 14, 2026) allows Claude Code to dynamically load tools into context when MCP servers have 50+ tools — reducing context pressure from large MCP setups&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The underlying trend is clear: Claude Code has crossed from "developer tool" to "developer platform." The Webby Award is the cultural marker. The GitHub trending repos are the technical evidence. Setting up this stack today puts you ahead of the curve.&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;GitHub&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;cc-switch&lt;/td&gt;
&lt;td&gt;Unified provider + MCP management desktop app&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/farion1231/cc-switch" rel="noopener noreferrer"&gt;farion1231/cc-switch&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;claude-context&lt;/td&gt;
&lt;td&gt;Semantic codebase search MCP&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/zilliztech/claude-context" rel="noopener noreferrer"&gt;zilliztech/claude-context&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLAUDE.md&lt;/td&gt;
&lt;td&gt;Agent configuration and context file&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/shanraisshan/claude-code-best-practice" rel="noopener noreferrer"&gt;shanraisshan/claude-code-best-practice&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For the full Claude Code foundation, read the &lt;a href="https://computeleap.com/blog/claude-code-complete-guide-2026" rel="noopener noreferrer"&gt;Claude Code Complete Guide&lt;/a&gt;. For browser automation integration, see &lt;a href="https://computeleap.com/blog/chrome-built-in-mcp-server-native-mcp-v2-2026" rel="noopener noreferrer"&gt;Chrome's built-in MCP server guide&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Sources: &lt;a href="https://github.com/farion1231/cc-switch" rel="noopener noreferrer"&gt;cc-switch GitHub&lt;/a&gt; · &lt;a href="https://github.com/zilliztech/claude-context" rel="noopener noreferrer"&gt;claude-context GitHub&lt;/a&gt; · &lt;a href="https://www.webbyawards.com/press/press-releases/30th-annual-webby-awards-announce-2026-winners/" rel="noopener noreferrer"&gt;Webby Awards 2026&lt;/a&gt; · &lt;a href="https://simonwillison.net/2026/apr/22/claude-code-confusion/" rel="noopener noreferrer"&gt;Simon Willison&lt;/a&gt; · &lt;a href="https://www.theregister.com/2026/04/22/anthropic_removes_claude_code_pro/" rel="noopener noreferrer"&gt;The Register&lt;/a&gt; · &lt;a href="https://www.xda-developers.com/anthropic-cut-claude-code-new-pro-subscriptions/" rel="noopener noreferrer"&gt;XDA Developers&lt;/a&gt; · &lt;a href="https://code.claude.com/docs/en/sub-agents" rel="noopener noreferrer"&gt;Anthropic Subagent Docs&lt;/a&gt; · &lt;a href="https://www.mindstudio.ai/blog/agentic-business-os-claude-code-architecture-guide" rel="noopener noreferrer"&gt;MindStudio Agentic OS&lt;/a&gt; · &lt;a href="https://resources.anthropic.com/hubfs/2026%20Agentic%20Coding%20Trends%20Report.pdf" rel="noopener noreferrer"&gt;Anthropic 2026 Agentic Coding Trends&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://computeleap.com/blog/claude-code-agentic-dev-stack-2026" rel="noopener noreferrer"&gt;ComputeLeap&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>mcp</category>
      <category>devtools</category>
      <category>ai</category>
    </item>
    <item>
      <title>Iran's Prediction Markets Tell Two Stories</title>
      <dc:creator>Max Quimby</dc:creator>
      <pubDate>Tue, 21 Apr 2026 04:04:57 +0000</pubDate>
      <link>https://forem.com/max_quimby/irans-prediction-markets-tell-two-stories-4i29</link>
      <guid>https://forem.com/max_quimby/irans-prediction-markets-tell-two-stories-4i29</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;📖 &lt;a href="https://thearcofpower.com/blog/iran-prediction-markets-polymarket-insider-trading-ceasefire-2026" rel="noopener noreferrer"&gt;Read the full analysis with charts on The Arc of Power →&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;On April 7, 2026, as President Trump was preparing to announce a two-week ceasefire with Iran, more than fifty newly-created accounts on &lt;a href="https://polymarket.com/predictions/iran" rel="noopener noreferrer"&gt;Polymarket&lt;/a&gt; placed large, specific bets that the ceasefire would be announced that day. Minutes later, Trump made the announcement. The accounts profited approximately $600,000. Within 48 hours, the White House had sent internal emails warning staff not to place prediction market bets related to the Iran war.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Our thesis: prediction markets are a remarkably accurate signal for short-term diplomatic timing and a systematically poor signal for structural outcomes.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The $200 Million Experiment
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.newsweek.com/iran-ceasefire-wreaks-havoc-on-prediction-markets-11806355" rel="noopener noreferrer"&gt;Over $200 million has traded&lt;/a&gt; on Polymarket contracts related to Iran's ceasefire timing. Approximately $118 million was bet specifically on an April 7 deadline — the exact day Trump announced the ceasefire.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.cnn.com/2026/03/24/politics/iran-war-bets-prediction-markets" rel="noopener noreferrer"&gt;CNN reported&lt;/a&gt; that a single trader made nearly $1 million from well-timed Polymarket bets correctly predicting US and Israeli military actions against Iran since 2024. The &lt;a href="https://www.cnbc.com/2026/04/10/iran-war-prediction-markets-white-house.html" rel="noopener noreferrer"&gt;White House warned staff&lt;/a&gt; not to bet on Iran war outcomes. Two senators wrote the CFTC demanding investigation. The BETS OFF Act was introduced in Congress.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The regulatory framing matters:&lt;/strong&gt; The BETS OFF Act would prohibit contracts on "government actions, terrorism, war, assassination, and events where an individual knows or controls the outcome." The last clause is the tell — directed at people who influence outcomes, not just know about them.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What the Markets Actually Got Right
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The ceasefire timing markets were accurate.&lt;/strong&gt; April 7 was right. The probability curve leading into the announcement showed a sustained spike beginning roughly 6-8 hours before Trump spoke — consistent with information leakage through informal channels.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The regime stability markets have been roughly accurate.&lt;/strong&gt; Polymarket currently prices an &lt;a href="https://polymarket.com/event/will-the-iranian-regime-fall-by-the-end-of-2026" rel="noopener noreferrer"&gt;80.5% probability against the Iranian regime falling before 2027&lt;/a&gt;. Despite Khamenei's assassination and ongoing protests, the IRGC's institutional structure has remained intact. The market's skepticism of regime collapse — maintained even as Western media ran "end of the regime" framings — has proven correct.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Markets Are Getting Wrong
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://polymarket.com/event/us-iran-nuclear-deal-by-june-30" rel="noopener noreferrer"&gt;67% probability&lt;/a&gt; of a nuclear deal by June 30 is where prediction markets hit the limits of their model.&lt;/p&gt;

&lt;p&gt;Current odds (April 20, 2026):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Market&lt;/th&gt;
&lt;th&gt;Odds&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Nuclear Deal by April 30&lt;/td&gt;
&lt;td&gt;36%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Nuclear Deal by June 30&lt;/td&gt;
&lt;td&gt;67%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Nuclear Deal before 2027&lt;/td&gt;
&lt;td&gt;59-61%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Regime Falls before 2027&lt;/td&gt;
&lt;td&gt;19.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The problem: prediction markets aggregate the probability that &lt;em&gt;a deal happens&lt;/em&gt;, not the probability that a deal &lt;em&gt;resolves the underlying dispute&lt;/em&gt;. A deal that fails to address uranium enrichment infrastructure is not a deal. It is a delay.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/Kalshi/status/2043002302722654321" rel="noopener noreferrer"&gt;Kalshi surged to 61%&lt;/a&gt; on nuclear deal odds following Trump's April 13 statement that Iran wants a deal "badly." Markets correctly incorporated Trump's statement — but cannot distinguish between a statement made for domestic political effect and one reflecting genuine diplomatic progress.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The contrarian read on 67%:&lt;/strong&gt; The markets cannot price the difference between "a document is signed" and "the structural conditions for Iranian nuclear breakout capability are removed." These resolve identically in contract language but produce very different geopolitical outcomes.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Information Asymmetry Diagnostic
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://www.bloomberg.com/news/articles/2026-04-08/polymarket-s-iran-bets-draw-fresh-disputes-and-insider-scrutiny" rel="noopener noreferrer"&gt;Bloomberg analysis&lt;/a&gt; identified two populations who could have generated the April 7 betting pattern:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Population 1: Informed analysts&lt;/strong&gt; — people who track backchannel communications through open-source methods and correctly model diplomatic decision-making. The Islamabad negotiations were not secret. A skilled analyst could have assessed April 7 as the most likely ceasefire date.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Population 2: Informed insiders&lt;/strong&gt; — people with access to non-public government information. The fifty new accounts make this the more plausible explanation for that specific cluster.&lt;/p&gt;

&lt;p&gt;The distinction matters: if it's Population 1, markets aggregate genuine analytical skill. If it's Population 2, markets track who has access to government communications. The signal quality is real in both cases — but for different reasons.&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Means
&lt;/h2&gt;

&lt;p&gt;Prediction markets on Iran are useful as a rough prior — a starting estimate before you apply your own analysis. They are not useful as a substitute for structural analysis.&lt;/p&gt;

&lt;p&gt;The 67% nuclear deal by June 30 tells you what the aggregate of informed and uninformed bettors believes will happen. It does not tell you whether the deal, if it happens, will matter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to watch in the next 48 hours:&lt;/strong&gt; The ceasefire expires April 22. Watch whether the price spike in the ceasefire-extension contract precedes or follows the official announcement. That timing will tell you more about information asymmetry in Iran prediction markets than any regulatory filing.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://thearcofpower.com/blog/iran-prediction-markets-polymarket-insider-trading-ceasefire-2026" rel="noopener noreferrer"&gt;The Arc of Power&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>geopolitics</category>
      <category>analysis</category>
      <category>markets</category>
      <category>prediction</category>
    </item>
    <item>
      <title>Hermes Agent v0.10: Local AGI Stack &amp; Browser Guide</title>
      <dc:creator>Max Quimby</dc:creator>
      <pubDate>Tue, 21 Apr 2026 03:53:49 +0000</pubDate>
      <link>https://forem.com/max_quimby/hermes-agent-v010-local-agi-stack-browser-guide-33bo</link>
      <guid>https://forem.com/max_quimby/hermes-agent-v010-local-agi-stack-browser-guide-33bo</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;📖 &lt;a href="https://agentconn.com/blog/hermes-agent-review-local-agi-stack-browser-integration-2026" rel="noopener noreferrer"&gt;Read the full version with diagrams and embedded sources on AgentConn →&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In seven weeks, &lt;a href="https://github.com/NousResearch/hermes-agent" rel="noopener noreferrer"&gt;NousResearch/hermes-agent&lt;/a&gt; went from zero to 95,600 GitHub stars — the fastest star velocity of any agent framework in 2026. The question isn't whether Hermes Agent matters. The question is what v0.10.0 (released April 16, 2026) actually changes — and whether local deployment and browser integration are ready for production use.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's New in v0.10.0 (v2026.4.16)
&lt;/h2&gt;

&lt;p&gt;The v0.10 release is the most practically significant update for developers who want to run Hermes without API costs or need browser automation in their workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key additions in v0.10:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ollama integration&lt;/strong&gt; — First-class local model support via Ollama, llama.cpp, and vLLM with zero API cost&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;hermes-plugin-chrome-profiles&lt;/strong&gt; — Experimental Chrome CDP integration for multi-profile browser automation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Browser Use v0.8.0+&lt;/strong&gt; — Upgraded browser automation with better reliability and vision integration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GEPA v2 improvements&lt;/strong&gt; — Faster evolution cycles for the self-improvement engine&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Android/Termux support&lt;/strong&gt; — Hermes can now run natively on Android devices&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The install story hasn't changed: one command, works everywhere.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Local Deployment: Ollama Integration in Practice
&lt;/h2&gt;

&lt;p&gt;The case for local Hermes is straightforward: if you're running a long-horizon autonomous task — a 2-hour coding session, a research crawl, a data pipeline — API costs compound fast. Switching to Ollama means the economics of "leave it running" change completely.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hardware Requirements
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://docs.ollama.com/integrations/hermes" rel="noopener noreferrer"&gt;Official Ollama integration docs&lt;/a&gt; are specific about what local deployment requires:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setup&lt;/th&gt;
&lt;th&gt;VRAM&lt;/th&gt;
&lt;th&gt;Throughput&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Apple Silicon (M2/M3/M4)&lt;/td&gt;
&lt;td&gt;Unified RAM (≥16GB)&lt;/td&gt;
&lt;td&gt;50-80 tok/s on 7B&lt;/td&gt;
&lt;td&gt;Metal acceleration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NVIDIA GPU&lt;/td&gt;
&lt;td&gt;8-16GB VRAM+&lt;/td&gt;
&lt;td&gt;60-100+ tok/s on 7B&lt;/td&gt;
&lt;td&gt;CUDA via Ollama&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPU-only&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;3-8 tok/s on 7B&lt;/td&gt;
&lt;td&gt;Usable, not recommended&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The recommendation is a 7B or 13B model with a 64K+ context window. Models with shorter contexts will truncate mid-task and produce inconsistent results.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Ollama first (if not already)&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;ollama  &lt;span class="c"&gt;# macOS&lt;/span&gt;

&lt;span class="c"&gt;# Pull a compatible model (llama3.1 has 128K context natively)&lt;/span&gt;
ollama pull llama3.1:8b

&lt;span class="c"&gt;# Configure Hermes to use local model&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; ~/.hermes/config.yaml &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;'
llm:
  provider: ollama
  model: llama3.1:8b
  base_url: http://localhost:11434
  context_window: 65536
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Start Ollama server&lt;/span&gt;
ollama serve &amp;amp;

&lt;span class="c"&gt;# Run Hermes&lt;/span&gt;
hermes run &lt;span class="s2"&gt;"your task here"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Context Window Constraint
&lt;/h3&gt;

&lt;p&gt;The critical gotcha: &lt;strong&gt;your model must support ≥64K context&lt;/strong&gt; for reliable multi-step tasks. Most quantized 7B models default to 4K or 8K context.&lt;/p&gt;

&lt;p&gt;Models confirmed to work well with local Hermes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;llama3.1:8b&lt;/code&gt; (128K context natively)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mistral:7b-instruct-q4_K_M&lt;/code&gt; (64K context with extended config)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;qwen2.5:14b&lt;/code&gt; (32K context, good for medium tasks)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;deepseek-coder-v2:16b&lt;/code&gt; (128K context, strong for coding tasks)&lt;/li&gt;
&lt;/ul&gt;
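
&lt;p&gt;If your preferred model defaults to a short context, you can usually raise it through an Ollama Modelfile rather than switching models. A minimal sketch, assuming Ollama's &lt;code&gt;num_ctx&lt;/code&gt; parameter and that the base model was actually trained for the longer window:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Build a 64K-context variant of a pulled model via a Modelfile
# (num_ctx only helps if the base model supports that window)
cat &amp;gt; Modelfile &amp;lt;&amp;lt;'EOF'
FROM mistral:7b-instruct-q4_K_M
PARAMETER num_ctx 65536
EOF
ollama create mistral-7b-64k -f Modelfile
ollama run mistral-7b-64k "hello"   # quick smoke test
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Point the &lt;code&gt;model:&lt;/code&gt; field in &lt;code&gt;~/.hermes/config.yaml&lt;/code&gt; at the new tag afterwards.&lt;/p&gt;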




&lt;h2&gt;
  
  
  Browser Integration: CDP and Browser Use
&lt;/h2&gt;

&lt;p&gt;Hermes ships with two browser automation layers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Browser Use v0.8.0+&lt;/strong&gt; is the default — high-level API for navigation, form filling, clicking, and vision-enabled page reading.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;hermes-plugin-chrome-profiles&lt;/strong&gt; is the experimental CDP layer for multi-account workflows. It lets you connect to a running Chrome instance and switch between profiles programmatically.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Browser Use is bundled — just enable it in config&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; ~/.hermes/config.yaml &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;'
tools:
  browser:
    enabled: true
    provider: browser_use
    headless: false
    timeout: 30
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;hermes run &lt;span class="s2"&gt;"Research and summarize the top 5 HN posts from today, save to research-notes.md"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The CDP plugin is useful for multi-account testing but not production-stable — &lt;a href="https://news.ycombinator.com/item?id=47726913" rel="noopener noreferrer"&gt;community reports&lt;/a&gt; describe connection drops mid-task. Treat it as beta.&lt;/p&gt;




&lt;h2&gt;
  
  
  The GEPA Self-Improvement Engine
&lt;/h2&gt;

&lt;p&gt;GEPA (Genetic Evolution of Prompt Architectures) was presented as an &lt;a href="https://github.com/NousResearch/hermes-agent-self-evolution" rel="noopener noreferrer"&gt;ICLR 2026 Oral&lt;/a&gt;. The mechanism: GEPA reads execution traces, identifies failure patterns, and proposes improvements to skill prompts. Unlike simple retry logic, GEPA does causal analysis — it tries to understand &lt;em&gt;why&lt;/em&gt; something failed.&lt;/p&gt;

&lt;p&gt;The 40% speedup on repeat tasks is achievable, but it accumulates over time rather than appearing immediately. The first hour feels similar to any other agent. By hour two, after 15-20 similar tasks, the improvement becomes noticeable.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Self-Grading Problem
&lt;/h3&gt;

&lt;p&gt;Hermes's self-grading tends to be optimistic. The workaround is explicit, verifiable success criteria:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Instead of vague prompts:&lt;/span&gt;
hermes run &lt;span class="s2"&gt;"Fix the authentication bug in auth.py"&lt;/span&gt;

&lt;span class="c"&gt;# Use verifiable success criteria:&lt;/span&gt;
hermes run &lt;span class="s2"&gt;"Fix the authentication bug in auth.py.
Success criteria:
1. All tests in test_auth.py pass
2. Login endpoint returns 200 for valid credentials
3. Login endpoint returns 401 for invalid credentials
Run the tests and show output before marking complete."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Hermes vs Claude Code: Complementary, Not Competing
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://kilo.ai/articles/openclaw-vs-hermes-what-reddit-says" rel="noopener noreferrer"&gt;Community consensus on Reddit&lt;/a&gt;: these are complementary tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hermes excels at:&lt;/strong&gt; long-horizon orchestration, repetitive workflows, local deployment, multi-agent coordination, persistent memory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Code excels at:&lt;/strong&gt; deep intensive coding, complex architecture decisions, production-critical changes, interactive debugging.&lt;/p&gt;

&lt;p&gt;The practical pattern: Hermes runs background orchestration, calls Claude Code for intensive steps, accumulates skills from each cycle.&lt;/p&gt;
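
&lt;p&gt;One way to express that hand-off is to describe the delegation in the Hermes task prompt itself. This is a sketch only: it assumes Claude Code is installed on the same machine and that its non-interactive print mode (&lt;code&gt;claude -p&lt;/code&gt;) behaves as its docs describe; verify the flag against your installed version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hermes orchestrates; the heavy refactor step is delegated to Claude Code
# (assumption: claude -p runs one non-interactive prompt and prints the result)
hermes run "Audit the repo for flaky tests and write a fix plan to flaky-tests.md.
For the single worst offender, delegate the refactor by running:
  claude -p 'Refactor this flaky test to remove its timing dependence'
then incorporate the output, re-run the test suite, and report the results."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;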




&lt;h2&gt;
  
  
  Quick Start Summary
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

&lt;span class="c"&gt;# Cloud API path&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"ANTHROPIC_API_KEY=your-key"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; ~/.hermes/.env
hermes run &lt;span class="s2"&gt;"your first task"&lt;/span&gt;

&lt;span class="c"&gt;# Local Ollama path (zero cost)&lt;/span&gt;
ollama pull llama3.1:8b
hermes config &lt;span class="nb"&gt;set &lt;/span&gt;llm.provider ollama llm.model llama3.1:8b
hermes run &lt;span class="s2"&gt;"your first task"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;95,600 stars in seven weeks is an endorsement of the concept. v0.10 is the release where the execution starts catching up to the pitch.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://agentconn.com/blog/hermes-agent-review-local-agi-stack-browser-integration-2026" rel="noopener noreferrer"&gt;AgentConn&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>productivity</category>
      <category>agents</category>
    </item>
    <item>
      <title>Kimi K2.6 vs Claude Opus 4.7: The 88% Cost Advantage</title>
      <dc:creator>Max Quimby</dc:creator>
      <pubDate>Tue, 21 Apr 2026 03:28:38 +0000</pubDate>
      <link>https://forem.com/max_quimby/kimi-k26-vs-claude-opus-47-the-88-cost-advantage-2916</link>
      <guid>https://forem.com/max_quimby/kimi-k26-vs-claude-opus-47-the-88-cost-advantage-2916</guid>
      <description>&lt;p&gt;When Clement Delangue, the CEO of Hugging Face, called Kimi K2.6 a standout open-source model on the day of its release, the AI procurement conversation shifted. Not because a Chinese model was competitive — Kimi's K2 family and DeepSeek had already proved that point — but because of what &lt;em&gt;competitive&lt;/em&gt; now costs.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📖 &lt;a href="https://computeleap.com/blog/kimi-k2-6-vs-claude-opus-47-open-source-chinese-ai-model-comparison-2026" rel="noopener noreferrer"&gt;Read the full version with charts and embedded sources on ComputeLeap →&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Kimi K2.6, the latest open-weight model from Beijing-based Moonshot AI, runs at &lt;strong&gt;$0.60 per million input tokens&lt;/strong&gt; on the official API. &lt;a href="https://openrouter.ai/anthropic/claude-opus-4.7" rel="noopener noreferrer"&gt;Claude Opus 4.7&lt;/a&gt;, Anthropic's frontier model, costs &lt;strong&gt;$5.00 per million input tokens&lt;/strong&gt;. That's an 8.3× difference — or roughly 88% cheaper.&lt;/p&gt;

&lt;p&gt;If your team spends $10,000 a month on Claude Opus 4.7 today, K2.6 could in theory handle the same workload for $1,200. Engineering teams are already running the math. This guide gives you the honest version of that calculation: where K2.6 delivers, where it doesn't, and how to make the decision without the hype in either direction.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture Behind the Price
&lt;/h2&gt;

&lt;p&gt;The reason Kimi K2.6 can be so cheap while performing at frontier level comes down to architecture. K2.6 is a &lt;strong&gt;Mixture-of-Experts (MoE) model&lt;/strong&gt;: it has 1 trillion total parameters but activates only 32 billion per token during inference.&lt;/p&gt;

&lt;p&gt;Dense models pay the full computational cost of every parameter on every token. MoE models route each token through a small subset of specialized "expert" subnetworks. The result is trillion-parameter model quality at a fraction of the inference cost — which flows directly to the API price.&lt;/p&gt;
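
&lt;p&gt;The arithmetic is easy to sanity-check from the published figures: with roughly 32B of 1T parameters active per token, only about 3% of the network does work on any given token, which bounds the per-token compute advantage over an equally sized dense model. Illustrative only; real serving cost also depends on memory footprint, batching, and routing overhead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;awk 'BEGIN {
  total_b  = 1000;   # total parameters, in billions
  active_b = 32;     # active parameters per token, in billions
  printf "active fraction: %.1f%%\n", 100 * active_b / total_b;
  printf "dense vs MoE compute per token: ~%.0fx\n", total_b / active_b;
}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;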

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faw5ae684sfat6snv8czy.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faw5ae684sfat6snv8czy.jpg" alt="MoE architecture diagram showing how Kimi K2.6 routes tokens through 8 of 384 experts, activating only 32B of 1T total parameters" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;K2.6's MoE structure is unusually large-scale:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;384 expert subnetworks&lt;/strong&gt;, with 8 selected per token plus 1 shared expert&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;61 transformer layers&lt;/strong&gt; (including 1 dense layer)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-head Latent Attention (MLA)&lt;/strong&gt; mechanism for efficient long-context processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;256K token context window&lt;/strong&gt; — enough to process entire large codebases in a single prompt&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MoonViT vision encoder&lt;/strong&gt; (400M parameters) for native multimodal input&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The 256K context and 160K-token vocabulary round out a model that's clearly engineered for production coding workloads, not benchmark optimization.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ MoE models have a catch: they're harder to run locally. At 1T total parameters, K2.6 requires significant hardware even with 8-bit quantization. Community quantizations exist on HuggingFace (via unsloth and ubergarm), but self-hosted K2.6 is a serious infrastructure commitment. If local deployment is your goal, smaller Chinese open-source models may be more practical.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Benchmarks: Where K2.6 Actually Leads
&lt;/h2&gt;

&lt;p&gt;Benchmark theater is a real phenomenon in AI. But some numbers here are worth taking seriously because they map to real engineering workloads.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Kimi K2.6&lt;/th&gt;
&lt;th&gt;Claude Opus 4.7&lt;/th&gt;
&lt;th&gt;GPT-5.4&lt;/th&gt;
&lt;th&gt;Gemini 3.1 Pro&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SWE-Bench Pro&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;58.6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;53.4&lt;/td&gt;
&lt;td&gt;57.7&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HLE Full w/ Tools&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;54.0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;53.0&lt;/td&gt;
&lt;td&gt;52.1&lt;/td&gt;
&lt;td&gt;51.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BrowseComp&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;83.2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;82.7&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SWE-Bench Verified&lt;/td&gt;
&lt;td&gt;80.2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;80.8&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API Input Price&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.60/M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$5.00/M&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API Output Price&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$2.50/M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$25.00/M&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;SWE-Bench Pro&lt;/strong&gt; measures performance on real GitHub issues — actual engineering tasks, not constructed problems. K2.6's 58.6 vs Claude Opus 4.7's 53.4 is a meaningful gap on the metric that matters most to software teams.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HLE (Humanity's Last Exam) with Tools&lt;/strong&gt; is a research-grade exam specifically designed to resist AI memorization. K2.6 leads all frontier models at 54.0, placing above Claude Opus 4.7 (53.0) and GPT-5.4 (52.1). This is surprising for a model priced as a "budget" alternative.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ These benchmarks are from Moonshot AI's own release. Independent, third-party SWE-Bench Pro evaluations are still catching up. Take the K2.6-specific numbers with the usual caveat applied to vendor benchmarks — the HN community reception and Cursor integration are better early signals than the numbers alone.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Agent Swarm Capability
&lt;/h2&gt;

&lt;p&gt;Beyond raw benchmark scores, K2.6 introduces a capability that doesn't have an obvious analogue in Opus 4.7: &lt;strong&gt;agent swarm scaling&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;K2.6 can orchestrate up to &lt;strong&gt;300 sub-agents executing 4,000 coordinated steps&lt;/strong&gt; — decomposing a complex task into parallel, domain-specialized subtasks running simultaneously. According to &lt;a href="https://www.kimi.com/blog/kimi-k2-6" rel="noopener noreferrer"&gt;Moonshot's technical blog&lt;/a&gt;, real-world case studies include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Optimizing Zig inference performance from 15 to 193 tokens/second over a 12-hour autonomous run&lt;/li&gt;
&lt;li&gt;Overhauling a financial matching engine from 0.43 to 1.24 million transactions/second (185% improvement) over a 13-hour session&lt;/li&gt;
&lt;li&gt;Generating full-stack websites with databases from text-only prompts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A "Claw Groups" preview feature lets humans and agents collaborate in a shared operational space, with task-to-agent matching and failure detection. This positions K2.6 less as a chat model and more as an infrastructure primitive for long-horizon background workloads.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real Developer Reception: What the HN Thread Reveals
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://news.ycombinator.com/item?id=47835735" rel="noopener noreferrer"&gt;Kimi K2.6 Hacker News thread&lt;/a&gt; scored 592 points with 303 comments within hours of release — unusually strong engagement for a non-US model launch.&lt;/p&gt;

&lt;p&gt;The developer sentiment breaks roughly into thirds:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bullish:&lt;/strong&gt; "Dirt cheap on OpenRouter for how good it is" (regularfry). Simon Willison posted a live demo of K2.6 generating animated SVG HTML via OpenRouter, citing it as practical and fast. One commenter confirmed K2.6 &lt;strong&gt;powers Cursor's composer-2 model&lt;/strong&gt; — a real-world quality endorsement that's harder to fake than a benchmark.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skeptical:&lt;/strong&gt; "Tried it once... my experience was just okay-ish despite strong benchmarks." Some users report it "does only slightly better than Kimi K2.5" and "struggles with domain-specific tasks."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Philosophical:&lt;/strong&gt; "Funny that Chinese companies are pioneering possibly the world's most important tech via open source while the US goes closed" — a sentiment that lands differently when you consider DeepSeek R1, Qwen, and now K2.6 all dropped open weights.&lt;/p&gt;

&lt;p&gt;The median impression aligns with &lt;a href="https://benchlm.ai/compare/claude-opus-4-7-vs-kimi-k2-5" rel="noopener noreferrer"&gt;BenchLM's Claude Opus 4.7 vs Kimi K2.5 comparison&lt;/a&gt;: Claude leads overall (94 vs 68) with its sharpest advantage in agentic reliability. K2.6 narrows that gap meaningfully, but it has not closed it entirely.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Qwen3.6-Max-Preview Context: Two Chinese Models in One Day
&lt;/h2&gt;

&lt;p&gt;K2.6 didn't land in isolation. On the same day — April 20, 2026 — Alibaba released &lt;a href="https://decrypt.co/364948/alibaba-qwen-3-6-max-preview-most-powerful-model" rel="noopener noreferrer"&gt;Qwen3.6-Max-Preview&lt;/a&gt;, topping six major coding benchmarks including SWE-Bench Pro, Terminal-Bench 2.0, SkillsBench, and SciCode.&lt;/p&gt;

&lt;p&gt;Qwen3.6-Max-Preview is proprietary (no open weights), but the convergence of two major Chinese AI releases on the same day is structurally significant. &lt;a href="https://importai.substack.com/p/import-ai-454-automating-alignment" rel="noopener noreferrer"&gt;Jack Clark's Import AI newsletter&lt;/a&gt; has tracked this arc: Chinese models are no longer "almost competitive" — they're trading leads on specific benchmarks with the frontier models from Anthropic, OpenAI, and Google.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://chinai.substack.com/p/chinai-291-chinese-open-source-models" rel="noopener noreferrer"&gt;ChinAI newsletter&lt;/a&gt; framed it earlier this year: "Chinese open-source models are now leading foreign open-source models and closing in on global first-tier closed-source models." April 20 is a data point, not an anomaly.&lt;/p&gt;

&lt;p&gt;If you've been following &lt;a href="https://computeleap.com/blog/qwen3-35b-a3b-local-mac-setup-lm-studio-open-source" rel="noopener noreferrer"&gt;our Qwen3-35B-A3B local setup guide&lt;/a&gt;, K2.6 is the cloud-API counterpart to that story — optimized for different constraints but part of the same structural trend.&lt;/p&gt;




&lt;h2&gt;
  
  
  When to Use Kimi K2.6
&lt;/h2&gt;

&lt;p&gt;K2.6 is the right choice when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Long-horizon coding tasks&lt;/strong&gt; — multi-hour autonomous runs on well-scoped engineering problems, where the agent swarm architecture pays off&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High-volume production workloads&lt;/strong&gt; — teams spending $5K+/month on Opus-level API calls where the 88% cost delta is real money&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One-shot code generation&lt;/strong&gt; — initial code scaffolding, UI generation from design prompts, full-stack boilerplate where SWE-Bench Pro performance matters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent orchestration&lt;/strong&gt; — building multi-agent systems (see &lt;a href="https://computeleap.com/blog/openai-agents-python-tutorial-multi-agent-ai-workflows-2026" rel="noopener noreferrer"&gt;our OpenAI Agents Python SDK tutorial&lt;/a&gt; for framework context) where K2.6's 300-sub-agent ceiling gives headroom&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Two-tier architectures&lt;/strong&gt; — using K2.6 for first-pass generation and Claude for final review/validation captures most of the cost savings without sacrificing output quality&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When Claude Opus 4.7 Is Still Worth the Premium
&lt;/h2&gt;

&lt;p&gt;Stick with Opus 4.7 when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Complex reasoning under ambiguity&lt;/strong&gt; — open-ended problems where the model needs judgment, not execution; Claude's agentic reliability lead is real&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production workloads where errors are expensive&lt;/strong&gt; — if a wrong answer costs $10K to fix, the API call price is irrelevant&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise compliance&lt;/strong&gt; — Anthropic's usage policies, data handling, and audit trails are more mature than Moonshot's at the enterprise procurement level&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal tasks requiring judgment&lt;/strong&gt; — vision tasks that need contextual interpretation, not just image recognition&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Creative and long-form writing&lt;/strong&gt; — anecdotal but consistent: Claude's prose quality and editorial judgment remain ahead&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 The hybrid approach is underrated: use K2.6 for code generation and execution, Claude Opus 4.7 for planning and validation. Our &lt;a href="https://computeleap.com/blog/anthropic-vs-openai-api-developer-platform-2026" rel="noopener noreferrer"&gt;API cost comparison&lt;/a&gt; showed that most production AI spend is concentrated in generation volume — exactly where the K2.6 cost advantage is largest.&lt;/p&gt;
&lt;/blockquote&gt;
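
&lt;p&gt;A minimal sketch of that two-tier pattern over plain HTTP, with K2.6 generating and Claude reviewing. It assumes jq is installed, a Bearer token for the OpenAI-compatible Kimi endpoint, and an Anthropic model ID of &lt;code&gt;claude-opus-4-7&lt;/code&gt;; treat all three as placeholders to verify against the providers' docs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Tier 1: generate a draft with Kimi K2.6 (OpenAI-compatible endpoint)
DRAFT=$(curl -s https://api.kimi.com/v1/chat/completions \
  -H "Authorization: Bearer $KIMI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"kimi-k2.6","max_tokens":4096,
       "messages":[{"role":"user","content":"Write a Python token-bucket rate limiter class"}]}' \
  | jq -r '.choices[0].message.content')

# Tier 2: have Claude review the draft via Anthropic's Messages API
# (model ID below is a placeholder)
curl -s https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg code "$DRAFT" \
        '{model:"claude-opus-4-7", max_tokens:2048,
          messages:[{role:"user",
            content:("Review this code for correctness and edge cases:\n\n" + $code)}]}')" \
  | jq -r '.content[0].text'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;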




&lt;h2&gt;
  
  
  Accessing K2.6: Your Options
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Kimi.com API (direct):&lt;/strong&gt; &lt;code&gt;$0.60/M&lt;/code&gt; input, &lt;code&gt;$2.50/M&lt;/code&gt; output. Compatible with the OpenAI Python SDK via base URL swap — no code refactoring if you're already calling OpenAI-compatible endpoints.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenRouter:&lt;/strong&gt; &lt;code&gt;$0.60/M&lt;/code&gt; input, &lt;code&gt;$2.80/M&lt;/code&gt; output (slight markup). Useful for routing alongside other models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Self-hosted:&lt;/strong&gt; Available on HuggingFace under Modified MIT license. Requires &lt;code&gt;transformers &amp;gt;=4.57.1&lt;/code&gt;. Recommended inference: vLLM or SGLang. Commercial restriction applies for entities with 100M+ MAU or $20M+ monthly revenue.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Drop-in replacement for OpenAI-compatible code
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-kimi-api-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.kimi.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kimi-k2.6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Your prompt here&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The OpenAI SDK compatibility is the practical win here — most teams can A/B test K2.6 against their current model with a one-line base URL change.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Kimi K2.6 is not a Claude Opus 4.7 replacement for all workloads. But for code generation at volume, long-horizon agent tasks, and cost-sensitive production workloads, K2.6 delivers at a price point that makes the tradeoffs genuinely favorable.&lt;/p&gt;

&lt;p&gt;The hidden cost of cheap models is real — we covered it &lt;a href="https://computeleap.com/blog/hidden-cost-cheap-ai-reasoning-models-2026" rel="noopener noreferrer"&gt;here&lt;/a&gt;. But the hidden cost of expensive models is also real: teams that overpay for capabilities they don't use, or avoid running AI on high-volume tasks because the math doesn't work. K2.6 makes more tasks economically viable, and that's worth something even if you keep Claude for the hard stuff.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quick decision:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High-volume coding generation → &lt;strong&gt;K2.6&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Complex reasoning, enterprise compliance, judgment-heavy tasks → &lt;strong&gt;Claude Opus 4.7&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Both → &lt;strong&gt;two-tier architecture&lt;/strong&gt; (K2.6 generates, Claude validates; see the sketch below)&lt;/li&gt;
&lt;/ul&gt;
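
&lt;p&gt;If the two-tier route fits, the wiring is short: generate on the cheap endpoint, validate on the expensive one. Here's a minimal sketch, assuming the Kimi endpoint from the snippet above plus the Anthropic Python SDK; the Claude model ID string is a placeholder, so check Anthropic's docs for the exact identifier:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Two-tier sketch: K2.6 drafts, Claude Opus validates.
# "claude-opus-4-7" is a placeholder model ID, not a confirmed identifier.
import openai
import anthropic

kimi = openai.OpenAI(api_key="your-kimi-api-key", base_url="https://api.kimi.com/v1")
claude = anthropic.Anthropic(api_key="your-anthropic-api-key")

def generate_and_validate(task):
    # Tier 1: high-volume generation on the cheaper model
    draft = kimi.chat.completions.create(
        model="kimi-k2.6",
        messages=[{"role": "user", "content": task}],
        max_tokens=4096,
    ).choices[0].message.content

    # Tier 2: a single validation pass on the premium model
    review = claude.messages.create(
        model="claude-opus-4-7",  # placeholder ID
        max_tokens=1024,
        messages=[{"role": "user", "content": "Review this code for correctness and risky edge cases:\n\n" + draft}],
    )
    return draft, review.content[0].text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;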




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://computeleap.com/blog/kimi-k2-6-vs-claude-opus-47-open-source-chinese-ai-model-comparison-2026" rel="noopener noreferrer"&gt;ComputeLeap&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>deer-flow vs evolver vs GenericAgent: Production-Ready?</title>
      <dc:creator>Max Quimby</dc:creator>
      <pubDate>Mon, 20 Apr 2026 04:22:45 +0000</pubDate>
      <link>https://forem.com/max_quimby/deer-flow-vs-evolver-vs-genericagent-production-ready-33m6</link>
      <guid>https://forem.com/max_quimby/deer-flow-vs-evolver-vs-genericagent-production-ready-33m6</guid>
      <description>&lt;h1&gt;
  
  
  deer-flow vs evolver vs GenericAgent: Production-Ready?
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;📖 &lt;a href="https://agentconn.com/blog/self-evolving-ai-agents-evolver-genericagent-deerflow-comparison-2026" rel="noopener noreferrer"&gt;Read the full version with diagrams and embedded sources on AgentConn →&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;On April 19, 2026, three self-evolving agent frameworks landed simultaneously in GitHub's global top 10: &lt;a href="https://github.com/bytedance/deer-flow" rel="noopener noreferrer"&gt;bytedance/deer-flow&lt;/a&gt; at 62,800 stars, &lt;a href="https://github.com/EvoMap/evolver" rel="noopener noreferrer"&gt;EvoMap/evolver&lt;/a&gt; at 5,700 stars, and &lt;a href="https://github.com/lsdefine/GenericAgent" rel="noopener noreferrer"&gt;lsdefine/GenericAgent&lt;/a&gt; at 4,600 stars. That's not three projects trending. That's a category arriving.&lt;/p&gt;

&lt;p&gt;The timing matters. We've &lt;a href="https://agentconn.com/blog/self-evolving-ai-agents-genericagent-evomap-skill-trees-guide" rel="noopener noreferrer"&gt;already covered GenericAgent and EvoMap's skill-tree approaches&lt;/a&gt; in detail. What hasn't been covered is how they compare to deer-flow, which is by far the largest of the three — and how all three stack up on the question that actually matters for teams considering them: can you run this in production without it becoming a liability?&lt;/p&gt;

&lt;h2&gt;
  
  
  What "Self-Evolving" Actually Means (And What It Doesn't)
&lt;/h2&gt;

&lt;p&gt;Before comparing frameworks, the clarification that saves everyone time: &lt;strong&gt;none of these systems modify their underlying model weights.&lt;/strong&gt; This is important because the marketing doesn't always make it clear.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/abs/2507.21046" rel="noopener noreferrer"&gt;The academic survey that anchors this category&lt;/a&gt; defines the feedback loop cleanly: agent executes a task → environment responds → optimizer extracts patterns → skill store is updated → next execution draws on those patterns. The agent improves over time not because the model gets smarter, but because the tools available to the model improve.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://news.ycombinator.com/item?id=44884091" rel="noopener noreferrer"&gt;Hacker News discussion&lt;/a&gt; put it plainly: "Self-improvement is really prompt/tool optimization, not weight updates." The skeptic position is correct if you're expecting AGI-style capability jumps. The practitioner position is also correct: process recursion — skill accumulation — is a genuine capability improvement, even if it's not the learning the term implies.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://news.ycombinator.com/item?id=44884091" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fagentconn.com%2Fblog%2Fhn-self-evolving-agents-survey-94pts.png" alt="HN: A Comprehensive Survey of Self-Evolving AI Agents — 94 points, 29 comments" width="800" height="562"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With that framing established, here are the three frameworks.&lt;/p&gt;




&lt;h2&gt;
  
  
  deer-flow (ByteDance) — The SuperAgent Harness
&lt;/h2&gt;

&lt;p&gt;At 62,800 stars, deer-flow isn't just the largest self-evolving framework on GitHub — it's one of the largest agent frameworks period. It claimed #1 on GitHub Trending in February 2026 when version 2 launched, and crossed 60,000 stars within weeks.&lt;/p&gt;

&lt;p&gt;The core concept is what ByteDance calls a "SuperAgent harness." Rather than a single intelligent agent, deer-flow is &lt;a href="https://github.com/bytedance/deer-flow" rel="noopener noreferrer"&gt;an orchestration runtime&lt;/a&gt; that gives agents the infrastructure to actually get work done: a lead agent decomposes complex tasks into parallelizable sub-tasks, spawns sub-agents with scoped contexts, runs them concurrently, and synthesizes the results into a coherent output. The framework handles tasks that "take minutes to hours."&lt;/p&gt;

&lt;p&gt;What makes this concrete is the execution environment. As &lt;a href="https://dev.to/arshtechpro/deerflow-20-what-it-is-how-it-works-and-why-developers-should-pay-attention-3ip3"&gt;Dev.to's technical breakdown&lt;/a&gt; put it directly: "The agent does not suggest a bash command. It runs it." Deer-flow provides agents with an isolated Docker container with filesystem access and a bash terminal — actual compute, not a sandbox emulation.&lt;/p&gt;

&lt;p&gt;Key architecture decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sub-agent parallelization&lt;/strong&gt;: Scoped contexts, concurrent execution, convergent synthesis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistent memory&lt;/strong&gt;: Asynchronous debounced queue tracking user preferences and project state across sessions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skills system&lt;/strong&gt;: Markdown-based workflow definitions (extensible without code changes)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model agnosticism&lt;/strong&gt;: Works with GPT-4, Claude, DeepSeek, Kimi, Doubao-Seed, and Ollama&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The production deployment guidance is notably serious. The documentation specifies 8+ vCPU / 16GB RAM minimum for server deployment, Docker-based production and development modes, and explicit warnings about untrusted network exposure with IP allowlisting and VLAN isolation recommendations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The ByteDance factor:&lt;/strong&gt; &lt;a href="https://venturebeat.com/orchestration/what-is-deerflow-and-what-should-enterprises-know-about-this-new-local-ai" rel="noopener noreferrer"&gt;VentureBeat noted&lt;/a&gt; that "ByteDance provenance may trigger organizational review processes." Enterprise teams in regulated industries or US government-adjacent environments should route this through procurement before deploying. MIT-licensed, fully auditable codebase — but the organizational source still matters for some teams.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/bytedance/deer-flow" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fagentconn.com%2Fblog%2Fhn-deerflow-bytedance-superagent-62k.png" alt="DeerFlow: 62,800 GitHub stars, #1 trending Feb 2026, ByteDance SuperAgent harness" width="800" height="200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Built on:&lt;/strong&gt; LangGraph + LangChain. If your team already uses LangGraph for orchestration, deer-flow's mental model will feel familiar.&lt;/p&gt;




&lt;h2&gt;
  
  
  evolver (EvoMap) — Genome Evolution Protocol
&lt;/h2&gt;

&lt;p&gt;At 5,700 stars, &lt;a href="https://github.com/EvoMap/evolver" rel="noopener noreferrer"&gt;EvoMap/evolver&lt;/a&gt; is the smallest of the three by star count but the most distinctive by architecture. It introduced the Genome Evolution Protocol (GEP) — a framework for treating prompt evolution as a structured, auditable process analogous to biological gene expression.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://evomap.ai/blog/gep-protocol-deep-dive" rel="noopener noreferrer"&gt;GEP deep dive&lt;/a&gt; explains the key insight: rather than letting agents evolve through raw trial-and-error, GEP solidifies successful behaviors into three reusable asset types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Genes&lt;/strong&gt;: Atomic capability units — validated code or prompt fragments for a single operation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Capsules&lt;/strong&gt;: Successful task execution paths — complex problem solutions encoded as reusable workflows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Events&lt;/strong&gt;: Immutable evolution logs — every mutation (Innovation) or repair (Repair) recorded with full context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The operational logic is disciplined: the 70/30 rule allocates 70% of compute to stability (Repair mode) and 30% to capability expansion (Feature mode). When crashes or tool call failures are detected, evolver enters Repair Mode and follows explicit protocol gates before any mutation.&lt;/p&gt;
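
&lt;p&gt;To make the asset types and the 70/30 split concrete, here's a hypothetical sketch of the shapes involved. The field names are illustrative, not evolver's actual schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical sketch of GEP-style assets and the 70/30 mode split.
# Field names are illustrative; see the evolver repo for the real schema.
import random
from dataclasses import dataclass

@dataclass
class Gene:        # atomic capability unit: one validated prompt or code fragment
    name: str
    fragment: str

@dataclass
class Capsule:     # a successful execution path, encoded as a reusable workflow
    task: str
    steps: list

@dataclass
class Event:       # immutable evolution log entry
    kind: str      # "Innovation" (mutation) or "Repair"
    context: dict

def pick_mode(crash_detected):
    # Crashes or tool-call failures force Repair Mode; otherwise the 70/30
    # rule allocates roughly 70% of cycles to stability, 30% to new capability.
    if crash_detected:
        return "repair"
    return random.choices(["repair", "feature"], weights=[70, 30])[0]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;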

&lt;p&gt;Critically: &lt;strong&gt;evolver does not edit code directly.&lt;/strong&gt; It generates guided prompts for human review or integration with host runtimes. This limits scope — and also limits blast radius.&lt;/p&gt;

&lt;p&gt;The launch story is worth knowing: evolver hit the top of ClawHub within 10 minutes of release in February 2026, racking up 36,000 downloads in three days. It later became the center of a plagiarism controversy when EvoMap accused Hermes Agent (released March 2026) of copying evolver's self-evolution architecture — Hermes Agent shipped a similar feature only 24-39 days after evolver's open-source release.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/EvoMap/evolver" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fagentconn.com%2Fblog%2Fhn-evolver-evomap-gep-protocol.png" alt="EvoMap/evolver: GEP Genome Evolution Protocol — 5,700 stars, 36K ClawHub downloads" width="800" height="200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Teams that need compliance-friendly audit trails for agent behavior changes, or deployments in regulated environments where agent mutations need to be explainable.&lt;/p&gt;




&lt;h2&gt;
  
  
  GenericAgent (lsdefine) — The Minimal Skill Tree
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/lsdefine/GenericAgent" rel="noopener noreferrer"&gt;GenericAgent&lt;/a&gt; makes its design philosophy explicit: "grows a skill tree from a 3,300-line seed, achieving full system control with 6x less token consumption." The Fudan University team built something unusually minimal — the entire framework is ~3K lines with a ~100-line agent loop.&lt;/p&gt;

&lt;p&gt;The architecture is built around five layers of memory (L0–L4):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;L0&lt;/strong&gt;: Meta-rules (agent identity and constraints)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L1&lt;/strong&gt;: Insights (generalized patterns from past tasks)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L2&lt;/strong&gt;: Global facts (persistent world knowledge)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L3&lt;/strong&gt;: Task skills (crystallized execution paths from completed tasks)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L4&lt;/strong&gt;: Session archives (full interaction logs, added April 2026)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When GenericAgent completes a task, it automatically crystallizes the execution path as a skill file. As &lt;a href="https://pyshine.com/GenericAgent-Self-Evolving-AI-Agent/" rel="noopener noreferrer"&gt;PyShine's walkthrough&lt;/a&gt; notes: "After a few weeks, an agent instance will have a skill tree no one else in the world has — all grown from 3K lines of seed code."&lt;/p&gt;

&lt;p&gt;The token efficiency claim is real and measurable. Where comparable agents require 200K–1M token context windows, GenericAgent operates under 30K by loading only relevant skills from memory rather than the full history. The "6x less" figure comes from this selective loading compared to agents that stuff entire conversation histories into context.&lt;/p&gt;
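
&lt;p&gt;The mechanism is easy to picture: score stored skills against the incoming task and load only the best fits under a token budget. An illustrative sketch, not GenericAgent's actual code; the scoring and the budget math are placeholders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative sketch of selective skill loading (not GenericAgent's code).
# Score stored skills against the task, then fill the context up to a budget
# instead of replaying the entire interaction history.

def build_context(task, skill_files, token_budget=30_000):
    def relevance(skill_text):
        # crude score: count of words the task and the skill share
        return len(set(task.lower().split()).intersection(skill_text.lower().split()))

    ranked = sorted(skill_files, key=relevance, reverse=True)
    selected, used = [], 0
    for skill in ranked:
        cost = len(skill) // 4              # rough token estimate
        if used + cost &amp;gt; token_budget:   # stop once the budget is spent
            break
        selected.append(skill)
        used += cost
    return "\n\n".join(selected)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;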

&lt;p&gt;Nine atomic tools cover the full system control surface: browser (with preserved login sessions), terminal, filesystem, keyboard/mouse input, screen vision, and mobile ADB. Multi-model: supports Claude, Gemini, Kimi, MiniMax.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/lsdefine/GenericAgent" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fagentconn.com%2Fblog%2Fhn-genericagent-skill-tree-4k.png" alt="GenericAgent: 4,600 stars, 6x token reduction, Fudan team self-evolving skill tree" width="800" height="200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Cost-conscious teams running long-running autonomous agents where token efficiency directly maps to operational cost. Also the most approachable codebase of the three — 3,300 lines is something a team can actually audit in a week.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Security Reality No One Mentions
&lt;/h2&gt;

&lt;p&gt;All three frameworks share a category-level risk that &lt;a href="https://simonw.substack.com/p/the-lethal-trifecta-for-ai-agents" rel="noopener noreferrer"&gt;Simon Willison identified&lt;/a&gt; as "the lethal trifecta": if an agent combines (1) access to private data, (2) exposure to untrusted content, and (3) the ability to externally communicate, an attacker can trick it into exfiltrating private data to an external endpoint. Self-evolving agents make this attack surface significantly larger than standard API-call agents.&lt;/p&gt;
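
&lt;p&gt;The trifecta is concrete enough to check mechanically before an agent ships. An illustrative heuristic, not a feature of any of these frameworks; the capability flag names are made up for the example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative heuristic (not from any of these frameworks): flag agent
# configs that combine all three legs of the lethal trifecta.
TRIFECTA = ("reads_private_data", "ingests_untrusted_content", "can_send_externally")

def trifecta_risk(agent_config):
    """agent_config: dict of capability flags, e.g. {"reads_private_data": True}."""
    legs = [name for name in TRIFECTA if agent_config.get(name)]
    return len(legs) == 3, legs

risky, legs = trifecta_risk({
    "reads_private_data": True,
    "ingests_untrusted_content": True,
    "can_send_externally": True,
})
# risky is True here: at least one leg should be removed or mediated
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;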

&lt;p&gt;The &lt;a href="https://www.gravitee.io/blog/state-of-ai-agent-security-2026-report-when-adoption-outpaces-control" rel="noopener noreferrer"&gt;2026 AI Agent Security Report&lt;/a&gt; puts it starkly: 88% of organizations confirmed or suspected security incidents involving AI agents in the last year. Only 24.4% have full visibility into which agents are communicating with each other. More than half run with no security oversight or logging.&lt;/p&gt;

&lt;p&gt;For self-evolving frameworks specifically, the risk compounds: if the framework modifies agent behavior over time (as all three do), security review at deployment isn't sufficient — you need ongoing behavioral monitoring.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.bvp.com/atlas/securing-ai-agents-the-defining-cybersecurity-challenge-of-2026" rel="noopener noreferrer"&gt;Bessemer Venture Partners&lt;/a&gt; frames the identity problem: "In a mature agentic ecosystem, swarms of agents may be instantiated to perform a single task and then decommissioned within minutes — traditional security architectures that rely on periodic scans will fail to detect these identities entirely."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Practical mitigation per framework:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;deer-flow&lt;/strong&gt;: Docker sandbox isolation is built-in; use it. Enable IP allowlisting and VLAN isolation as the docs recommend. Monitor sub-agent spawning rates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;evolver&lt;/strong&gt;: Use Review mode and validation steps. The audit trail via Events is the strongest governance artifact of the three.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GenericAgent&lt;/strong&gt;: Audit the skill tree periodically. Skills accumulate without a built-in approval gate — add one in production deployments.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Decision Matrix
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fagentconn.com%2Fblog%2Fself-evolving-ai-agents-evolver-genericagent-deerflow-comparison-2026-diagram-1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fagentconn.com%2Fblog%2Fself-evolving-ai-agents-evolver-genericagent-deerflow-comparison-2026-diagram-1.jpg" alt="Comparison table: deer-flow vs evolver vs GenericAgent — stars, architecture, security, production readiness" width="800" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;deer-flow&lt;/th&gt;
&lt;th&gt;evolver&lt;/th&gt;
&lt;th&gt;GenericAgent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Stars&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;62.8k&lt;/td&gt;
&lt;td&gt;5.7k&lt;/td&gt;
&lt;td&gt;4.6k&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Language&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Python + TypeScript&lt;/td&gt;
&lt;td&gt;JavaScript&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Self-evolution type&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Sub-agent + memory&lt;/td&gt;
&lt;td&gt;Prompt/gene evolution&lt;/td&gt;
&lt;td&gt;Skill tree accumulation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Token efficiency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;6x vs. alternatives&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sandbox&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Docker (built-in)&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Audit trail&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;LangSmith/Langfuse&lt;/td&gt;
&lt;td&gt;Built-in Events log&lt;/td&gt;
&lt;td&gt;Session archive (L4)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ByteDance provenance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Production-ready&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes (with hardening)&lt;/td&gt;
&lt;td&gt;Yes (limited scope)&lt;/td&gt;
&lt;td&gt;Yes (with monitoring)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Choose deer-flow&lt;/strong&gt; when you're building long-horizon autonomous tasks — research pipelines, multi-step code generation, content workflows that run for hours. The Docker sandbox, sub-agent parallelization, and extensive deployment documentation make it the most enterprise-ready despite the ByteDance provenance consideration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose evolver&lt;/strong&gt; when compliance and audit trails are non-negotiable. The GEP protocol's structured mutation model is the only framework here that produces a legally defensible record of every agent behavior change.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose GenericAgent&lt;/strong&gt; when token cost is the primary constraint, or when you want a framework small enough to audit completely. The 3,300-line codebase is readable by a small team in a week. The 6x token efficiency advantage is real and meaningful at production scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;None of the above&lt;/strong&gt; if you're building a customer-facing application where adversarial users could reach the agent with untrusted content. All three need additional input sanitization and communication controls before they're safe in that context.&lt;/p&gt;

&lt;p&gt;For context on related frameworks: the &lt;a href="https://agentconn.com/blog/nousresearch-hermes-agent-self-improving-framework-review" rel="noopener noreferrer"&gt;hermes-agent review&lt;/a&gt; covers NousResearch's self-improving framework (95.6K stars), which is the highest-starred in this category but follows a different architectural approach.&lt;/p&gt;




&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;deer-flow:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/bytedance/deer-flow
&lt;span class="nb"&gt;cd &lt;/span&gt;deer-flow &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; docker compose up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Visit &lt;code&gt;localhost:3000&lt;/code&gt;. Works with any OpenAI-compatible API key.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;evolver:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @evomap/evolver
evolver init &lt;span class="nt"&gt;--mode&lt;/span&gt; review
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Review mode prevents any mutation from applying without human confirmation — recommended for first deployments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GenericAgent:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/lsdefine/GenericAgent
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
python agent.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See &lt;code&gt;GETTING_STARTED.md&lt;/code&gt; in the repo — the Fudan team wrote unusually clear onboarding documentation.&lt;/p&gt;




&lt;p&gt;The category is real. Three frameworks at 62.8k, 5.7k, and 4.6k stars trending simultaneously isn't noise — it's the infrastructure layer of agentic AI arriving in production-deployable form. The question isn't whether to pay attention; it's which one fits your actual use case, and whether your team has thought through the security posture before the first deployment.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://arxiv.org/abs/2507.21046" rel="noopener noreferrer"&gt;comprehensive academic survey&lt;/a&gt; ends with an observation worth sitting with: "The challenge isn't making agents that learn — it's making agents whose learning is observable, bounded, and reversible." All three frameworks here have made progress on the first goal. The second and third are still largely up to the team deploying them.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://agentconn.com/blog/self-evolving-ai-agents-evolver-genericagent-deerflow-comparison-2026" rel="noopener noreferrer"&gt;AgentConn&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>python</category>
      <category>security</category>
    </item>
    <item>
      <title>openai-agents-python: Build Multi-Agent AI Workflows (2026)</title>
      <dc:creator>Max Quimby</dc:creator>
      <pubDate>Mon, 20 Apr 2026 03:38:53 +0000</pubDate>
      <link>https://forem.com/max_quimby/openai-agents-python-build-multi-agent-ai-workflows-2026-45gk</link>
      <guid>https://forem.com/max_quimby/openai-agents-python-build-multi-agent-ai-workflows-2026-45gk</guid>
      <description>&lt;p&gt;OpenAI's &lt;a href="https://github.com/openai/openai-agents-python" rel="noopener noreferrer"&gt;openai-agents-python&lt;/a&gt; crossed 22,981 GitHub stars this week — gaining 751 in a single day and landing at #2 on GitHub's global trending list. That's not hype noise. It's developer validation. And it happened the same week OpenAI rolled out sandbox execution support for enterprise deployments, cementing this library's position as the most-starred agent framework on the platform.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📖 &lt;a href="https://computeleap.com/blog/openai-agents-python-tutorial-multi-agent-ai-workflows-2026" rel="noopener noreferrer"&gt;Read the full version with charts, code, and embedded sources on ComputeLeap →&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But star counts tell you nothing about whether something is worth learning. So this tutorial skips the marketing and goes straight to the code. By the end, you'll have a working multi-agent research pipeline you can actually run — and an honest assessment of when this SDK makes sense versus building the same workflow with Anthropic's Claude.&lt;/p&gt;

&lt;p&gt;Today's intelligence signals confirm what GitHub is showing: &lt;strong&gt;5 of the top 7 trending AI repos are explicitly multi-agent or self-evolving systems&lt;/strong&gt;. The infrastructure layer is materializing. If you're a developer building anything AI-adjacent in 2026, understanding how agent orchestration actually works — not in theory, but in production — is now a baseline skill.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.youtube.com/watch?v=35nxORG1mtg" rel="noopener noreferrer"&gt;▶️ Watch: Agents SDK from OpenAI! Full Tutorial&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why openai-agents-python Is Having Its Moment
&lt;/h2&gt;

&lt;p&gt;The library is the official, production-ready successor to OpenAI's experimental &lt;a href="https://github.com/openai/swarm" rel="noopener noreferrer"&gt;Swarm&lt;/a&gt; library. Where Swarm was a research demo, &lt;code&gt;openai-agents-python&lt;/code&gt; ships the same multi-agent primitives in a framework that's designed for real deployments.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ The SDK is provider-agnostic — it works with OpenAI's APIs and supports 100+ additional LLMs via LiteLLM and compatible adapters. So despite the OpenAI branding, you're not locked in at the model layer.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Nine capabilities ship out of the box:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Agents&lt;/strong&gt; — LLMs configured with instructions, tools, guardrails, and handoffs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sandbox Agents&lt;/strong&gt; — agents running inside isolated containers for extended tasks (&lt;a href="https://techcrunch.com/2026/04/15/openai-updates-its-agents-sdk-to-help-enterprises-build-safer-more-capable-agents/" rel="noopener noreferrer"&gt;TechCrunch, April 2026&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent Delegation&lt;/strong&gt; — agents that function as tools, callable by other agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tools&lt;/strong&gt; — function tools, MCP integrations, and hosted tools (file search, web search, code interpreter)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Guardrails&lt;/strong&gt; — input/output validation with blocking and tripwire modes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human In The Loop&lt;/strong&gt; — structured pause points for human review&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sessions&lt;/strong&gt; — automatic conversation history management&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tracing&lt;/strong&gt; — built-in observability integrating with OpenAI's dashboard, Logfire, and OpenTelemetry&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voice&lt;/strong&gt; — support for &lt;code&gt;gpt-realtime-1.5&lt;/code&gt; voice agents&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The current release, v0.13, added an any-LLM adapter, opt-in retry policies, MCP resource support, and session persistence — making it meaningfully more production-ready than it was at launch. The &lt;a href="https://softmaxdata.com/blog/definitive-guide-to-agentic-frameworks-in-2026-langgraph-crewai-ag2-openai-and-more/" rel="noopener noreferrer"&gt;Definitive Guide to Agentic Frameworks in 2026&lt;/a&gt; ranks it among the top 3 most actively developed frameworks alongside LangGraph and Microsoft's Agent Framework.&lt;/p&gt;
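
&lt;p&gt;Session persistence is worth a quick look before the setup section, because it removes the most tedious part of multi-turn agents: threading conversation history by hand. A short sketch based on the SDK's documented Sessions feature; verify the class name against the docs for your installed version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sessions: the SDK stores and replays conversation history for you.
# Based on the documented Sessions feature; check your installed version's docs.
from agents import Agent, Runner, SQLiteSession

agent = Agent(name="Assistant", instructions="Reply concisely.")
session = SQLiteSession("user-42")  # history keyed by this ID, persisted to SQLite

Runner.run_sync(agent, "My favorite city is Lisbon.", session=session)
followup = Runner.run_sync(agent, "What did I say my favorite city was?", session=session)
print(followup.final_output)  # answerable only because the session replayed turn one
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;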

&lt;h2&gt;
  
  
  Installation and Setup
&lt;/h2&gt;

&lt;p&gt;Requirements: Python 3.10+, an OpenAI API key.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;openai-agents
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For voice support:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;"openai-agents[voice]"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set your API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"sk-..."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your first agent in under 10 lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_sync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the capital of France?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;final_output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# → "The capital of France is Paris."
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the complete hello world. &lt;code&gt;Agent&lt;/code&gt; defines the LLM + instructions + tools. &lt;code&gt;Runner&lt;/code&gt; executes it. &lt;code&gt;run_sync&lt;/code&gt; blocks until the agent produces its final output.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core Concepts in 5 Minutes
&lt;/h2&gt;

&lt;p&gt;Before building anything non-trivial, you need to understand five primitives.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Agents
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;

&lt;span class="n"&gt;researcher&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Researcher&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You research topics thoroughly.
    Always provide sources and key facts.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;model&lt;/code&gt; parameter defaults to &lt;code&gt;gpt-4o&lt;/code&gt; if omitted. You can swap in any OpenAI model, or any LiteLLM-compatible endpoint.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Function Tools
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;function_tool&lt;/span&gt;

&lt;span class="nd"&gt;@function_tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_web&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search the web for information on a topic.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Your search implementation here
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Results for: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;researcher&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Researcher&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Use search_web to find information.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;search_web&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;@function_tool&lt;/code&gt; decorator auto-generates the JSON schema from your function signature and docstring. Pydantic validation runs on every call — no manual schema writing required.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Handoffs
&lt;/h3&gt;

&lt;p&gt;Handoffs let one agent transfer control entirely to another:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;

&lt;span class="n"&gt;writer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Writer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write clear, engaging content based on research provided.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;researcher&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Researcher&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research the topic, then hand off to the Writer.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;handoffs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the researcher decides the user would be better served by the writer, it hands off and the writer takes over the conversation entirely. This is a one-way transfer — the researcher is done.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Agent as Tool
&lt;/h3&gt;

&lt;p&gt;The alternative pattern keeps one agent in charge:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;writer_tool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;draft_content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tool_description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Draft written content from a research summary.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;coordinator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Coordinator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Orchestrate research and writing. Use draft_content to get the writer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s output.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;writer_tool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;search_web&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here the coordinator calls the writer as a function and receives its output — the coordinator never loses control of the conversation.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Guardrails
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;GuardrailFunctionOutput&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input_guardrail&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SafetyCheck&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;is_safe&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;
    &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="nd"&gt;@input_guardrail&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;safety_check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;malicious&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;GuardrailFunctionOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;output_info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;SafetyCheck&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;is_safe&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Flagged content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;tripwire_triggered&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;GuardrailFunctionOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;output_info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;SafetyCheck&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;is_safe&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OK&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;tripwire_triggered&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;safe_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SafeAgent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Help users with their questions.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;input_guardrails&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;safety_check&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When &lt;code&gt;tripwire_triggered=True&lt;/code&gt;, the agent never executes — preventing token spend on inputs that would fail downstream.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building Your First Multi-Agent Workflow
&lt;/h2&gt;

&lt;p&gt;Here's a complete, runnable research pipeline with three specialized agents. You can copy and run this directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;function_tool&lt;/span&gt;

&lt;span class="c1"&gt;# --- Tool definitions ---
&lt;/span&gt;
&lt;span class="nd"&gt;@function_tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;web_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search the web for information on a given query.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Replace with your actual search API (Tavily, SerpAPI, etc.)
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Search results for &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;: Top 5 results found.]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nd"&gt;@function_tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;save_draft&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Save a draft to disk.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Saved draft to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# --- Agent definitions ---
&lt;/span&gt;
&lt;span class="n"&gt;reviewer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Reviewer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a critical editor. Review drafts for:
    - Accuracy and factual claims
    - Clear structure and flow
    - Specific, actionable improvements
    Provide a verdict: APPROVED or NEEDS_REVISION.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;writer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Writer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a clear, concise technical writer.
    Write well-structured content from research notes.
    When done, hand off to the Reviewer for quality check.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;save_draft&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;handoffs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;reviewer&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;researcher&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Researcher&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You research topics thoroughly using web_search.
    Gather at least 3 distinct facts or perspectives.
    Summarize your findings, then hand off to the Writer.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;web_search&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;handoffs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --- Run the pipeline ---
&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;🔍 Starting research pipeline for: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;researcher&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research this topic and produce a written summary: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;✅ Pipeline complete.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Final output:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;final_output&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;run_pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OpenAI&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s openai-agents-python SDK&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a &lt;strong&gt;chain&lt;/strong&gt;: &lt;code&gt;Researcher → Writer → Reviewer&lt;/code&gt;. Each agent does its job and hands off. The &lt;code&gt;Runner&lt;/code&gt; handles the entire execution loop — including managing multiple turns if an agent needs to call tools before handing off.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 The &lt;a href="https://cookbook.openai.com/examples/agents_sdk/multi-agent-portfolio-collaboration/multi_agent_portfolio_collaboration" rel="noopener noreferrer"&gt;OpenAI Cookbook's multi-agent portfolio collaboration example&lt;/a&gt; is the best reference for production-style patterns — a coordinator calls data analyst, statistician, and report writer as tools and merges their outputs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For debugging, enable tracing to see every step:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt;
&lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enable_verbose_stdout_logging&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The full trace — every LLM call, tool execution, and handoff — is viewable in the OpenAI Traces Dashboard. This is essential for debugging where a pipeline stalls in production.&lt;/p&gt;
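
&lt;p&gt;If you want several runs to show up as a single workflow in that dashboard, wrap them in one trace. A minimal sketch, assuming the &lt;code&gt;trace&lt;/code&gt; context manager the SDK exports and reusing the &lt;code&gt;researcher&lt;/code&gt; agent from the pipeline above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from agents import Runner, trace

async def run_traced_pipeline(topic: str):
    # Everything inside this block is grouped as one workflow in the Traces dashboard.
    with trace("Research pipeline"):
        result = await Runner.run(researcher, f"Research this topic: {topic}")
    return result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;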

&lt;h2&gt;
  
  
  Handoffs vs. Agent-as-Tool: Which Pattern to Use
&lt;/h2&gt;

&lt;p&gt;This is the core architectural decision in multi-agent systems. The &lt;a href="https://openai.github.io/openai-agents-python/multi_agent/" rel="noopener noreferrer"&gt;official multi-agent docs&lt;/a&gt; define the distinction clearly:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Handoff&lt;/th&gt;
&lt;th&gt;Agent-as-Tool&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Control&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Specialist takes over&lt;/td&gt;
&lt;td&gt;Manager retains control&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Conversation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Specialist responds directly&lt;/td&gt;
&lt;td&gt;Manager synthesizes output&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best for&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Routing workflows&lt;/td&gt;
&lt;td&gt;Aggregation workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Customer service triage&lt;/td&gt;
&lt;td&gt;Report generation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Use handoffs&lt;/strong&gt; when the conversation is inherently routing — the user interacts with whichever specialist is most relevant, and you want that specialist to own the exchange.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use agent-as-tool&lt;/strong&gt; when a manager needs to collect results from multiple specialists and synthesize them. The portfolio collaboration example from OpenAI's cookbook demonstrates this: a coordinator calls a data analyst, statistician, and report writer as tools, then merges their outputs into a final deliverable.&lt;/p&gt;
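
&lt;p&gt;To make the contrast concrete, here is a rough sketch of the agent-as-tool pattern using the SDK's &lt;code&gt;as_tool()&lt;/code&gt; helper. The agent names and instructions are illustrative, not taken from the cookbook example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from agents import Agent, Runner

analyst = Agent(name="Analyst", instructions="Analyze the data you are given.")
report_writer = Agent(name="ReportWriter", instructions="Turn findings into a short report.")

# The manager keeps control: specialists are exposed as callable tools,
# and the manager synthesizes their outputs into a single answer.
manager = Agent(
    name="Manager",
    instructions="Call the analysis tool, then the report tool, and merge the results.",
    tools=[
        analyst.as_tool(
            tool_name="run_analysis",
            tool_description="Analyze raw data and return key findings.",
        ),
        report_writer.as_tool(
            tool_name="write_report",
            tool_description="Turn findings into a short report.",
        ),
    ],
)

result = Runner.run_sync(manager, "Summarize these weekly signup counts: 120, 450, 980")
print(result.final_output)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;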

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9v6f2wcg4t8ytih0i96l.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9v6f2wcg4t8ytih0i96l.jpg" alt="Side-by-side diagram comparing Handoff pattern (triage routes to specialist who owns conversation) vs Agent-as-Tool pattern (manager calls specialists and synthesizes output)" width="800" height="343"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://dev.to/jangwook_kim_e31e7291ad98/build-your-first-multi-agent-system-with-openai-agents-sdk-step-by-step-python-tutorial-2026-2n79"&gt;Dev.to tutorial by Jangwook Kim&lt;/a&gt; demonstrates both patterns with a complete content production pipeline — worth reading alongside this tutorial for a different angle on the same concepts.&lt;/p&gt;

&lt;p&gt;The developer community has been active on this architectural question. A popular HN thread showed practitioners converging on the same conclusion:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://news.ycombinator.com/item?id=45654040" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3l83b3ri3fcveqkv7pn4.png" alt="HN thread: Show HN Multi-Agent AI with OpenAI Agents SDK — developers debating handoff vs agent-as-tool pattern for report generation workflows" width="800" height="711"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Guardrails That Actually Work in Production
&lt;/h2&gt;

&lt;p&gt;The guardrails system is more sophisticated than it first appears. Two distinct scopes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent-level guardrails&lt;/strong&gt; run before the agent processes its turn. Good for filtering malicious inputs, PII, or off-topic requests.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool-level guardrails&lt;/strong&gt; run on every tool invocation within an agent's execution. Use these when you need to validate what the agent is actually &lt;em&gt;doing&lt;/em&gt;, not just what it received.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;output_guardrail&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;

&lt;span class="nd"&gt;@output_guardrail&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;no_pii_in_output&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Ensure no PII leaks in the agent&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s response.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\d{3}-\d{2}-\d{4}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;GuardrailFunctionOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;output_info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;flagged&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reason&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SSN pattern detected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;tripwire_triggered&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;GuardrailFunctionOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;output_info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;flagged&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;tripwire_triggered&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Per the &lt;a href="https://openai.github.io/openai-agents-python/guardrails/" rel="noopener noreferrer"&gt;guardrails docs&lt;/a&gt;: "Blocking execution runs and completes the guardrail before the agent starts. If the guardrail tripwire is triggered, the agent never executes, preventing token consumption and tool execution."&lt;/p&gt;
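
&lt;p&gt;For the agent-level case, here is a minimal sketch of an input guardrail that blocks off-topic requests before any tokens are spent. It assumes the &lt;code&gt;input_guardrail&lt;/code&gt; decorator and &lt;code&gt;InputGuardrailTripwireTriggered&lt;/code&gt; exception from the same guardrails module; the off-topic check itself is a placeholder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from agents import (
    Agent,
    GuardrailFunctionOutput,
    InputGuardrailTripwireTriggered,
    Runner,
    input_guardrail,
)

@input_guardrail
async def block_off_topic(ctx, agent, user_input):
    """Trip the guardrail when the request is clearly out of scope."""
    flagged = "lottery numbers" in str(user_input).lower()  # placeholder check
    return GuardrailFunctionOutput(
        output_info={"flagged": flagged},
        tripwire_triggered=flagged,
    )

support = Agent(
    name="Support",
    instructions="Answer product questions.",
    input_guardrails=[block_off_topic],
)

async def main():
    try:
        await Runner.run(support, "What are tonight's lottery numbers?")
    except InputGuardrailTripwireTriggered:
        # The agent never ran, so no tokens or tool calls were consumed.
        print("Request blocked by the input guardrail.")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;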

&lt;blockquote&gt;
&lt;p&gt;⚠️ Latent Space's analysis found a &lt;strong&gt;60x higher security incident rate&lt;/strong&gt; for agent deployments compared to standard API calls. Guardrails are necessary but not sufficient — you also need robust authentication, access controls, and sandbox execution for agents that touch the filesystem or execute code. OpenAI's April 2026 SDK update added sandbox support via E2B, Modal, Cloudflare, Daytona, Runloop, Vercel, and Blaxel.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  State Management and Sessions
&lt;/h2&gt;

&lt;p&gt;Sessions are the SDK's answer to long-horizon tasks — multi-step workflows where an agent needs to remember context across multiple runs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agents.extensions.sessions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;InMemorySessionStorage&lt;/span&gt;

&lt;span class="n"&gt;storage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;InMemorySessionStorage&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LongRunningAgent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You help users with multi-step tasks. Remember context from previous messages.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# First interaction
&lt;/span&gt;&lt;span class="n"&gt;result1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Start a report on market trends in AI agent frameworks.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report-session-001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_storage&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Second interaction — agent remembers the previous exchange
&lt;/span&gt;&lt;span class="n"&gt;result2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Now add a section on the OpenAI Agents SDK specifically.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report-session-001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_storage&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For production, swap &lt;code&gt;InMemorySessionStorage&lt;/code&gt; for the Redis-backed session store:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;"openai-agents[redis]"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This persists sessions across server restarts and horizontal scale — essential for production multi-step workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP Integration
&lt;/h2&gt;

&lt;p&gt;The SDK supports Model Context Protocol for connecting external tools and data sources. Version 0.0.7+ includes the &lt;code&gt;MCPServerStdio&lt;/code&gt; class:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agents.mcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MCPServerStdio&lt;/span&gt;

&lt;span class="n"&gt;mcp_server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MCPServerStdio&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;npx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;@modelcontextprotocol/server-filesystem&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/tmp/workspace&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FileAgent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You help with file operations.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;mcp_servers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mcp_server&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://news.ycombinator.com/item?id=43485566" rel="noopener noreferrer"&gt;HN discussion on OpenAI's MCP support&lt;/a&gt; captured the developer community's mixed reaction: top criticism is that "MCP overcomplicates tool calling" versus the counterpoint that MCP enables runtime tool discovery — you can add new tools to an MCP server without redeploying your agent code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://news.ycombinator.com/item?id=43485566" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv7pbie9chkiitsse170o.png" alt="HN thread: OpenAI adds MCP support to Agents SDK — 807 points, 267 comments debating complexity vs runtime tool discovery benefits" width="800" height="711"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For most projects, function tools are simpler and sufficient. Reach for MCP when you need to reuse an existing MCP server ecosystem or when runtime tool discovery is a genuine requirement.&lt;/p&gt;
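
&lt;p&gt;For comparison, here is what the same filesystem capability looks like as a plain function tool. A minimal sketch using the &lt;code&gt;function_tool&lt;/code&gt; decorator; the helper and its path check are illustrative:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from pathlib import Path
from agents import Agent, function_tool

WORKSPACE = Path("/tmp/workspace")

@function_tool
def read_workspace_file(relative_path: str) -&gt; str:
    """Read a file from the workspace directory and return its contents."""
    target = (WORKSPACE / relative_path).resolve()
    if not target.is_relative_to(WORKSPACE.resolve()):
        return "Error: path escapes the workspace."
    return target.read_text()

file_agent = Agent(
    name="FileAgent",
    instructions="You help with file operations.",
    tools=[read_workspace_file],
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;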

&lt;h2&gt;
  
  
  Production Considerations
&lt;/h2&gt;

&lt;p&gt;Production deployments bring additional complexity that tutorials rarely cover. Community experience on HN offers an honest take:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://news.ycombinator.com/item?id=44358969" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ckkt82eqx1j6b1dbzgd.png" alt="HN thread: Agentic AI Hands-On in Python — practitioners sharing production war stories about security incidents, guardrails, and sandbox requirements" width="800" height="711"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability first.&lt;/strong&gt; In multi-agent systems, a single user query can trigger multiple LLM calls, tool executions, and handoffs. Tracing captures all of this. Connect to Logfire or export OpenTelemetry spans to your existing stack.&lt;/p&gt;
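
&lt;p&gt;A minimal sketch of the Logfire route, assuming Logfire's OpenAI Agents instrumentation helper is available in your installed version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import logfire

logfire.configure()                  # reads LOGFIRE_TOKEN from the environment
logfire.instrument_openai_agents()   # forwards agent spans (LLM calls, tools, handoffs)

# From here on, every Runner.run(...) emits spans you can inspect in Logfire
# alongside the rest of your OpenTelemetry data.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;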

&lt;p&gt;&lt;strong&gt;Token accounting.&lt;/strong&gt; With multi-agent chains, token costs multiply fast. Each handoff means a new context window with the full conversation history. Design your agent instructions to be minimal and your handoff payloads to carry only what the next agent needs.&lt;/p&gt;
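
&lt;p&gt;One way to keep handoff payloads lean is an input filter on the handoff itself. A sketch using the SDK's &lt;code&gt;handoff()&lt;/code&gt; wrapper and the bundled &lt;code&gt;handoff_filters&lt;/code&gt;, assuming both are present in your version; the agents are illustrative:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from agents import Agent, handoff
from agents.extensions import handoff_filters

billing_agent = Agent(name="Billing", instructions="Resolve billing questions.")

triage_agent = Agent(
    name="Triage",
    instructions="Route the user to the right specialist.",
    handoffs=[
        # Strip prior tool calls and tool results from the history so the billing
        # agent's context window carries only the user-facing conversation.
        handoff(billing_agent, input_filter=handoff_filters.remove_all_tools),
    ],
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;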

&lt;p&gt;&lt;strong&gt;Parallel execution.&lt;/strong&gt; For independent subtasks, use &lt;code&gt;asyncio.gather&lt;/code&gt; with multiple &lt;code&gt;Runner.run&lt;/code&gt; calls rather than sequential handoffs. The &lt;a href="https://softmaxdata.com/blog/definitive-guide-to-agentic-frameworks-in-2026-langgraph-crewai-ag2-openai-and-more/" rel="noopener noreferrer"&gt;definitive guide&lt;/a&gt; covers this pattern in depth.&lt;/p&gt;
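
&lt;p&gt;A short sketch of that fan-out, with two illustrative specialist agents running concurrently:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio
from agents import Agent, Runner

pricing_agent = Agent(name="Pricing", instructions="Summarize pricing pages.")
docs_agent = Agent(name="Docs", instructions="Summarize developer documentation.")

async def research_both(pricing_url: str, docs_url: str):
    # Independent subtasks run in parallel instead of as sequential handoffs.
    pricing, docs = await asyncio.gather(
        Runner.run(pricing_agent, f"Summarize the pricing page at {pricing_url}"),
        Runner.run(docs_agent, f"Summarize the docs at {docs_url}"),
    )
    return pricing.final_output, docs.final_output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;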

&lt;p&gt;&lt;strong&gt;Sandbox for code execution.&lt;/strong&gt; Any agent that can execute arbitrary code should run inside a sandbox. The April 2026 update made this straightforward — pick your sandbox provider from the supported list and pass it to the agent configuration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest Assessment: OpenAI SDK vs. Anthropic Claude SDK
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://composio.dev/content/claude-agents-sdk-vs-openai-agents-sdk-vs-google-adk" rel="noopener noreferrer"&gt;Composio three-way comparison&lt;/a&gt; puts it well: "These represent two competing visions of agentic AI: OpenAI ships an opinionated, batteries-included SDK; Anthropic ships a model plus an open protocol."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose openai-agents-python when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your team is already on GPT models and wants minimal switching cost&lt;/li&gt;
&lt;li&gt;You want hosted tools (file_search, web_search, code_interpreter) without managing your own infrastructure&lt;/li&gt;
&lt;li&gt;You need rapid prototyping — hello world in under 10 lines&lt;/li&gt;
&lt;li&gt;Your workflow is routing-oriented (triage → specialist patterns)&lt;/li&gt;
&lt;li&gt;Cost matters for longer sessions: OpenAI bills tokens only, while Managed Agents adds a $0.08/hour runtime fee that adds up for sessions over 10 minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Choose Anthropic's Claude SDK when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're building multi-model architectures — Claude's SDK is built on MCP, an open standard&lt;/li&gt;
&lt;li&gt;You need native computer control — agents can read files, write code, and execute commands without additional configuration&lt;/li&gt;
&lt;li&gt;Model quality is your primary variable — Polymarket currently prices Anthropic at 92% for "best AI model end of April"&lt;/li&gt;
&lt;li&gt;Vendor lock-in at the protocol layer is a concern (MCP is open; OpenAI's hosted tools are proprietary)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Per &lt;a href="https://agentpatch.ai/blog/openai-agents-sdk-vs-claude-agent-sdk/" rel="noopener noreferrer"&gt;AgentPatch's cost comparison&lt;/a&gt;: for short sessions under 5 minutes, pricing difference is negligible. For long-horizon tasks running 10–30 minutes, OpenAI runs 20–30% cheaper for the same token count.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://enhancial.substack.com/p/choosing-the-right-ai-framework-a" rel="noopener noreferrer"&gt;Enhancial framework comparison&lt;/a&gt; adds a useful dimension: quick prototyping (OpenAI SDK, 2–3 weeks to production) → production-grade single agent (Claude SDK, 1–2 weeks) → complex stateful systems (LangGraph, 1–3 months). Match the tool to your complexity requirement.&lt;/p&gt;

&lt;p&gt;For deeper context on the model-layer tradeoffs, see our &lt;a href="https://computeleap.com/blog/anthropic-vs-openai-api-developer-platform-2026" rel="noopener noreferrer"&gt;Anthropic vs. OpenAI API comparison&lt;/a&gt; and our &lt;a href="https://computeleap.com/blog/claude-code-opus-47-creator-secrets-expert-tips" rel="noopener noreferrer"&gt;Claude Code Opus 4.7 creator tips&lt;/a&gt; for the Claude-native workflow patterns.&lt;/p&gt;

&lt;p&gt;For making agents production-durable (surviving crashes and scaling to parallel executions), the Temporal integration is worth examining:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://news.ycombinator.com/item?id=44736713" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fozyx71asp1lxfqdhcr1u.png" alt="HN thread: Show HN OpenAI Agents SDK demos with Temporal — durable execution that survives process crashes, used by OpenAI for ChatGPT Images and Codex" width="800" height="622"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;code&gt;pip install openai-agents&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Copy the three-agent pipeline above and run it with your API key&lt;/li&gt;
&lt;li&gt;Swap the &lt;code&gt;web_search&lt;/code&gt; stub for a real API (Tavily integrates cleanly; see the sketch after this list)&lt;/li&gt;
&lt;li&gt;Enable tracing and review the execution trace in the OpenAI dashboard&lt;/li&gt;
&lt;li&gt;Add your first input guardrail before exposing to external inputs&lt;/li&gt;
&lt;/ol&gt;
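
&lt;p&gt;For step 3, a rough sketch of a Tavily-backed &lt;code&gt;web_search&lt;/code&gt; tool. It assumes &lt;code&gt;pip install tavily-python&lt;/code&gt; and a &lt;code&gt;TAVILY_API_KEY&lt;/code&gt; environment variable; the function name mirrors the stub it replaces:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os
from agents import function_tool
from tavily import TavilyClient

_tavily = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

@function_tool
def web_search(query: str) -&gt; str:
    """Search the web and return the top results as plain text."""
    response = _tavily.search(query, max_results=5)
    return "\n\n".join(
        f"{item['title']}\n{item['url']}\n{item['content']}"
        for item in response["results"]
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;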

&lt;p&gt;The framework is genuinely good. The primitives are small, the documentation is clear, and the handoff pattern makes complex routing workflows dramatically easier than building them from scratch. 22,981 developers found their way to the repo this week — the SDK earned those stars by solving a real problem with clean abstractions. Build something with it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://computeleap.com/blog/openai-agents-python-tutorial-multi-agent-ai-workflows-2026" rel="noopener noreferrer"&gt;ComputeLeap&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>openai</category>
      <category>python</category>
      <category>aiagents</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
