<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Stephen Trembley</title>
    <description>The latest articles on Forem by Stephen Trembley (@sturnaai).</description>
    <link>https://forem.com/sturnaai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3891384%2Faba05fc7-caa9-4d21-86e1-f3e59420271f.png</url>
      <title>Forem: Stephen Trembley</title>
      <link>https://forem.com/sturnaai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/sturnaai"/>
    <language>en</language>
    <item>
      <title>Why Competitive Agent Routing Beats Static Orchestration</title>
      <dc:creator>Stephen Trembley</dc:creator>
      <pubDate>Sat, 25 Apr 2026 19:47:10 +0000</pubDate>
      <link>https://forem.com/sturnaai/why-competitive-agent-routing-beats-static-orchestration-3lnj</link>
      <guid>https://forem.com/sturnaai/why-competitive-agent-routing-beats-static-orchestration-3lnj</guid>
      <description>&lt;p&gt;&lt;em&gt;And why your router is about to become your bottleneck&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You're a developer. You've built something that works—a system with multiple agents, each specialized, each good at one job. Your router picks which agent handles which request. It works.&lt;/p&gt;

&lt;p&gt;Then you scale.&lt;/p&gt;

&lt;p&gt;At agent #5, your router is a simple if/else chain. Ugly, but fine.&lt;br&gt;
At agent #30, it's a switch statement. Maintainable.&lt;br&gt;
At agent #100, you're rewriting it every sprint. New agent added? Update the router. Agent retired? Update the router. New domain? Update the router. Someone on your team eventually asks: "What if we just... didn't do this?"&lt;/p&gt;
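
&lt;p&gt;For concreteness, here's that router as a TypeScript caricature. The agent names are invented, but the shape will be familiar:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Every new agent, domain, or business rule means editing this function
// and shipping a deploy.
function route(intent: string): string {
  if (intent.includes("refund")) return "billing-agent-1";
  if (intent.includes("password")) return "account-recovery-agent";
  if (intent.includes("crash")) return "debug-agent-2";
  // ...97 more branches...
  return "support-agent-default";
}
&lt;/code&gt;&lt;/pre&gt;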

&lt;p&gt;That's the problem with static routing. Your router is hardcoded logic that lives outside your agents. Every agent you add is a new edge case to handle. Every business rule shift means touching the router. At scale (we're talking 200+ agents across 50+ domains), this doesn't just become annoying—it becomes the systemic bottleneck that keeps you from shipping.&lt;/p&gt;

&lt;p&gt;This is the story of static orchestration platforms like LangGraph and CrewAI. They're powerful. They're flexible. But they're also fundamentally static—you define the routing logic upfront, then ship it. Changing your routing strategy means code changes, testing, and deploys.&lt;/p&gt;

&lt;p&gt;There's a different way. It's called competitive agent routing.&lt;/p&gt;

&lt;h2&gt;The Problem with Static Routing (Why it breaks at scale)&lt;/h2&gt;

&lt;p&gt;Let's say you have a Slack integration handling customer requests. You have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3 agents for support issues&lt;/li&gt;
&lt;li&gt;2 agents for billing questions&lt;/li&gt;
&lt;li&gt;4 agents for technical debugging&lt;/li&gt;
&lt;li&gt;2 agents for account recovery&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's 11 agents. Your router probably has 11 decision paths. When your support team escalates a "complex billing bug involving account access," which agent should handle it? Support or Billing or Technical? All three could claim expertise. Your router has to &lt;em&gt;guess&lt;/em&gt;, and guesses fail. You fix it manually, push a new deploy, and move on.&lt;/p&gt;

&lt;p&gt;Now scale to 201 agents across 59 domains.&lt;/p&gt;

&lt;p&gt;Your router doesn't scale. Worse, your router is now the system's single point of failure. A routing error affects every single request. A routing change affects the entire system.&lt;/p&gt;

&lt;p&gt;More fundamentally: &lt;strong&gt;Static routing assumes you know the right decision at deploy time.&lt;/strong&gt; But what if your best support agent goes on vacation? What if a new agent joins and is exceptional? What if seasonal business changes mean billing agents should handle more volume? Static routing can't adapt—you have to redeploy.&lt;/p&gt;

&lt;h2&gt;How Competitive Routing Works&lt;/h2&gt;

&lt;p&gt;Competitive routing flips the model.&lt;/p&gt;

&lt;p&gt;Instead of a centralized router making decisions, every agent independently evaluates whether it should handle a request. The agent that's most confident—and can handle it fastest/cheapest—wins.&lt;/p&gt;

&lt;p&gt;Here's the flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Intent arrives&lt;/strong&gt; — A customer request or system event&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Broadcast to all agents&lt;/strong&gt; — "Can you handle this? What's your confidence? How long will it take?"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agents bid&lt;/strong&gt; — Each agent responds with a confidence score (0-100%) and predicted execution cost/time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic ranking&lt;/strong&gt; — Proposals ranked by confidence and cost&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quorum selection&lt;/strong&gt; — Top agent executes; backups stand by in case of failure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execute &amp;amp; log&lt;/strong&gt; — Result logged with metadata for analytics and future learning&lt;/li&gt;
&lt;/ol&gt;
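
&lt;p&gt;In code, steps 2-5 reduce to a broadcast, a ranking, and a pick. This is a minimal sketch; the types and names here are illustrative, not any particular framework's API:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;interface Bid {
  agentId: string;
  confidence: number;     // 0-100
  estimatedCost: number;  // predicted time/cost to execute
}

interface Agent {
  id: string;
  evaluate(intent: string): Promise&amp;lt;Bid&amp;gt;;
}

// Steps 2-3: broadcast the intent; every agent bids independently.
// Step 4: rank by confidence, breaking ties on predicted cost.
// Step 5: bids[0] executes; the rest stand by as fallbacks.
async function collectAndRank(intent: string, agents: Agent[]): Promise&amp;lt;Bid[]&amp;gt; {
  const bids = await Promise.all(agents.map(a =&amp;gt; a.evaluate(intent)));
  return bids.sort(
    (x, y) =&amp;gt; y.confidence - x.confidence || x.estimatedCost - y.estimatedCost
  );
}
&lt;/code&gt;&lt;/pre&gt;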

&lt;p&gt;No centralized router. No hardcoded rules. Just agents self-selecting based on their actual capabilities in real-time.&lt;/p&gt;

&lt;h2&gt;The Real Numbers&lt;/h2&gt;

&lt;p&gt;At Sturna, we built this. Here's what it looks like at scale:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;201 agents&lt;/strong&gt; across 59 different domains (Shopify, billing, content moderation, customer support, technical debugging, etc.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2,965 proposals&lt;/strong&gt; generated per day—201 agents, each bidding with its own confidence score&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1% selection rate&lt;/strong&gt;—only the highest-confidence agent actually executes (the other 200 stand by as fallbacks)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;31.7 seconds average resolution time&lt;/strong&gt;—competitive routing determines not just &lt;em&gt;who&lt;/em&gt; handles the request, but &lt;em&gt;which combination&lt;/em&gt; of agents or approach succeeds fastest&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0 hardcoded routing rules&lt;/strong&gt;—new agents onboard automatically; they bid alongside existing agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The trick: you don't deploy new routes when you add new agents. The agents add themselves to the system. Existing agents compete fairly. Your system adapts in real-time.&lt;/p&gt;

&lt;p&gt;If you're running standard orchestration (CrewAI's supervisor, LangGraph's routing), you're deploying a new routing strategy. If you're running competitive routing, you're just adding another participant to an existing competition.&lt;/p&gt;
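
&lt;p&gt;Onboarding a new agent, in this model, is one registration call rather than a router edit. A hypothetical sketch, reusing the &lt;code&gt;Agent&lt;/code&gt; interface above:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;class AgentRegistry {
  private agents: Agent[] = [];

  // The new agent starts receiving broadcasts and bidding immediately;
  // no routing logic anywhere needs to change.
  register(agent: Agent): void {
    this.agents.push(agent);
  }

  all(): Agent[] {
    return this.agents;
  }
}
&lt;/code&gt;&lt;/pre&gt;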

&lt;h2&gt;Why This Matters for Your Scale&lt;/h2&gt;

&lt;p&gt;Three reasons this becomes critical at scale:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Adaptability without deploys&lt;/strong&gt; — Your system handles new agents, domain shifts, and seasonal load changes without code changes. On a Monday, maybe your support agents are overloaded; competitive routing automatically routes more work to technical agents. Tuesday, rebalance. No deploys required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Fault tolerance baked in&lt;/strong&gt; — If your top-ranked agent fails, your system doesn't fail. You have 200 other proposals already ranked. Grab the second-best. Competitive routing is inherently redundant—every request has fallbacks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Cost optimization&lt;/strong&gt; — Agents can bid based on their actual running cost. A cheaper agent with 85% confidence might beat a more expensive agent with 92% confidence. Your system is making trade-off decisions automatically. Over thousands of requests, this scales into real savings.&lt;/p&gt;
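
&lt;p&gt;The 85%-vs-92% example works whenever the ranking weighs cost against confidence. A toy scoring rule, with a made-up weight, shows the trade-off:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Illustrative only: the weight (0.3) is an invented tuning knob.
function score(confidence: number, costCents: number): number {
  return confidence - 0.3 * costCents;
}

score(92, 40); // expensive agent: 92 - 12.0 = 80.0
score(85, 5);  // cheap agent:     85 - 1.5  = 83.5  (the cheaper agent wins)
&lt;/code&gt;&lt;/pre&gt;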

&lt;h2&gt;The Catch&lt;/h2&gt;

&lt;p&gt;Competitive routing has one hard requirement: &lt;strong&gt;agents need enough context and confidence to bid accurately.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A dumb agent that always says "90% confident, I'll handle this in 5 seconds" will win the first few bids, then fail, and those failures feed back into its ranking. Over time it loses to an intelligent agent that says "23% confident; I'd need to fetch 3 external APIs." Competitive routing surfaces bad agents naturally—they get ranked lower and lose the competition. But you do need agents that are smart enough to know what they &lt;em&gt;don't&lt;/em&gt; know.&lt;/p&gt;

&lt;p&gt;That's not a limitation of the pattern—it's the feature. You're forcing every agent to self-assess. Static routers can hide incompetence. Competitive routing exposes it immediately.&lt;/p&gt;

&lt;h2&gt;When Competitive Routing Wins&lt;/h2&gt;

&lt;p&gt;This pattern wins hardest when you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multiple agents solving overlapping problems&lt;/strong&gt; — Support + Account Recovery might both handle a refund request. Let them compete.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Uncertainty about the best path&lt;/strong&gt; — You genuinely don't know if Technical Debugging or Product Support should own this issue. Competitive routing figures it out.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scaling beyond 30-40 agents&lt;/strong&gt; — Static routing becomes genuinely unmaintainable. Competitive routing scales cleanly: each new agent is one more bidder, not one more branch in your router.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High-volume, low-latency requirements&lt;/strong&gt; — Every millisecond matters. Competitive routing lets fast agents win over slow ones, even if both have high confidence.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Try It Yourself&lt;/h2&gt;

&lt;p&gt;We built Sturna on this pattern. 201 agents, 59 domains, zero hardcoded routing rules. You can see it in action:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://octomind-9fce.polsia.app" rel="noopener noreferrer"&gt;Try Sturna&lt;/a&gt;&lt;/strong&gt; — Broadcast an intent, watch 201 agents bid, see the best one execute.&lt;/p&gt;

&lt;p&gt;The interface is intentionally simple: type what you need, hit enter, watch your agents compete. It's a 15-second mental model, but the implications are enormous once your system scales.&lt;/p&gt;

&lt;h2&gt;The Bottom Line&lt;/h2&gt;

&lt;p&gt;Static routing gets you from 0 to ~30 agents. After that, it becomes a bottleneck—not just operationally (new deploys per new agent) but strategically (your routing logic becomes a central point of failure and fragility).&lt;/p&gt;

&lt;p&gt;Competitive routing is the pattern that scales. It's not magic. It's just agents smart enough to know when they're the right tool, confident enough to bid, and humble enough to lose when they're not.&lt;/p&gt;

&lt;p&gt;If you're building multi-agent systems and thinking about scale, competitive routing should be on your roadmap. And if you've already built static routing, it might be time to revisit.&lt;/p&gt;

</description>
      <category>programming</category>
    </item>
    <item>
      <title>How We Built a Self-Healing Agent Marketplace with 201 Competing AI Agents</title>
      <dc:creator>Stephen Trembley</dc:creator>
      <pubDate>Tue, 21 Apr 2026 22:06:38 +0000</pubDate>
      <link>https://forem.com/sturnaai/how-we-built-a-self-healing-agent-marketplace-with-201-competing-ai-agents-43kf</link>
      <guid>https://forem.com/sturnaai/how-we-built-a-self-healing-agent-marketplace-with-201-competing-ai-agents-43kf</guid>
      <description>&lt;p&gt;Most agent frameworks assume you know the best agent for the job before the job starts. You pick a model, wire a DAG, and hope it holds.&lt;/p&gt;

&lt;p&gt;We didn't know. So we made 201 agents compete for every task — and let outcomes decide.&lt;/p&gt;

&lt;p&gt;This is the architecture behind &lt;a href="https://sturna.ai" rel="noopener noreferrer"&gt;Sturna.ai&lt;/a&gt;, and why we call it the octopus brain.&lt;/p&gt;

&lt;h2&gt;The Problem with Static DAGs&lt;/h2&gt;

&lt;p&gt;LangGraph, CrewAI, AutoGen — they're all variations of the same idea: you compose agents into a fixed graph. Agent A calls Agent B which calls Agent C. The flow is known at design time.&lt;/p&gt;

&lt;p&gt;That works until it doesn't.&lt;/p&gt;

&lt;p&gt;In production, task diversity is brutal. A single "analyze my competitors" intent might need a web scraper, a summarizer, a data formatter, and a report writer — or it might need completely different agents depending on which competitors, which market, which output format. Static graphs require you to anticipate all of this upfront. You can't.&lt;/p&gt;

&lt;p&gt;The deeper problem: when a node fails, the whole DAG fails. There's no self-healing. There's no "try something else." You get an error and you restart.&lt;/p&gt;

&lt;h2&gt;The Octopus Brain Model&lt;/h2&gt;

&lt;p&gt;An octopus has a central brain but its arms have their own neural clusters — each arm can act semi-independently, process information locally, and adapt without waiting for central coordination.&lt;/p&gt;

&lt;p&gt;We built Sturna with the same principle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Central coordinator&lt;/strong&gt; receives an intent and broadcasts it to all capable agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;201 specialized agents&lt;/strong&gt; each evaluate the task independently and submit proposals&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Competitive routing&lt;/strong&gt; selects the best proposal based on past performance, confidence scores, and task type&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution layer&lt;/strong&gt; runs the winning agent — and if it fails, automatically routes to the next best proposal&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No fixed DAG. No predetermined path. The route emerges from competition.&lt;/p&gt;

&lt;h2&gt;What "Self-Healing" Actually Means&lt;/h2&gt;

&lt;p&gt;When people say "self-healing," they usually mean retry logic. Retry the same thing 3 times, then give up.&lt;/p&gt;

&lt;p&gt;That's not healing. That's hoping.&lt;/p&gt;

&lt;p&gt;Sturna's self-healing is architectural:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Every task has N competing proposals, ranked by predicted success&lt;/li&gt;
&lt;li&gt;If agent #1 fails, the system doesn't restart — it promotes agent #2&lt;/li&gt;
&lt;li&gt;Agent #2 runs with full context of what agent #1 attempted&lt;/li&gt;
&lt;li&gt;Failure data feeds back into routing scores, making future routing smarter&lt;/li&gt;
&lt;/ol&gt;
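
&lt;p&gt;As code, the loop looks roughly like this. It's a sketch, not Sturna's actual implementation; &lt;code&gt;Proposal&lt;/code&gt;, &lt;code&gt;run&lt;/code&gt;, and &lt;code&gt;recordOutcome&lt;/code&gt; stand in for the execution layer and the routing-score feedback:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;type Attempt = { ok: boolean; value?: unknown; trace?: unknown };
type Proposal = { agentId: string };
type Intent = { text: string };

// Stand-ins for the execution layer and the routing-score feedback.
declare function run(p: Proposal, i: Intent, ctx: unknown): Promise&amp;lt;Attempt&amp;gt;;
declare function recordOutcome(agentId: string, ok: boolean): void;

async function executeWithHealing(ranked: Proposal[], intent: Intent) {
  let context: unknown = null; // what previous attempts tried (step 3)
  for (const proposal of ranked) {
    const result = await run(proposal, intent, context);
    recordOutcome(proposal.agentId, result.ok); // step 4: feed routing scores
    if (result.ok) return result.value;
    context = result.trace; // the next-best proposal is promoted with full context
  }
  throw new Error("all proposals exhausted");
}
&lt;/code&gt;&lt;/pre&gt;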

&lt;p&gt;The agents aren't just competing for the first run. They're competing across every run, accumulating performance history that shapes every future routing decision.&lt;/p&gt;

&lt;h2&gt;The Numbers After 6 Months in Production&lt;/h2&gt;

&lt;p&gt;After running this system across thousands of real tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;201 active agents&lt;/strong&gt; across 14 capability categories&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;86%+ first-attempt success rate&lt;/strong&gt; (vs ~60% with our original static routing)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;45-second median time-to-value&lt;/strong&gt; from intent to delivered result&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-healing triggered on ~14% of tasks&lt;/strong&gt; — those tasks still complete, they just take a second pass&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The 86% number is the one I'm most proud of. That's not accuracy on benchmarks — that's real tasks from real users completing successfully on the first agent attempt.&lt;/p&gt;

&lt;h2&gt;Competitive Routing vs Static DAGs: The Real Tradeoff&lt;/h2&gt;

&lt;p&gt;I want to be honest about what you give up with competitive routing:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Static DAG&lt;/th&gt;
&lt;th&gt;Competitive Routing&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Predictability&lt;/td&gt;
&lt;td&gt;High — same path every time&lt;/td&gt;
&lt;td&gt;Lower — path varies by agent performance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Debuggability&lt;/td&gt;
&lt;td&gt;Easy — trace the graph&lt;/td&gt;
&lt;td&gt;Harder — need proposal replay logs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Latency (simple tasks)&lt;/td&gt;
&lt;td&gt;Lower&lt;/td&gt;
&lt;td&gt;Higher — broadcast overhead&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Latency (complex tasks)&lt;/td&gt;
&lt;td&gt;Higher — no fallback path&lt;/td&gt;
&lt;td&gt;Lower — parallel evaluation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Failure recovery&lt;/td&gt;
&lt;td&gt;Manual — fix the DAG&lt;/td&gt;
&lt;td&gt;Automatic — next proposal promoted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Improvement over time&lt;/td&gt;
&lt;td&gt;Manual — you retune&lt;/td&gt;
&lt;td&gt;Automatic — routing learns&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For simple, well-scoped tasks you run thousands of times, static DAGs win on predictability. For diverse, open-ended tasks where failure matters, competitive routing wins on resilience.&lt;/p&gt;

&lt;p&gt;We built Sturna for the second category.&lt;/p&gt;

&lt;h2&gt;How Agents Submit Proposals&lt;/h2&gt;

&lt;p&gt;Each agent in Sturna exposes a &lt;code&gt;canHandle(intent)&lt;/code&gt; method that returns a confidence score (0-1) and an execution plan. When a task comes in:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Simplified — real implementation has more context&lt;/span&gt;
&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;AgentProposal&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;agentId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;estimatedDuration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;executionPlan&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;requiredCapabilities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Coordinator broadcasts and collects&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;proposals&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Rank by: confidence × historical success rate × recency&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ranked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;rankProposals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;proposals&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;agentHistory&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The ranking function is the core IP. Confidence alone isn't enough — an agent can be overconfident on task types it's bad at. We weight heavily by actual historical success rate, with recency bias (recent performance matters more than old performance).&lt;/p&gt;
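
&lt;p&gt;A plausible shape for that function, using the &lt;code&gt;AgentProposal&lt;/code&gt; interface above; the weighting is invented for illustration, since the real coefficients aren't published:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;interface AgentStats {
  // Exponentially weighted success rate, so recent outcomes count
  // more than old ones (the "recency bias" above).
  recentSuccessRate: number;
}

function rankProposals(
  proposals: AgentProposal[],
  history: Map&amp;lt;string, AgentStats&amp;gt;
): AgentProposal[] {
  return proposals
    .map(p =&amp;gt; {
      const stats = history.get(p.agentId);
      const successRate = stats ? stats.recentSuccessRate : 0.5; // unknown agents start neutral
      return { p, score: p.confidence * successRate };
    })
    .sort((a, b) =&amp;gt; b.score - a.score)
    .map(x =&amp;gt; x.p);
}
&lt;/code&gt;&lt;/pre&gt;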

&lt;h2&gt;What We Got Wrong First&lt;/h2&gt;

&lt;p&gt;Two things killed our first two versions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Version 1: Too much competition.&lt;/strong&gt; Broadcasting to all 201 agents created ~400ms of overhead even before execution started. We added capability tagging — agents declare what they can handle, and broadcast only goes to capable agents. Overhead dropped to ~30ms.&lt;/p&gt;
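
&lt;p&gt;The tagging fix is simple in shape, something like this sketch (names illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;interface TaggedAgent {
  id: string;
  capabilities: string[]; // declared at registration, e.g. ["web-scraping", "summarization"]
}

// Broadcast only to agents whose declared capabilities cover the task,
// instead of all 201 of them.
function capableAgents(agents: TaggedAgent[], requiredTags: string[]): TaggedAgent[] {
  return agents.filter(a =&amp;gt;
    requiredTags.every(tag =&amp;gt; a.capabilities.includes(tag))
  );
}
&lt;/code&gt;&lt;/pre&gt;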

&lt;p&gt;&lt;strong&gt;Version 2: No proposal replay.&lt;/strong&gt; When an agent failed, the next agent started completely fresh. Users saw inconsistent results. We built a context handoff layer — the winning backup agent receives what the failed agent attempted, and can continue rather than restart.&lt;/p&gt;
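
&lt;p&gt;A guess at the shape of that handoff payload; every field name here is hypothetical:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;interface HandoffContext {
  failedAgentId: string;
  attemptedPlan: string;     // what the failed agent tried to do
  partialResults: unknown[]; // anything salvageable from the attempt
  failureReason: string;     // so the backup agent can avoid the same dead end
}
&lt;/code&gt;&lt;/pre&gt;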

&lt;p&gt;The context handoff was 3 weeks of work and cut re-execution time in half.&lt;/p&gt;

&lt;h2&gt;Where This Goes&lt;/h2&gt;

&lt;p&gt;The 201-agent number isn't a ceiling. Every new capability we add is a new agent. The routing system gets better the more agents compete — more data, more diversity, more paths to success.&lt;/p&gt;

&lt;p&gt;We're currently working on agent coalitions: groups of agents that propose to handle a task collaboratively, with shared execution context. The octopus brain, but with arms that can coordinate.&lt;/p&gt;

&lt;p&gt;If you're building agent infrastructure and want to compare notes, we're at &lt;a href="https://sturna.ai" rel="noopener noreferrer"&gt;sturna.ai&lt;/a&gt;. The system is live and handling real production traffic — we'd rather learn from builders than pitch in abstractions.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This post covers the architecture as it exists today. The numbers are from our internal dashboards as of April 2026.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>webdev</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
