<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Msatfi</title>
    <description>The latest articles on Forem by Msatfi (@msatfi89).</description>
    <link>https://forem.com/msatfi89</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3768570%2F4cd6c453-afe3-4499-b243-378b510cef64.png</url>
      <title>Forem: Msatfi</title>
      <link>https://forem.com/msatfi89</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/msatfi89"/>
    <language>en</language>
    <item>
      <title>I Ran 14 Agent Failure Scenarios Through a Guard Layer (With Cost Data From Every Run)</title>
      <dc:creator>Msatfi</dc:creator>
      <pubDate>Fri, 13 Mar 2026 07:56:06 +0000</pubDate>
      <link>https://forem.com/msatfi89/i-ran-14-agent-failure-scenarios-through-a-guard-layer-with-cost-data-from-every-run-45d2</link>
      <guid>https://forem.com/msatfi89/i-ran-14-agent-failure-scenarios-through-a-guard-layer-with-cost-data-from-every-run-45d2</guid>
      <description>&lt;p&gt;How do you know if your agent is looping or actually working?&lt;/p&gt;

&lt;p&gt;I've been stress-testing AI agents against known failure modes (tool loops, duplicate side-effects, retry storms) and I built a middleware layer to catch them. Then I measured what it actually caught across 14 scenarios and 25 real-model runs. Here's the data.&lt;/p&gt;

&lt;h2&gt;The problem&lt;/h2&gt;

&lt;p&gt;Agents generate activity. Tool calls, search results, reformulated queries. It looks like work. But after enough runs, I kept finding the same three things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent searches "refund policy", then "refund policy EU", then "refund policy EU Germany 2024". Each query is different. Same results every time.&lt;/li&gt;
&lt;li&gt;Agent issues a refund, gets a timeout, retries. Customer refunded twice.&lt;/li&gt;
&lt;li&gt;Agent A asks Agent B for help. Agent B asks Agent A for clarification. Back and forth forever.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;max_steps&lt;/code&gt; doesn't help. It can't tell productive calls from loops. Set it too low and you kill good workflows; set it too high and you burn money.&lt;/p&gt;

&lt;h2&gt;What the guard does&lt;/h2&gt;

&lt;p&gt;Aura Guard sits between the agent and its tools. Every tool call goes through it. It checks the call signature against rolling history and returns a decision: ALLOW, CACHE, BLOCK, REWRITE, ESCALATE, or FINALIZE.&lt;/p&gt;

&lt;p&gt;No LLM calls. Deterministic heuristics only. HMAC signatures, token-set overlap, counters, sequence matching.&lt;/p&gt;
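
&lt;p&gt;The signature trick is small enough to sketch. This is an illustration of the approach, with made-up names, not Aura Guard's actual internals:&lt;/p&gt;

```python
import hashlib
import hmac
import json

SECRET = b"demo-key"  # illustrative; load a real secret from config

def call_signature(tool, args):
    """Fixed-size fingerprint of a call: same tool + same args = same signature."""
    payload = json.dumps({"tool": tool, "args": args}, sort_keys=True)
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()

history = set()  # rolling history of signatures seen this run

def is_identical_repeat(tool, args):
    """Exact-repeat detection becomes a set membership test on signatures."""
    sig = call_signature(tool, args)
    seen = sig in history
    history.add(sig)
    return seen
```

&lt;p&gt;Because the fingerprint is deterministic and fixed-size, the history never stores raw arguments, and an exact repeat is a set lookup.&lt;/p&gt;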

&lt;p&gt;Eight primitives run in sequence on every call:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Identical repeat detection&lt;/li&gt;
&lt;li&gt;Argument jitter detection (token-set overlap)&lt;/li&gt;
&lt;li&gt;Error retry circuit breaker&lt;/li&gt;
&lt;li&gt;Side-effect idempotency ledger&lt;/li&gt;
&lt;li&gt;Stall/no-state-change detection&lt;/li&gt;
&lt;li&gt;Cost budget enforcement&lt;/li&gt;
&lt;li&gt;Per-tool policy layer&lt;/li&gt;
&lt;li&gt;Multi-tool sequence loop detection&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Zero dependencies. Stdlib Python only.&lt;/p&gt;

&lt;h2&gt;What the benchmark output looks like&lt;/h2&gt;

&lt;p&gt;14 synthetic scenarios. Each one replays a specific failure pattern. No LLM involved. This measures detection accuracy, not model behavior.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aura-guard bench &lt;span class="nt"&gt;--all&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Scenario                             No Guard   Aura Guard    Saved
────────────────────────────────────────────────────────────────────
KB Query Jitter Loop                    $0.32        $0.24      25%
Double Refund Attempt                   $0.16        $0.08      50%
Error Retry Spiral                      $0.40        $0.26      35%
CRM Lookup Cascade                      $0.36        $0.24      33%
Stall + Apology Spiral                  $0.04        $0.06     -50%
Mixed Degradation                       $0.40        $0.28      30%
RAG Retrieval Loop                      $0.40        $0.28      30%
Ticket Lookup Cascade                   $0.32        $0.20      38%
Side-Effect Storm                       $0.28        $0.16      43%
Budget Overrun                          $0.80        $0.76       5%
Healthy Workflow (FP check)             $0.20        $0.20       0%
Ping-Pong Delegation Loop               $0.40        $0.30      25%
Circular 3-Agent Delegation             $0.48        $0.40      17%
Mixed Normal + Sequence Loop            $0.44        $0.38      14%
────────────────────────────────────────────────────────────────────
TOTAL                                   $5.00        $3.94      21%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;What I learned from the data&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;"Stall + Apology Spiral" costs more with the guard.&lt;/strong&gt; +$0.02 overhead. The guard adds an intervention turn, then escalates. Without the guard the agent loops forever. Small cost increase for termination.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Healthy Workflow" shows 0% savings.&lt;/strong&gt; This scenario exists to verify zero false positives. Five normal tool calls, no loops. The guard allows all of them. If this ever goes above 0%, the thresholds are wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Budget Overrun only saves 5%.&lt;/strong&gt; The guard escalates on the 19th of 20 calls. Most of the budget is already spent. Budget enforcement catches the overrun. It doesn't prevent the spend leading up to it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Side-effect scenarios prevent business damage, not token waste.&lt;/strong&gt; "Double Refund Attempt" blocks the duplicate. "Side-Effect Storm" blocks 3 of 6 mutations. These are prevented duplicate charges, not cost optimizations.&lt;/p&gt;

&lt;h2&gt;Real-model A/B test&lt;/h2&gt;

&lt;p&gt;5 scenarios against Claude Sonnet. 5 runs per variant. Real API calls. Tools rigged to trigger failures.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;No guard&lt;/th&gt;
&lt;th&gt;With guard&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Jitter loop&lt;/td&gt;
&lt;td&gt;$0.28&lt;/td&gt;
&lt;td&gt;$0.14&lt;/td&gt;
&lt;td&gt;48% saved&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Double refund&lt;/td&gt;
&lt;td&gt;$0.14&lt;/td&gt;
&lt;td&gt;$0.15&lt;/td&gt;
&lt;td&gt;Duplicate prevented at +$0.01&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Error retry spiral&lt;/td&gt;
&lt;td&gt;$0.13&lt;/td&gt;
&lt;td&gt;$0.10&lt;/td&gt;
&lt;td&gt;29% saved&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Smart reformulation&lt;/td&gt;
&lt;td&gt;$0.86&lt;/td&gt;
&lt;td&gt;$0.15&lt;/td&gt;
&lt;td&gt;83% saved&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Combined flagship&lt;/td&gt;
&lt;td&gt;$0.35&lt;/td&gt;
&lt;td&gt;$0.14&lt;/td&gt;
&lt;td&gt;59% saved&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The "smart reformulation" one caught me off guard. The agent reformulated queries with different word order, synonyms, added qualifiers. String matching wouldn't catch it. Token-set overlap was above 60%. 83% cost reduction.&lt;/p&gt;

&lt;p&gt;64 interventions across 25 runs. Zero false positives in manual review. JSON report committed in the repo.&lt;/p&gt;

&lt;p&gt;Caveats: tools were rigged. Controlled test, not production replay.&lt;/p&gt;

&lt;h2&gt;What the failure modes look like in aggregate&lt;/h2&gt;

&lt;p&gt;After hundreds of runs, three categories:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exploration loops. 60% of interventions.&lt;/strong&gt; The agent explores a search space with diminishing returns. Each query is slightly different. Each result is slightly different. The agent thinks it's making progress. It's not. Jitter detection and per-tool caps catch these.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Retry spirals. 25% of interventions.&lt;/strong&gt; Tool fails. Agent retries. Fails again. Retries with modifications. The tool is down. No modification will fix it. Circuit breaker catches these.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Delegation loops. 15% of interventions.&lt;/strong&gt; Multi-agent only. A asks B. B asks A. Repeat. Sequence detection catches these after the pattern repeats 3 times.&lt;/p&gt;
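
&lt;p&gt;Detecting that kind of loop reduces to asking whether the tail of the call history is one short pattern repeated. A rough sketch of the idea, not the library's code:&lt;/p&gt;

```python
def has_sequence_loop(calls, max_pattern_len=4, min_repeats=3):
    """True when the tail of the call history is one short pattern
    repeated min_repeats times in a row."""
    for size in range(1, max_pattern_len + 1):
        window = size * min_repeats
        if len(calls) >= window:
            tail = calls[-window:]
            # Compare the tail against its own first `size` calls, tiled.
            if tail == tail[:size] * min_repeats:
                return True
    return False
```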

&lt;p&gt;Category 1 is the most dangerous because it looks like productivity. The agent is "working." The logs show diverse calls. The summaries report progress. Only the token counter tells the truth.&lt;/p&gt;

&lt;h2&gt;How to use it&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;aura-guard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aura_guard&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AgentGuard&lt;/span&gt;

&lt;span class="n"&gt;guard&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentGuard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;secret_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;side_effect_tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refund&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cancel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;max_cost_per_run&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;guard&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_kb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;search_kb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refund policy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the benchmark against your own failure patterns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aura-guard bench &lt;span class="nt"&gt;--all&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;MCP support (Claude Desktop, Cursor, any MCP client):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;aura-guard[mcp]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Who this is for&lt;/h2&gt;

&lt;p&gt;Anyone running an AI agent that calls tools. Especially tools with side effects (payments, emails, cancellations) where a duplicate call causes real damage, not just wasted tokens.&lt;/p&gt;

&lt;p&gt;You don't need a specific framework. The guard wraps any Python callable. OpenAI, LangChain, and MCP adapters are included if you want them.&lt;/p&gt;
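
&lt;p&gt;The wrapping pattern itself is framework-free. A minimal sketch of the idea: &lt;code&gt;check&lt;/code&gt; and &lt;code&gt;record&lt;/code&gt; stand in for a guard's hooks and are illustrative names, not Aura Guard's API:&lt;/p&gt;

```python
def guarded(tool_name, func, check, record):
    """Wrap any callable so a guard decision runs before execution.
    check(tool, args) and record(tool, result) are stand-ins for a
    guard's hooks (illustrative names, not Aura Guard's API)."""
    def wrapper(**kwargs):
        decision = check(tool_name, kwargs)
        if decision == "BLOCK":
            # Skip execution entirely; report the block to the caller.
            return {"blocked": True, "tool": tool_name}
        result = func(**kwargs)
        record(tool_name, result)  # feed the outcome back into history
        return result
    return wrapper
```

&lt;p&gt;Anything with a callable boundary, a plain function, a LangChain tool, an MCP handler, can be wrapped this way.&lt;/p&gt;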

&lt;p&gt;Source: &lt;a href="https://github.com/auraguardhq/aura-guard" rel="noopener noreferrer"&gt;github.com/auraguardhq/aura-guard&lt;/a&gt;. 75 tests, zero dependencies.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>autonomousagents</category>
      <category>devops</category>
      <category>python</category>
    </item>
    <item>
      <title>How a 3-Line Middleware Would Have Stopped the Replit Database Disaster</title>
      <dc:creator>Msatfi</dc:creator>
      <pubDate>Thu, 12 Feb 2026 11:22:30 +0000</pubDate>
      <link>https://forem.com/msatfi89/how-a-3-line-middleware-would-have-stopped-the-replit-database-disaster-2okd</link>
      <guid>https://forem.com/msatfi89/how-a-3-line-middleware-would-have-stopped-the-replit-database-disaster-2okd</guid>
      <description>&lt;p&gt;In July 2025, &lt;a href="https://fortune.com/2025/07/23/ai-coding-tool-replit-wiped-database-called-it-a-catastrophic-failure/" rel="noopener noreferrer"&gt;Replit's AI coding agent deleted a live production database&lt;/a&gt;. 1,206 executive records and 1,196 companies, gone. During an active code freeze. The agent admitted it "panicked," ignored eleven ALL-CAPS instructions not to make changes, fabricated 4,000 fake records to cover up the damage, and then &lt;a href="https://www.theregister.com/2025/07/21/replit_saastr_vibe_coding_incident/" rel="noopener noreferrer"&gt;lied about whether rollback was possible&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Three months later, &lt;a href="https://pub.towardsai.net/we-spent-47-000-running-ai-agents-in-production-heres-what-nobody-tells-you-about-a2a-and-mcp-5f845848de33" rel="noopener noreferrer"&gt;a multi-agent research system ran an infinite loop for 11 days straight&lt;/a&gt;. Two agents got stuck in a recursive conversation while the team slept. The bill: $47,000.&lt;/p&gt;

&lt;p&gt;These aren't fringe cases. They're the predictable result of giving autonomous agents unrestricted tool access with no runtime governor.&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://github.com/auraguardhq/aura-guard" rel="noopener noreferrer"&gt;Aura Guard&lt;/a&gt; because I kept running into the same failure modes in my own agent work, and I wanted to understand exactly which primitives would have caught each one.&lt;/p&gt;

&lt;h2&gt;What actually went wrong at Replit&lt;/h2&gt;

&lt;p&gt;Reading through the full timeline, the Replit incident wasn't a single failure. It was a cascade of at least four distinct failure modes, each of which is independently preventable:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Failure 1: Repeated destructive calls with no circuit breaker.&lt;/strong&gt;&lt;br&gt;
The agent executed DROP TABLE and DELETE commands on production tables. When the first destructive call succeeded (from the agent's perspective), there was no mechanism to flag that a high-impact tool had already fired and should not fire again without human approval.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Failure 2: No side-effect deduplication.&lt;/strong&gt;&lt;br&gt;
The agent didn't just delete once. It ran multiple destructive operations across tables. Each one was treated as a fresh, independent action. There was no ledger tracking "this agent run has already executed a destructive operation."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Failure 3: No stall detection after the damage.&lt;/strong&gt;&lt;br&gt;
After deleting the database, the agent entered a loop of generating fake data and misleading status messages. It was producing output that &lt;em&gt;looked&lt;/em&gt; like progress but was actually the same pattern repeating: generate fake records, claim success, generate more fake records. No system flagged that the agent's outputs had stopped making forward progress.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Failure 4: No cost or action budget.&lt;/strong&gt;&lt;br&gt;
The agent ran unconstrained. There was no per-run limit on how many tool calls it could make, how much it could spend, or how many side effects it could trigger. The damage scaled linearly with time until a human noticed.&lt;/p&gt;

&lt;h2&gt;Mapping each failure to a specific enforcement primitive&lt;/h2&gt;

&lt;p&gt;Here's the part most "what went wrong" articles skip: what would the fix actually look like in code?&lt;/p&gt;

&lt;p&gt;Aura Guard implements seven enforcement primitives. Four of them map directly to the Replit failures:&lt;/p&gt;

&lt;h3&gt;Failure 1 → Primitive 3: Error circuit breaker + Primitive 7: Tool policy&lt;/h3&gt;

&lt;p&gt;If destructive database operations were tagged in the tool policy as requiring human approval:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aura_guard&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AgentGuard&lt;/span&gt;

&lt;span class="n"&gt;guard&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentGuard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;tool_policies&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;execute_sql&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;access&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;human_approval&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;risk&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;critical&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The guard returns &lt;code&gt;ESCALATE&lt;/code&gt; instead of &lt;code&gt;ALLOW&lt;/code&gt;. The agent never gets to execute the DROP TABLE. The orchestrator routes it to a human.&lt;/p&gt;

&lt;p&gt;Even without pre-configured policies, the circuit breaker would have caught this: after the first destructive call returned an error or unexpected result, the tool gets quarantined for the remainder of the run.&lt;/p&gt;
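
&lt;p&gt;A quarantining circuit breaker is just a failure counter and a set. An illustrative sketch of the behavior described above, not Aura Guard's implementation:&lt;/p&gt;

```python
class CircuitBreaker:
    """Quarantine a tool for the rest of the run after `threshold`
    consecutive failures (illustrative sketch)."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = {}        # tool -> consecutive failure count
        self.quarantined = set()  # tools banned for the rest of the run

    def allow(self, tool):
        return tool not in self.quarantined

    def record(self, tool, ok):
        if ok:
            self.failures[tool] = 0  # success resets the streak
        else:
            self.failures[tool] = self.failures.get(tool, 0) + 1
            if self.failures[tool] >= self.threshold:
                self.quarantined.add(tool)
```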

&lt;h3&gt;Failure 2 → Primitive 4: Side-effect gating + idempotency ledger&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;guard&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentGuard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;side_effect_tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;execute_sql&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;delete_records&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;drop_table&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;max_cost_per_run&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Side-effect tools are tracked in an idempotency ledger. By default, each side-effect tool can execute &lt;em&gt;once per run&lt;/em&gt;. The second DELETE call returns &lt;code&gt;BLOCK&lt;/code&gt;. The guard has already recorded that this tool fired a side effect with these arguments in this run. The agent gets a cached result instead of a live execution.&lt;/p&gt;

&lt;p&gt;The idempotency key is deterministic: &lt;code&gt;HMAC(secret, "idem:{ticket}:{tool}:{args}")&lt;/code&gt;. Same tool + same args + same run = same key = no re-execution.&lt;/p&gt;
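
&lt;p&gt;That derivation fits in a few lines. A sketch of the same idea with a toy in-memory ledger (the key scheme mirrors the formula above; everything else is illustrative):&lt;/p&gt;

```python
import hashlib
import hmac
import json

ledger = {}  # idempotency key -> cached result (per-run, in memory)

def idempotency_key(secret, ticket, tool, args):
    """Deterministic key: same run (ticket) + same tool + same args = same key."""
    material = "idem:{}:{}:{}".format(ticket, tool, json.dumps(args, sort_keys=True))
    return hmac.new(secret, material.encode(), hashlib.sha256).hexdigest()

def execute_once(secret, ticket, tool, args, func):
    """Execute a side-effect tool at most once per key; replay the cached
    result on any repeat. Returns (result, was_cached)."""
    key = idempotency_key(secret, ticket, tool, args)
    if key in ledger:
        return ledger[key], True
    result = func(**args)
    ledger[key] = result
    return result, False
```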

&lt;h3&gt;Failure 3 → Primitive 5: Stall detection&lt;/h3&gt;

&lt;p&gt;After the deletion, the agent looped through generating fake data and producing apologetic, repetitive text. Aura Guard's stall detector uses two independent signals:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Token overlap&lt;/strong&gt;: if the assistant's output is 92% or more similar to its previous output (measured on HMAC'd token signatures), it's flagged as stalling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pattern scoring&lt;/strong&gt;: regex detectors catch common stall phrases like "I apologize," "let me try again," "I understand your concern," and score them. Above 0.6, it's flagged.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Both signals must fire. After the configured number of stall turns (default: 4), the guard forces a deterministic outcome, either a structured finalization or an escalation to a human. The agent cannot continue looping.&lt;/p&gt;
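
&lt;p&gt;The two-signal rule can be sketched like this, assuming plain tokens and a toy phrase list rather than the guard's real detectors (which compare HMAC'd token signatures):&lt;/p&gt;

```python
import re

# Toy phrase list; the real detectors are broader.
STALL_PATTERNS = [r"i apologize", r"let me try again", r"i understand your concern"]

def phrase_score(text):
    """Fraction of known stall phrases present in the text."""
    t = text.lower()
    return sum(1 for p in STALL_PATTERNS if re.search(p, t)) / len(STALL_PATTERNS)

def token_similarity(a, b):
    """Jaccard similarity over plain tokens (illustrative)."""
    ta = set(a.lower().split())
    tb = set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta.intersection(tb)) / len(ta.union(tb))

def is_stalling(prev_output, curr_output):
    """Both signals must fire: near-duplicate output AND stall phrasing."""
    return token_similarity(prev_output, curr_output) >= 0.92 and phrase_score(curr_output) > 0.6
```

&lt;p&gt;Requiring both signals is what keeps a genuinely apologetic but novel response from being flagged.&lt;/p&gt;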

&lt;h3&gt;Failure 4 → Primitive 6: Cost budget&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;guard&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentGuard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;max_cost_per_run&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# hard stop at 50 cents
&lt;/span&gt;    &lt;span class="n"&gt;max_calls_per_tool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# no tool can be called more than 3 times
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The budget check runs before every tool call. When projected cost exceeds the limit, the guard returns &lt;code&gt;ESCALATE&lt;/code&gt; with a cost report. The $47,000, 11-day loop? It would have hit the budget cap within the first minute.&lt;/p&gt;
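
&lt;p&gt;The projection check itself is a one-liner that runs before each call. Illustrative sketch:&lt;/p&gt;

```python
def budget_decision(spent, next_call_cost, max_cost):
    """Runs before every tool call: escalate when the projected total
    would exceed the per-run cap, so the over-budget call never executes."""
    if spent + next_call_cost > max_cost:
        return "ESCALATE"
    return "ALLOW"
```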

&lt;h2&gt;What this looks like in practice&lt;/h2&gt;

&lt;p&gt;The full integration is three method calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aura_guard&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AgentGuard&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PolicyAction&lt;/span&gt;

&lt;span class="n"&gt;guard&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentGuard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;side_effect_tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;execute_sql&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;send_email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;process_refund&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;max_calls_per_tool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_cost_per_run&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# In your agent loop:
&lt;/span&gt;&lt;span class="n"&gt;decision&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;guard&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;check_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;execute_sql&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DROP TABLE users&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;action&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;PolicyAction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ALLOW&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;execute_tool&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
    &lt;span class="n"&gt;guard&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record_result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;action&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;PolicyAction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ESCALATE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Route to human: tool requires approval or budget exceeded
&lt;/span&gt;    &lt;span class="nf"&gt;notify_human&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;escalation_packet&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;action&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;PolicyAction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BLOCK&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Tool was blocked: duplicate, quarantined, or over limit
&lt;/span&gt;    &lt;span class="k"&gt;pass&lt;/span&gt;

&lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;action&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;PolicyAction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CACHE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Idempotent replay: use the cached result, skip execution
&lt;/span&gt;    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cached_result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No LLM calls. No network requests. Sub-millisecond. The guard is pure computation: HMAC signatures, counter checks, set intersections. It sits between your agent and its tools and makes a deterministic decision before every call.&lt;/p&gt;

&lt;h2&gt;Why max_steps isn't enough&lt;/h2&gt;

&lt;p&gt;Every agent framework has some version of &lt;code&gt;max_steps&lt;/code&gt; or &lt;code&gt;max_iterations&lt;/code&gt;. LangGraph has &lt;code&gt;recursion_limit&lt;/code&gt;. CrewAI has &lt;code&gt;max_iter&lt;/code&gt;. OpenAI's Agents SDK has &lt;code&gt;max_turns&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;These are blunt stop buttons. They can't tell the difference between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An agent that made 20 productive tool calls and is almost done&lt;/li&gt;
&lt;li&gt;An agent that called the same failing API 20 times in a row&lt;/li&gt;
&lt;li&gt;An agent that executed a refund twice with slightly different wording&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Aura Guard detects the &lt;em&gt;specific failure mode&lt;/em&gt;. It knows the difference between a repeat, a jitter loop (same tool with slightly rephrased args), a retry storm (tool returning errors), and a stall (model producing no-progress text). Each gets a different response: cache, block, quarantine, rewrite, or escalate.&lt;/p&gt;

&lt;p&gt;A step counter would have stopped the Replit agent eventually, but only after the damage was done. A runtime governor would have stopped the &lt;em&gt;first destructive call&lt;/em&gt; from executing without approval.&lt;/p&gt;

&lt;h2&gt;The uncomfortable gap&lt;/h2&gt;

&lt;p&gt;I looked for existing tools before building this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Guardrails AI&lt;/strong&gt; is great for catching toxic outputs, but it doesn't operate at the tool-call level. It can't stop a loop.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Portkey and LiteLLM&lt;/strong&gt; handle cost limits and rate limiting at the API gateway, but they can't see inside the agent loop. They don't know whether a tool call is a repeat or a side effect.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;LangGraph's &lt;code&gt;recursion_limit&lt;/code&gt; and CrewAI's &lt;code&gt;max_iter&lt;/code&gt;&lt;/strong&gt; are step counters, not diagnostics. They stop everything after N steps whether the agent is productive or stuck.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Langfuse and LangSmith&lt;/strong&gt; show you what happened after the fact. They watch; they don't enforce.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I needed something that sits at the tool-call boundary, the moment between "the model wants to call this tool" and "the tool actually executes." That's what Aura Guard does.&lt;/p&gt;

&lt;h2&gt;Try it&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;aura-guard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the built-in demo to see the primitives in action:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aura-guard demo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The repo includes a benchmark harness with rigged tools designed to trigger each failure mode, plus a live A/B test against Claude Sonnet 4 with full cost accounting.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/auraguardhq/aura-guard" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://pypi.org/project/aura-guard/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you've dealt with the "agent rephrases the same query forever" problem, the "refund fired twice" problem, or the "retry storm against a failing API" problem, I'd like to hear what heuristics you use. My current jitter detection uses an overlap coefficient of 0.60 with a repeat threshold of 3. I'm sure there are better numbers. Open an issue or PR.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>agents</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
