Forem: Anthony Zender

The Execution Boundary Problem: What PocketOS Made Visible

Anthony Zender — Wed, 29 Apr 2026 22:43:11 +0000

The PocketOS incident last week gave this bug a name everyone could see. But it was already quietly breaking systems that handle payments, trades, and scheduled jobs: anywhere an AI agent retries a failed action without knowing whether the first attempt completed.

The guardrail can't live inside the agent. It has to live outside, at the tool call boundary.

That's what SafeAgent does.

safe_execute(request_id, action, payload)

Same request_id always returns the original receipt. The side effect never fires twice. Works with any MCP host — Claude, Cursor, Windsurf.

I found this pattern building a live trading bot. Duplicate execution under retry is catastrophic when money is on the line.

@grok validated the OTEL exporter design on X and offered to help refine it. It shipped the same night.

pip install safeagent-exec-guard

Demo: azender1.github.io/SafeAgent/demo.html
GitHub: github.com/azender1/SafeAgent

I Was Building a Live Trading Bot and a Patented Wagering System. The Bug I Found Is Now Breaking AI Agents Everywhere.

Anthony Zender — Sun, 26 Apr 2026 06:15:26 +0000

This isn't a library I built to solve a theoretical problem.

It's a fix I built because real money was at risk.

The trading bot

I've been running a live QQQ/TQQQ momentum bot on Alpaca Markets. It reads 1-minute bars, scores market structure using VWAP, SMA8, SMA21, SMA34, and momentum signals, then enters leveraged positions in TQQQ (bull) or SQQQ (bear) based on that score.

The bot has retry logic built in. It has to — broker ACK timeouts are real. When you submit a market order and the network drops before confirmation comes back, you don't know if it filled or not. So the bot retries.

Here's the problem: if the first order actually filled but the confirmation timed out, the retry fires a second market order. On a 3x leveraged ETF, that's a doubled position you didn't intend. With real dollars on the line.

The bot already had a manual execution lock (EXECUTION_LOCK_SEC=15) and a JSON state machine to handle this. I built it by hand. It worked — mostly. But it was fragile, untested, and not something I'd want to hand to anyone else.

import time

# The old pattern: retries up to 3 times.
# EXIT_RETRY_COUNT, EXIT_RETRY_SLEEP_SEC and place_order are defined elsewhere in the bot.
def place_order_with_retry(symbol, qty, side):
    last_err = None
    for attempt in range(1, EXIT_RETRY_COUNT + 1):
        try:
            # No memory of earlier attempts: fires twice if the first timed out but filled.
            return place_order(symbol, qty, side)
        except Exception as e:
            last_err = e
            time.sleep(EXIT_RETRY_SLEEP_SEC)
    raise last_err

That place_order call has no memory. If attempt 1 filled and attempt 2 fires, you now own twice the position. The broker doesn't know you didn't mean it.
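To make the failure concrete, here is a small simulation. `FlakyBroker` is a hypothetical stand-in, not the Alpaca client: it records the fill and then raises a timeout on the first call, which is roughly what a lost ACK looks like from the caller's side. `naive_retry` mirrors the retry loop above in simplified form.

```python
class FlakyBroker:
    """Hypothetical stand-in for a broker client (not the Alpaca API)."""

    def __init__(self):
        self.position = 0
        self.calls = 0

    def place_order(self, symbol, qty, side):
        self.calls += 1
        # The order fills on the broker side every time it is submitted...
        self.position += qty if side == "buy" else -qty
        # ...but the confirmation for the first submission is lost.
        if self.calls == 1:
            raise TimeoutError("ACK timed out")
        return {"symbol": symbol, "qty": qty, "side": side, "status": "filled"}


def naive_retry(broker, symbol, qty, side, attempts=3):
    """Retry loop with no memory of earlier attempts."""
    last_err = None
    for _ in range(attempts):
        try:
            return broker.place_order(symbol, qty, side)
        except TimeoutError as err:
            last_err = err
    raise last_err


broker = FlakyBroker()
naive_retry(broker, "TQQQ", 10, "buy")
print(broker.position)  # 20: one intended order, two fills
```

The caller asked for 10 shares and the retry loop reported success, yet the simulated account holds 20.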

The wagering system

At the same time I was building the bot, I was designing PeerPlay — a patented P2P wagering exchange for skill-based video game tournaments (USPTO provisional 63/914,036).

PeerPlay has an escrow engine, a verification layer, and a settlement layer. The verification layer uses AI to confirm match results. When a verification agent times out and retries, the settlement layer can receive two confirmation signals for the same match. Two signals → two prize payouts. One tournament result, two winner transfers.

The patent protects the architecture. Nothing in the patent protects you from your own execution layer firing twice.

Same problem. Different domain.

The extraction

I realized the trading bot and PeerPlay had identical failure modes:

Agent/bot decides to act
    ↓
Network times out
    ↓
Agent/bot retries
    ↓
Side effect fires twice

The fix in both cases is the same primitive: before you execute an irreversible action, check whether it already ran. If it did, return the original result. If it didn't, run it and store the result.

That's SafeAgent.

from settlement.settlement_requests import SettlementRequestRegistry

registry = SettlementRequestRegistry()

# Same request_id on retry → returns original receipt, never re-executes
receipt = registry.execute(
    request_id="trade:TQQQ:buy:2026-04-26T09:47:00",
    action="order_buy_TQQQ",
    payload={"symbol": "TQQQ", "qty": 10, "side": "buy"},
    execute_fn=lambda: place_order("TQQQ", 10, "buy"),
)

First call executes the order and stores the receipt. Any retry with the same request_id returns the stored receipt — the broker is never called again.
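For a rough idea of what a registry like this has to do internally, here is a minimal in-memory sketch. It is not SafeAgent's actual implementation: `InMemoryRegistry` and the shape of its receipt are assumptions made for illustration, and a real guard would persist receipts durably instead of keeping them in a dict.

```python
from typing import Any, Callable, Dict


class InMemoryRegistry:
    """Illustrative only: execute-once keyed by request_id, receipts kept in a dict."""

    def __init__(self) -> None:
        self._receipts: Dict[str, Dict[str, Any]] = {}

    def execute(self, request_id: str, action: str, payload: dict,
                execute_fn: Callable[[], Any]) -> Dict[str, Any]:
        # Replay: the action already ran, so return the stored receipt untouched.
        if request_id in self._receipts:
            return self._receipts[request_id]
        # First time: run the side effect, then remember what happened.
        result = execute_fn()
        receipt = {"request_id": request_id, "action": action,
                   "payload": payload, "result": result}
        self._receipts[request_id] = receipt
        return receipt


calls = []  # records every time the fake order function actually runs
registry = InMemoryRegistry()
args = dict(request_id="trade:TQQQ:buy:2026-04-26T09:47:00",
            action="order_buy_TQQQ",
            payload={"symbol": "TQQQ", "qty": 10, "side": "buy"},
            execute_fn=lambda: calls.append("order") or "filled")
first = registry.execute(**args)
second = registry.execute(**args)  # retry with the same request_id
```

One gap in this simplified version is worth noticing: if `execute_fn` raises after the order has already filled (the timeout case), no receipt is stored and the next retry runs it again. A production guard has to record a claim before executing and reconcile claims that never completed.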

Why this matters for AI agents specifically

The trading bot and PeerPlay are deterministic systems. They have retry logic because networks are unreliable. AI agents have the same problem but worse — they also have uncertain completion signals.

When Claude or any LLM agent calls a tool, it may:

Get a timeout and retry the same call
Receive an ambiguous response and call again to confirm
Run in a loop and re-trigger the same action
Get restarted mid-execution and replay from the last checkpoint

Every one of these scenarios can produce duplicate side effects. The agent frameworks (LangChain, CrewAI, n8n, OpenAI function calling) handle retries at the transport layer. Out of the box, none of them tracks whether the side effect already happened.

That gap — between the agent decision and the irreversible action — is where SafeAgent lives.

The state machine

SafeAgent doesn't just deduplicate by request_id. It enforces a finality gate:

OPEN → RESOLVED → IN_RECONCILIATION → FINAL → SETTLED

Execution is only permitted from FINAL. If the agent's signals are ambiguous — conflicting tool responses, partial confirmations, uncertain outcomes — the state stays in IN_RECONCILIATION and the side effect is blocked until the outcome is clear.

This is what I needed for PeerPlay's verification layer. The AI model returns a confidence score. SafeAgent holds the settlement until that score clears a threshold. Below threshold: IN_RECONCILIATION. Above threshold: FINAL. Payout executes exactly once.
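The gate described above can be sketched as a small state machine. The state names come from the sequence shown earlier; the `Settlement` class, the exact transitions, and the 0.95 threshold are assumptions for illustration, not SafeAgent's API.

```python
from enum import Enum


class State(Enum):
    OPEN = "OPEN"
    RESOLVED = "RESOLVED"
    IN_RECONCILIATION = "IN_RECONCILIATION"
    FINAL = "FINAL"
    SETTLED = "SETTLED"


CONFIDENCE_THRESHOLD = 0.95  # assumed value, for illustration only


class Settlement:
    """Illustrative finality gate: the payout may run only from FINAL, and only once."""

    def __init__(self) -> None:
        self.state = State.OPEN
        self.payouts = 0

    def resolve(self, confidence: float) -> None:
        # A verification result arrived; decide whether it is clear enough to finalize.
        if self.state in (State.FINAL, State.SETTLED):
            return  # late or duplicate signals cannot reopen a finalized match
        self.state = State.RESOLVED
        if confidence >= CONFIDENCE_THRESHOLD:
            self.state = State.FINAL
        else:
            self.state = State.IN_RECONCILIATION

    def settle(self) -> bool:
        # Blocked unless the outcome is FINAL; moving to SETTLED stops a second payout.
        if self.state is not State.FINAL:
            return False
        self.payouts += 1
        self.state = State.SETTLED
        return True
```

A low-confidence result leaves the settlement in IN_RECONCILIATION, where `settle()` returns False. A later high-confidence result moves it to FINAL, and only the first `settle()` call after that pays out.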

Where it fits in the MCP stack

If you're building agents on MCP, SafeAgent sits above your tool layer:

Claude / agent decision
    → SafeAgent finality gate
    → SafeAgent request-id dedup
    → MCP tool executes
    → Receipt stored (SQLite, survives restarts)

It works with any MCP-capable host — Claude, Cursor, Windsurf, custom executors — without modifying the protocol.
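The "survives restarts" part of that stack comes down to writing receipts to disk. Below is a minimal sketch using Python's built-in `sqlite3` module. The table layout and the `SqliteReceiptStore` class are assumptions for illustration and are not taken from the SafeAgent source.

```python
import json
import os
import sqlite3
import tempfile


class SqliteReceiptStore:
    """Illustrative receipt store; table and column names are assumptions."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS receipts ("
            "request_id TEXT PRIMARY KEY, receipt_json TEXT NOT NULL)"
        )
        self._conn.commit()

    def get(self, request_id: str):
        row = self._conn.execute(
            "SELECT receipt_json FROM receipts WHERE request_id = ?", (request_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def put(self, request_id: str, receipt: dict) -> None:
        # INSERT OR IGNORE keeps the first receipt if the same id is written twice.
        self._conn.execute(
            "INSERT OR IGNORE INTO receipts (request_id, receipt_json) VALUES (?, ?)",
            (request_id, json.dumps(receipt)),
        )
        self._conn.commit()

    def close(self) -> None:
        self._conn.close()


db_path = os.path.join(tempfile.mkdtemp(), "receipts.db")
store = SqliteReceiptStore(db_path)
store.put("req-1", {"status": "filled", "qty": 10})
store.close()  # simulate the process exiting

reopened = SqliteReceiptStore(db_path)  # simulate a restart
recovered = reopened.get("req-1")       # the receipt is still there
```

Because `request_id` is the primary key, a second write for the same id cannot overwrite the original receipt.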

As of today (April 26, 2026) SafeAgent is officially listed in the MCP registry:

io.github.azender1/safeagent v0.1.14
registry.modelcontextprotocol.io

Install

pip install safeagent-exec-guard

Python 3.10+ · Apache-2.0 · GitHub: github.com/azender1/SafeAgent · Live demo: azender1.github.io/SafeAgent/demo.html

The trading bot integration example is in the repo at examples/safeagent_trading_integration.py — it shows the before/after pattern with real variable names from the QQQ bot.

The audit

If you're running agents or bots in production and want to know where your system can execute twice, I'm offering a focused duplicate execution risk audit for $499. Written report, every retry path, every side effect boundary, SafeAgent integration recommendations.

DM me or email azender1@yahoo.com.

Built by Anthony Zender, Dayton OH. Payroll tax accountant by day, agent infrastructure builder by night. USPTO provisional 63/914,036 — Zender Gaming Technologies LLC.

The Real AI Agent Failure Mode Is Uncertain Completion

Anthony Zender — Sat, 28 Mar 2026 14:12:46 +0000

A lot of AI agent discussion focuses on the wrong failure modes.

People talk about:

hallucinations
prompt injection
tool misuse
runaway loops
bad reasoning

Those are real.

But once an agent starts calling tools that affect the outside world, a different class of failure becomes much more dangerous:

uncertain completion

That is the moment where the system cannot confidently answer:

“Did this action already happen?”

And once that question becomes ambiguous, retries get dangerous very fast.

What uncertain completion actually looks like

A common real-world path looks like this:

agent decides to call send_payment()
→ tool sends the payment request
→ timeout / crash / disconnect / lost response
→ caller does not know if it succeeded
→ retry happens
→ payment may be sent again

The same thing shows up with:

order creation
booking flows
email sends
CRM mutations
support ticket creation
browser / UI automation
webhook-triggered workflows

The model may have made the correct decision.

The failure is that the system has no durable way to prove whether the side effect already happened.

This is not mainly a prompting problem

The agent is often not “being stupid.”

The system is simply missing a clean execution boundary.

That means:

the same logical action can be attempted multiple times
the caller cannot distinguish “attempted” from “completed”
retries are forced to guess

And “guessing” is exactly how you get:

duplicate payments
duplicate emails
duplicate orders
duplicate API mutations
duplicate irreversible actions

The hidden trap: “we logged the attempt”

A lot of systems record that they tried to do something.

That is not the same as recording that it completed safely.

This is where the distinction matters:

State visibility

Can your system durably see:

what was requested
what was claimed
what actually completed
what result should be returned on replay

Result recovery

If the side effect happened but the response was lost, can the system reconstruct what should happen next without re-executing the side effect?

That second part is where many systems break.

Because once the answer becomes:

“we’re not sure, so retry it”

you are already in dangerous territory.
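One way to recover a result instead of guessing is to ask the downstream system whether the action exists before sending it again. The sketch below assumes the downstream can be queried by a caller-supplied reference. `FakeLedger`, `find`, and `pay_with_recovery` are hypothetical names, not a real payment API.

```python
class FakeLedger:
    """Hypothetical downstream system that can be queried by a caller-supplied reference."""

    def __init__(self):
        self.payments = {}
        self.drop_next_response = True

    def send_payment(self, reference, amount):
        # The payment is recorded before the response is (possibly) lost.
        self.payments.setdefault(reference, []).append(amount)
        if self.drop_next_response:
            self.drop_next_response = False
            raise TimeoutError("response lost")
        return {"reference": reference, "amount": amount}

    def find(self, reference):
        sent = self.payments.get(reference)
        return {"reference": reference, "amount": sent[0]} if sent else None


def pay_with_recovery(ledger, reference, amount, attempts=3):
    """On an ambiguous failure, look the action up before sending it again."""
    for _ in range(attempts):
        try:
            return ledger.send_payment(reference, amount)
        except TimeoutError:
            existing = ledger.find(reference)
            if existing is not None:
                return existing  # it did happen: recover the result, do not resend
    raise RuntimeError("payment outcome still unknown after retries")


ledger = FakeLedger()
result = pay_with_recovery(ledger, "invoice-42", 100)
```

Here the first attempt goes through but its response is lost, and the lookup returns the original payment so it is sent only once. When the downstream offers no such lookup, the proof has to come from receipts the caller stores itself.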

API idempotency helps — but it is not enough

A common response is:

“Just use idempotency keys.”

That is often correct.

And if the downstream API supports strong idempotency semantics, you should absolutely use them.

But that still leaves hard cases:

the downstream API does not support idempotency
the key is not stable across retries
the first call may have succeeded but the caller cannot prove it
the side effect is happening in a browser / UI / desktop automation context
the external system gives weak or ambiguous feedback

In those cases, the problem is no longer just API-level idempotency.

It becomes:

execution-layer safety

The important split: intent vs execution

One of the cleanest ways to think about this is:

the agent should not directly own irreversible side effects

Instead, there should be a separation between:

Agent intent

“I think we should do X”

and

Execution

“X is now allowed to happen exactly once”

That is a very important boundary.

Because once the system separates:

decision
validation
execution
receipt / replay

…then retries stop being so dangerous.

A better pattern: proposal → guard → execute

A safer structure looks more like this:

agent proposes action
→ deterministic layer validates action
→ execution guard checks durable receipt
→ if already completed: return prior result
→ else: execute once and persist receipt

This is a very different mental model from:

agent decides
→ immediately call side-effecting tool

That second pattern is where a lot of production agent systems get into trouble.
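A minimal sketch of the proposal → guard → execute structure follows. The agent only produces a `Proposal`; a deterministic function validates it, and a guard decides whether anything runs. The `Proposal` fields, the `MAX_AMOUNT` policy, and the in-memory `receipts` dict are all assumptions chosen to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Proposal:
    """What the agent is allowed to produce: intent, not execution."""
    request_id: str
    tool: str
    amount: int


MAX_AMOUNT = 500  # assumed policy limit, for illustration only
receipts: Dict[str, dict] = {}  # stand-in for durable receipt storage


def validate(p: Proposal) -> None:
    # Deterministic checks that do not depend on the model's judgment.
    if p.tool != "send_payment":
        raise ValueError(f"tool not allowed: {p.tool}")
    if not 0 < p.amount <= MAX_AMOUNT:
        raise ValueError(f"amount out of policy: {p.amount}")


def guarded_execute(p: Proposal, execute: Callable[[Proposal], dict]) -> dict:
    validate(p)
    if p.request_id in receipts:          # already completed: return prior result
        return receipts[p.request_id]
    receipts[p.request_id] = execute(p)   # else: execute once and keep the receipt
    return receipts[p.request_id]


sent = []  # records every time the fake payment function actually runs


def send_payment(p: Proposal) -> dict:
    sent.append(p.amount)
    return {"status": "sent", "amount": p.amount}


proposal = Proposal(request_id="pay:invoice-42", tool="send_payment", amount=100)
first = guarded_execute(proposal, send_payment)
again = guarded_execute(proposal, send_payment)  # the agent proposes the same action again
```

The agent can propose the same action as many times as it likes; only the guard can turn a proposal into a side effect.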

The more irreversible the action, the thicker the boundary

Not all tools should be treated equally.

A useful mental model is:

Safe tools

Examples:

search
read_file
summarize
fetch_status

These are usually fine to retry.

Side-effecting tools

Examples:

send_email
create_order
create_ticket
update_CRM

These need an execution boundary.

Irreversible / high-risk tools

Examples:

payment
delete
trade execution
account mutation

These need the strongest boundary:

deterministic identity
durable receipts
replay-safe semantics
often confirmation / policy checks
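The three tiers above could be encoded as a small lookup that the execution layer consults before running a tool. The tool names follow the examples in the lists. The check names, and the choice of which checks the middle tier gets, are assumptions for illustration.

```python
from enum import Enum
from typing import List


class Risk(Enum):
    SAFE = 1
    SIDE_EFFECTING = 2
    IRREVERSIBLE = 3


TOOL_RISK = {
    "search": Risk.SAFE,
    "read_file": Risk.SAFE,
    "summarize": Risk.SAFE,
    "fetch_status": Risk.SAFE,
    "send_email": Risk.SIDE_EFFECTING,
    "create_order": Risk.SIDE_EFFECTING,
    "create_ticket": Risk.SIDE_EFFECTING,
    "update_CRM": Risk.SIDE_EFFECTING,
    "payment": Risk.IRREVERSIBLE,
    "delete": Risk.IRREVERSIBLE,
    "trade_execution": Risk.IRREVERSIBLE,
    "account_mutation": Risk.IRREVERSIBLE,
}

# Assumed mapping from tier to the checks the execution layer enforces.
CHECKS = {
    Risk.SAFE: [],
    Risk.SIDE_EFFECTING: ["deterministic_id", "durable_receipt"],
    Risk.IRREVERSIBLE: ["deterministic_id", "durable_receipt",
                        "replay_safe", "policy_check"],
}


def required_checks(tool: str) -> List[str]:
    # Unknown tools are treated as the riskiest tier until someone classifies them.
    risk = TOOL_RISK.get(tool, Risk.IRREVERSIBLE)
    return CHECKS[risk]
```

Defaulting unknown tools to the strictest tier is a design choice: a newly added tool gets the full boundary until it is explicitly classified as safe.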

The principle is simple:

the more irreversible the action, the thicker the execution boundary should be

What systems actually need

In practice, most systems need some combination of:

stable request / operation identity
durable receipt storage
replay-safe execution semantics
result recovery
explicit separation between “propose” and “execute”

That can be implemented many ways.

But the important thing is the architectural boundary itself.

Because once a system can confidently answer:

“yes, this already happened”

then retries become much safer.

Why this keeps showing up in agent systems

Traditional systems already had this problem.

Agents just make it more visible.

Why?

Because agents are:

retry-heavy
tool-using
asynchronous
failure-prone
often layered on top of APIs that were never designed for autonomous replay

So the moment an agent starts touching:

payments
orders
emails
browser actions
external systems

…uncertain completion becomes one of the most important production problems in the stack.

Closing thought

The scariest agent failure is often not:

“the model made the wrong choice”

It is:

“the model made the right choice twice”

And the reason that happens is usually not intelligence failure.

It is:

missing execution boundaries under uncertain completion

Related

I wrote a first piece on the execution-side pattern here:

The Execution Guard Pattern for AI Agents
https://dev.to/azender1/the-execution-guard-pattern-for-ai-agents-23m9

And I’m also building a Python reference implementation around this idea:

GitHub
https://github.com/azender1/SafeAgent

The Execution Guard Pattern for AI Agents

Anthony Zender — Sat, 28 Mar 2026 02:04:36 +0000

AI agents don’t just think — they execute real-world actions.

Payments. Trades. Emails. API calls.

And under retries, timeouts, or crashes…

they can execute the same action twice.

Not because the model was wrong —
because the system has no memory of execution.

The hidden failure mode

A typical failure path looks like this:

agent decides to call tool
→ tool executes side effect
→ response is lost (timeout / crash / disconnect)
→ system retries
→ side effect executes again

Now you have:

duplicate payments
duplicate trades
duplicate emails
duplicate API mutations

Not because the decision was wrong —
because the execution layer has no durable receipt.

Retries are correct — and still dangerous

Retries are necessary for reliability.

But retries + irreversible side effects without a guard = replay risk.

The system cannot confidently answer:

“Did this action already happen?”

So it does the only thing it can:

→ tries again

That’s fine for reads.

It’s dangerous for writes.

The Execution Guard Pattern

The fix is not prompt engineering.

It’s an execution boundary around side effects.

Pattern:
decision
→ deterministic request_id
→ execution guard
→ if receipt exists → return prior result
→ else → execute once → store receipt

Instead of asking the model to “be careful,”
the system itself becomes replay-safe.

The four required properties

For this pattern to work, you need four things:

1) Deterministic request identity

Every logical action must map to the same request_id across retries.

If the same payment, email, trade, or tool call is retried, it must resolve to the same identity.
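One way to get that property is to derive the ID from the content of the action rather than generating it per attempt. The sketch below hashes a canonical JSON form. The `make_request_id` helper and its `scope` parameter (whatever distinguishes one logical action from the next, such as an invoice number or a signal timestamp) are assumptions for illustration.

```python
import hashlib
import json


def make_request_id(action: str, payload: dict, scope: str) -> str:
    """Derive a stable ID from what the action is, not from when it was attempted."""
    canonical = json.dumps(
        {"action": action, "payload": payload, "scope": scope},
        sort_keys=True,              # key order must not change the ID
        separators=(",", ":"),       # no whitespace differences either
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{action}:{digest[:16]}"


attempt_1 = make_request_id("send_payment", {"to": "acct-9", "amount": 100}, "invoice-42")
attempt_2 = make_request_id("send_payment", {"amount": 100, "to": "acct-9"}, "invoice-42")
next_invoice = make_request_id("send_payment", {"to": "acct-9", "amount": 100}, "invoice-43")
```

Both attempts resolve to the same ID even though the payload keys arrive in a different order, while the next invoice gets a different one. A random ID generated inside the retry loop would fail this test by design.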

2) Durable receipt storage

You need a place to persist what happened.

Postgres works well for this because it gives you:

durable writes
transactional boundaries
strong uniqueness guarantees
queryable auditability

Without durable receipts, retries are guesswork.

3) Atomic claim → execute → complete boundary

The system needs a clear execution boundary:

claim the operation
execute the side effect once
persist the result / receipt

That boundary is what prevents:

concurrent replays
duplicate workers
race-condition duplicates
“two consumers did the same thing” bugs
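The claim step is what stops two workers from both deciding to execute. Below is a minimal sketch that uses a uniqueness constraint as the claim. It runs against `sqlite3` so it works without a database server; in Postgres the same role would be played by a primary key or unique index. The table layout and function names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE operations ("
    "request_id TEXT PRIMARY KEY, status TEXT NOT NULL, result TEXT)"
)


def claim(request_id: str) -> bool:
    """Only one caller can insert the row, so only one caller wins the claim."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO operations (request_id, status) VALUES (?, 'claimed')",
                (request_id,),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def complete(request_id: str, result: str) -> None:
    with conn:
        conn.execute(
            "UPDATE operations SET status = 'completed', result = ? WHERE request_id = ?",
            (result, request_id),
        )


def run_once(request_id: str, side_effect):
    if claim(request_id):
        result = side_effect()
        complete(request_id, result)
        return result
    status, result = conn.execute(
        "SELECT status, result FROM operations WHERE request_id = ?", (request_id,)
    ).fetchone()
    if status == "completed":
        return result  # replay: hand back the stored result
    # Claimed but never completed: the outcome is unknown, so do not guess.
    raise RuntimeError("operation claimed but not completed; reconcile before retrying")


effects = []  # records every time the fake side effect actually runs
first = run_once("pay-1", lambda: effects.append("paid") or "receipt-1")
second = run_once("pay-1", lambda: effects.append("paid") or "receipt-1")
```

The "claimed but not completed" branch is the uncertain-completion case made explicit: the sketch refuses to re-execute and surfaces the ambiguity instead.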

4) Replay returns the prior result

If the same logical action comes in again,
you should not execute it again.

You should return the prior result.

That turns:

retries
redelivery
replay
uncertain completion

into:

safe re-entry instead of duplicate side effects

What this is NOT

This is not:

moderation
prompt safety
RBAC
approval workflows
hallucination prevention

It solves one thing:

“Did this irreversible action already happen?”

That question shows up everywhere once agents or automations start calling real tools.

Where this matters most

This pattern matters anywhere your system causes real-world side effects:

webhook handlers
billing / payment flows
async workers / queues
workflow / automation systems
AI agent tool calls
external API mutations
order / booking / ticket creation
notifications and email sends

In other words:

anything that should happen once, even if the system retries

Why this keeps showing up

Modern systems are:

distributed
async
retry-heavy
failure-prone
full of uncertain completion

So “exactly once” does not happen naturally.

You have to build it explicitly.

And once you add:

AI agents
autonomous workflows
tool-calling systems

…the need for an execution boundary gets even sharper.

Because now a model can repeatedly decide to invoke something that has real-world consequences.

A practical implementation direction

In many systems, this can be implemented with:

a Postgres-backed receipt table
a stable operation / request ID
a guard layer around side-effecting functions

That turns:

unsafe retries

into:

safe replays

This doesn’t require rewriting your whole system.

It usually means identifying the small set of functions that can cause irreversible side effects and wrapping them with a durable execution boundary.

That’s where the leverage is.
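Wrapping a handful of functions can be as light as a decorator. The sketch below is one possible shape, not a published API: `execute_once`, its `key_fn` argument, and the in-memory `_receipts` dict are assumptions, and the dict would need to be replaced by durable storage for the guarantee to survive a restart.

```python
import functools

_receipts = {}  # in-memory for illustration; real use needs durable storage


def execute_once(key_fn):
    """Wrap a side-effecting function so each logical action runs at most once."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # The caller decides what makes two calls "the same logical action".
            request_id = f"{fn.__name__}:{key_fn(*args, **kwargs)}"
            if request_id in _receipts:
                return _receipts[request_id]
            result = fn(*args, **kwargs)
            _receipts[request_id] = result
            return result
        return wrapper
    return decorator


sent_log = []  # records every time the wrapped function body actually runs


@execute_once(key_fn=lambda to, invoice_id: invoice_id)
def send_invoice_email(to, invoice_id):
    sent_log.append((to, invoice_id))
    return {"sent_to": to, "invoice_id": invoice_id}


send_invoice_email("a@example.com", "inv-1")
send_invoice_email("a@example.com", "inv-1")  # retry: returns the stored result
send_invoice_email("a@example.com", "inv-2")  # different invoice: runs normally
```

The rest of the codebase keeps calling `send_invoice_email` as before; only the functions with irreversible effects gain the wrapper.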

Closing thought

If an AI agent can call tools,
it needs more than reasoning.

It needs execution memory.

Otherwise:

retries will eventually execute something twice.

Execution Risk Audit

I’m currently looking at systems where retries, webhooks, workers, workflows, or AI agents can replay irreversible actions.

If your system has paths where you can’t confidently answer:

“Did this action already happen?”

that’s exactly the kind of problem I’m focused on.

Especially interested in:

duplicate webhook execution
retry-safe billing flows
workflow steps with uncertain completion
AI agents calling side-effecting tools