<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Alex Delov</title>
    <description>The latest articles on Forem by Alex Delov (@ale007xd).</description>
    <link>https://forem.com/ale007xd</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3943262%2Fead831e3-7141-4c6e-8903-282ea5a80e86.jpg</url>
      <title>Forem: Alex Delov</title>
      <link>https://forem.com/ale007xd</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ale007xd"/>
    <language>en</language>
    <item>
      <title>llm-nano-vm v0.8.0 — deterministic FSM runtime for LLM pipelines, now with output validation and per-step timeouts</title>
      <dc:creator>Alex Delov</dc:creator>
      <pubDate>Sat, 23 May 2026 04:36:37 +0000</pubDate>
      <link>https://forem.com/ale007xd/llm-nano-vm-v080-deterministic-fsm-runtime-for-llm-pipelines-now-with-output-validation-and-57ch</link>
      <guid>https://forem.com/ale007xd/llm-nano-vm-v080-deterministic-fsm-runtime-for-llm-pipelines-now-with-output-validation-and-57ch</guid>
      <description>&lt;p&gt;PyPI: &lt;code&gt;pip install llm-nano-vm&lt;/code&gt;&lt;br&gt;&lt;br&gt;
GitHub: &lt;a href="http://github.com/Ale007XD/nano_vm" rel="noopener noreferrer"&gt;http://github.com/Ale007XD/nano_vm&lt;/a&gt;&lt;br&gt;&lt;br&gt;
MCP gateway: &lt;a href="http://github.com/Ale007XD/nano-vm-mcp" rel="noopener noreferrer"&gt;http://github.com/Ale007XD/nano-vm-mcp&lt;/a&gt;&lt;/p&gt;



&lt;p&gt;I've been building a deterministic FSM execution kernel for LLM workflows. v0.8.0 just shipped to PyPI. Here's what it is, what's new, and where it's going.&lt;/p&gt;


&lt;h2&gt;
  
  
  What it is
&lt;/h2&gt;

&lt;p&gt;Most LLM frameworks treat the model as the orchestrator. nano-vm flips that: the runtime is the orchestrator, the model is just one step in a deterministic graph.&lt;/p&gt;

&lt;p&gt;δ(S, E) → S'&lt;br&gt;&lt;br&gt;
Current state + validated event = next state. The model cannot skip steps, reorder them, or escape guardrails. The FSM is the source of truth.&lt;/p&gt;
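
&lt;p&gt;A minimal sketch of the transition rule above, assuming a hypothetical two-branch program. The &lt;code&gt;TRANSITIONS&lt;/code&gt; table and the &lt;code&gt;delta&lt;/code&gt; helper are illustrative names, not part of the nano-vm API:&lt;/p&gt;

```python
# Hypothetical transition table: (current_state, validated_event) maps to next_state.
TRANSITIONS = {
    ("analyze", "yes"): "process_refund",
    ("analyze", "no"): "reject",
}


def delta(state, event):
    """Return the next state, or raise if the event is not valid in this state."""
    key = (state, event)
    if key not in TRANSITIONS:
        # An event outside the table can never move the machine.
        raise ValueError(f"no transition from {state!r} on event {event!r}")
    return TRANSITIONS[key]


next_state = delta("analyze", "yes")
```

&lt;p&gt;Because every legal move is listed up front, anything the model produces outside the table is rejected instead of being interpreted.&lt;/p&gt;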

&lt;p&gt;Four step types: &lt;code&gt;llm&lt;/code&gt;, &lt;code&gt;tool&lt;/code&gt;, &lt;code&gt;condition&lt;/code&gt;, &lt;code&gt;parallel&lt;/code&gt;. Programs are plain Python dicts. No DSL parser, no heavy framework magic, and zero dependency overhead.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;program&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Program&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_dict&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer_refund&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;steps&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analyze&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Valid refund? Reply &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;yes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; or &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;no&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Request: $user_input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;decision&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;allowed_outputs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;yes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;no&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;   &lt;span class="c1"&gt;# ← v0.8.0
&lt;/span&gt;        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;guardrail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;condition&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;condition&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"'&lt;/span&gt;&lt;span class="s"&gt;yes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; in &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;$decision&lt;/span&gt;&lt;span class="sh"&gt;'"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;then&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;process_refund&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;otherwise&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reject&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;process_refund&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;issue_refund&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;is_terminal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reject&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;send_rejection&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;is_terminal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The guardrail step cannot be bypassed regardless of what the model returns.&lt;/p&gt;

&lt;h2&gt;What's new in v0.8.0&lt;/h2&gt;

&lt;p&gt;allowed_outputs — LLM enum guard&lt;br&gt;&lt;br&gt;
Validates the model's raw output against an explicit list before the value touches anything downstream.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"classify"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"llm"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Classify. Reply ONLY with: refund / query / other"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"allowed_outputs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"refund"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"other"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"on_error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"skip"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;falls&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;back&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"refund"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;(first&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;element)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;on&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;mismatch&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three policies on mismatch: fail (default, trace → FAILED), skip (substitute allowed_outputs[0]), retry (retry up to max_retries, then FAILED).&lt;/p&gt;
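
&lt;p&gt;The three mismatch policies can be sketched roughly as follows. This is not the library's implementation: &lt;code&gt;guard_output&lt;/code&gt;, &lt;code&gt;OutputRejected&lt;/code&gt; and the exact retry accounting (one initial call plus &lt;code&gt;max_retries&lt;/code&gt; further calls) are assumptions made for illustration.&lt;/p&gt;

```python
class OutputRejected(Exception):
    """Raised when the output is not allowed and the policy cannot recover."""


def guard_output(call_model, allowed_outputs, on_error="fail", max_retries=2):
    """Return a value that is guaranteed to be in allowed_outputs, or raise."""
    # Only the "retry" policy calls the model more than once (assumed accounting).
    attempts = 1 + max_retries if on_error == "retry" else 1
    for _ in range(attempts):
        raw = call_model()
        if raw in allowed_outputs:
            return raw
    if on_error == "skip":
        # Substitute the first allowed value instead of failing.
        return allowed_outputs[0]
    # "fail", or "retry" after all attempts were used up.
    raise OutputRejected(f"output not in {allowed_outputs}")
```

&lt;p&gt;With &lt;code&gt;on_error="skip"&lt;/code&gt; the first list element acts as the default branch, so the order of &lt;code&gt;allowed_outputs&lt;/code&gt; matters.&lt;/p&gt;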

&lt;p&gt;timeout_seconds + on_timeout — per-step LLM timeout&lt;br&gt;&lt;br&gt;
Prevents a hung API call from stalling the entire FSM.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"analyze"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"llm"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"timeout_seconds"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;5.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"on_timeout"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"fallback"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;falls&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;back&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;allowed_outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;or&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;''&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two policies: fail (default) and fallback. Both features are independent and composable — you can use either or both on any llm step.&lt;/p&gt;
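
&lt;p&gt;A per-step timeout with a fallback could look like the sketch below, built on the standard library's &lt;code&gt;asyncio.wait_for&lt;/code&gt;. The helper name &lt;code&gt;call_with_timeout&lt;/code&gt; and its parameters are hypothetical; how nano-vm wires this internally is not shown in this post.&lt;/p&gt;

```python
import asyncio


async def call_with_timeout(call, timeout_seconds, on_timeout="fail", allowed_outputs=None):
    """Await call() but give up after timeout_seconds."""
    try:
        return await asyncio.wait_for(call(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        if on_timeout == "fallback":
            # Fall back to the first allowed output, or to an empty string.
            return allowed_outputs[0] if allowed_outputs else ""
        raise  # "fail": let the caller mark the step as failed
```

&lt;p&gt;The fallback branch mirrors the comment in the example above: the first allowed output if a list is configured, otherwise an empty string.&lt;/p&gt;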

&lt;h2&gt;What it can do right now&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Suspend / resume. Return "PENDING" from any tool → FSM → SUSPENDED, cursor persisted. Resume from any external event (webhook, approval, settlement). RUNNING → SUSPENDED → RUNNING → SUCCESS&lt;/li&gt;
&lt;li&gt;Condition branching with ASTEngine. eval() is gone. Conditions are parsed into a validated JSON AST and evaluated by a sandboxed interpreter. No Python builtins are accessible. Method calls (.lower() etc.) raise ASTEvalError at parse time instead of silently returning False.&lt;/li&gt;
&lt;li&gt;GDPR tombstoning. Sensitive values stored as CapabilityRef tokens (vault://secret/). On erasure event: ref tombstoned, all projections return [REDACTED_TOMBSTONE], hash chain stays valid.&lt;/li&gt;
&lt;li&gt;GovernanceEnvelope. Every successful step produces an immutable, append-only audit record: execution_id, step_id, policy_hash, canonical_snapshot_hash, sanitized payload.&lt;/li&gt;
&lt;li&gt;MCP gateway (nano-vm-mcp). Exposes run_program, get_trace, list_programs etc. over stdio or SSE transport with bearer auth and SQLite WAL persistence. Works with Claude Desktop and any MCP client.&lt;/li&gt;
&lt;li&gt;Budget guardrails. max_steps, max_tokens, max_stalled_steps — FSM halts with BUDGET_EXCEEDED or STALLED before the next step, not after.&lt;/li&gt;
&lt;/ul&gt;
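
&lt;p&gt;To make the suspend / resume item from the list above concrete, here is a toy runner that suspends when a tool returns the &lt;code&gt;"PENDING"&lt;/code&gt; sentinel. &lt;code&gt;MiniFSM&lt;/code&gt; is a hypothetical class that keeps its cursor in memory; the real runtime persists it.&lt;/p&gt;

```python
PENDING = "PENDING"  # sentinel a tool returns to request suspension


class MiniFSM:
    """Toy runner: executes tools in order and suspends on the PENDING sentinel."""

    def __init__(self, tools):
        self.tools = tools      # ordered list of zero-argument callables
        self.cursor = 0         # index of the next tool to run
        self.status = "RUNNING"

    def run(self):
        while self.cursor != len(self.tools):
            result = self.tools[self.cursor]()
            if result == PENDING:
                # Keep the cursor on the pending step and hand control back.
                self.status = "SUSPENDED"
                return self.status
            self.cursor += 1
        self.status = "SUCCESS"
        return self.status

    def resume(self):
        """Called when the external event arrives; continue after the pending step."""
        if self.status != "SUSPENDED":
            raise RuntimeError("resume is only valid from SUSPENDED")
        self.cursor += 1
        self.status = "RUNNING"
        return self.run()
```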

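&lt;p&gt;Benchmark: v0.8.0 (WSL2 · Python 3.12 · MockAdapter · 3×5×10k)&lt;br&gt;
10/10 PASS · 1,096,500 ops · 0 violations&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Scenario&lt;/th&gt;&lt;th&gt;Mean TPS&lt;/th&gt;&lt;th&gt;p95&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Refund pipeline&lt;/td&gt;&lt;td&gt;2,200/s&lt;/td&gt;&lt;td&gt;123 ms&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Double-execution guard&lt;/td&gt;&lt;td&gt;2,800/s&lt;/td&gt;&lt;td&gt;69 ms&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Budget enforcement&lt;/td&gt;&lt;td&gt;2,400/s&lt;/td&gt;&lt;td&gt;97 ms&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Parallel throughput&lt;/td&gt;&lt;td&gt;1,000/s&lt;/td&gt;&lt;td&gt;196 ms&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;MCP store round-trip&lt;/td&gt;&lt;td&gt;11,000/s&lt;/td&gt;&lt;td&gt;0.13 ms&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;GovernanceEnvelope&lt;/td&gt;&lt;td&gt;2,100/s&lt;/td&gt;&lt;td&gt;108 ms&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Crash consistency&lt;/td&gt;&lt;td&gt;11/s&lt;/td&gt;&lt;td&gt;115 ms&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Replay equivalence&lt;/td&gt;&lt;td&gt;1,300/s&lt;/td&gt;&lt;td&gt;164 ms&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Adversarial retries&lt;/td&gt;&lt;td&gt;2,600/s&lt;/td&gt;&lt;td&gt;87 ms&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Long-horizon (1k steps)&lt;/td&gt;&lt;td&gt;95/s&lt;/td&gt;&lt;td&gt;11,887 ms&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;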

&lt;p&gt;BM-INT-07 (Crash consistency): crash_rate=100% hash_match=100% — replay after simulated crash produces identical trace hash every time.&lt;br&gt;&lt;br&gt;
BM-INT-10 (Memory footprint): peak RSS 76.5 MB, alloc 3.62 MB for 1,000-step programs — no memory leaks detected.&lt;/p&gt;

&lt;h2&gt;Validated on real payment APIs&lt;/h2&gt;

&lt;p&gt;Two PoCs, both 9/9 tests passing with mock adapters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MoMo Payment API v4 — 3-way condition branch, HMAC-SHA256 IPN verification, polling loop with retry, next_step/is_terminal DSL.&lt;/li&gt;
&lt;li&gt;Stripe Payment API v1 — 3DS flow (REQUIRES_ACTION sentinel), refund pipeline with LLM classifier, webhook verification. Found and fixed two bugs in the process: "PENDING" sentinel collision (Stripe was returning it as a domain status, triggering FSM suspend), and silent ASTEvalError for .lower() in condition expressions.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;What's coming next&lt;/h2&gt;

&lt;p&gt;Phase 0 (Immediate): ProgramValidator, static analysis at Program build time. It catches missing then/otherwise/next_step targets, unreachable steps, and cycles. Currently these fail at runtime; for LLM-generated workflows, static analysis is a must.&lt;br&gt;&lt;br&gt;
Phase 1 (Gateway Correctness): StateContext persistence between MCP calls in SQLite WAL. Right now, if the gateway process restarts after /create but before polling completes, you get a new requestId, which is a real financial duplicate risk. Closing this with an execution_contexts table plus an upsert on every step. Up next: TRACE projection to SQLite, GovernedToolExecutor (policy-level tool capability enforcement), idempotency_store, and native vm.step() MCP wiring.&lt;br&gt;&lt;br&gt;
Phase 2 (Dev Agent): nano-vm-dev-agent, the FSM runtime managing its own development stack (read_repo_files → generate_patch(llm) → run_mypy → run_pytest → write_repo_files). The DA-1 milestone is done (12/12 tests). DA-2 will be the first live run against a real sprint task (StateContext persistence). I'm still working on the search_code and reproduce_bug tool functions before launching live.&lt;br&gt;&lt;br&gt;
Phase 3 (Observability): OpenTelemetry span per FSM step plus incremental counters in Trace (llm_calls, tool_calls, retries_total).&lt;/p&gt;

&lt;h2&gt;Install&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;pip install llm-nano-vm==0.8.0&lt;/code&gt;&lt;br&gt;
&lt;code&gt;pip install "llm-nano-vm[litellm]==0.8.0"&lt;/code&gt; (LiteLLM provider support)&lt;br&gt;
&lt;code&gt;pip install nano-vm-mcp&lt;/code&gt; (MCP gateway)&lt;/p&gt;

&lt;p&gt;LLMs are completely optional. The runtime works perfectly fine as a pure, lightweight deterministic workflow engine.&lt;/p&gt;

&lt;p&gt;Questions / feedback welcome!&lt;/p&gt;

</description>
      <category>mlops</category>
      <category>backend</category>
      <category>opensource</category>
      <category>fintech</category>
    </item>
    <item>
      <title>Models shouldn't have execution authority. Why we built a deterministic FSM runtime for AI agents.</title>
      <dc:creator>Alex Delov</dc:creator>
      <pubDate>Thu, 21 May 2026 04:49:39 +0000</pubDate>
      <link>https://forem.com/ale007xd/models-shouldnt-have-execution-authority-why-we-built-a-deterministic-fsm-runtime-for-ai-agents-1op5</link>
      <guid>https://forem.com/ale007xd/models-shouldnt-have-execution-authority-why-we-built-a-deterministic-fsm-runtime-for-ai-agents-1op5</guid>
      <description>&lt;p&gt;Modern agent frameworks implicitly treat a probabilistic model as an execution authority. That is acceptable for read-only tasks (e.g., summarizing logs or searching the web). But once an agent can mutate external state — payments, databases, infrastructure, PII — the architecture becomes fundamentally unsafe.&lt;/p&gt;

&lt;p&gt;When preparing our internal agents (PlanBot, SkillBot) for white-label distribution, we realized we needed to change the control plane. &lt;strong&gt;nano-vm&lt;/strong&gt; does not attempt to make the model trustworthy. Instead, it assumes model output is untrusted intent and constrains its blast radius through strict deterministic execution semantics.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Runtime Guarantees (Not just another wrapper)
&lt;/h3&gt;

&lt;p&gt;We built &lt;strong&gt;nano-vm&lt;/strong&gt; — a deterministic FSM runtime for stateful AI systems. The value isn't just in having an FSM; the value is that the execution graph is finite, verifiable, and known ahead of time.&lt;/p&gt;

&lt;p&gt;The runtime enforces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deterministic transition graph:&lt;/strong&gt; Execution graph cannot self-modify at runtime.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compile-time ordering:&lt;/strong&gt; Attempting a &lt;code&gt;reorder_steps&lt;/code&gt; attack is structurally impossible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Capability gating:&lt;/strong&gt; Strictly bounded side-effects.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Replay resistance:&lt;/strong&gt; Idempotency boundaries built into the state transitions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Immutable auditability:&lt;/strong&gt; Cryptographic history of every action.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  ASTEngine: Limitation as a Security Property
&lt;/h3&gt;

&lt;p&gt;In most agent runtimes, the execution loop is essentially: &lt;code&gt;prompt -&amp;gt; JSON -&amp;gt; dynamic dispatch -&amp;gt; side-effect&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We completely removed &lt;code&gt;eval()&lt;/code&gt;. Conditions and side-effects are evaluated by a sandboxed &lt;code&gt;DeterministicSanitizer&lt;/code&gt; using an isolated &lt;code&gt;ASTEngine&lt;/code&gt;. It supports basic operators (&lt;code&gt;==&lt;/code&gt;, &lt;code&gt;contains&lt;/code&gt;, &lt;code&gt;$var.field&lt;/code&gt;) but completely lacks loops or system calls.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The policy layer is intentionally less expressive than Python.&lt;/strong&gt; That limitation is a security property, not a missing feature. Loop exhaustion and ReDoS attacks are structurally impossible.&lt;/p&gt;
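
&lt;p&gt;The idea of a deliberately small expression language can be sketched with Python's standard &lt;code&gt;ast&lt;/code&gt; module. This is a swapped-in technique for illustration only: the actual &lt;code&gt;ASTEngine&lt;/code&gt; is not shown here, and &lt;code&gt;evaluate&lt;/code&gt;, &lt;code&gt;ConditionError&lt;/code&gt; and the node whitelist below are assumptions.&lt;/p&gt;

```python
import ast

# Whitelist: comparisons, boolean and/or, names and literals. No calls, no attributes.
ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.Compare,
    ast.Eq, ast.NotEq, ast.In, ast.NotIn, ast.Name, ast.Load, ast.Constant,
)


class ConditionError(Exception):
    """Raised for syntax outside the whitelist or for unknown variables."""


def evaluate(expr, variables):
    tree = ast.parse(expr, mode="eval")
    # Reject the whole expression before evaluating any part of it.
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ConditionError(f"disallowed syntax: {type(node).__name__}")
    return _eval(tree.body, variables)


def _eval(node, variables):
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        if node.id not in variables:
            raise ConditionError(f"unknown variable: {node.id}")
        return variables[node.id]
    if isinstance(node, ast.BoolOp):
        values = [_eval(value, variables) for value in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    # Remaining case: ast.Compare, possibly chained.
    left = _eval(node.left, variables)
    for op, comparator in zip(node.ops, node.comparators):
        right = _eval(comparator, variables)
        if isinstance(op, ast.Eq):
            ok = left == right
        elif isinstance(op, ast.NotEq):
            ok = left != right
        elif isinstance(op, ast.In):
            ok = left in right
        else:  # ast.NotIn
            ok = left not in right
        if not ok:
            return False
        left = right
    return True
```

&lt;p&gt;An expression such as &lt;code&gt;decision.lower() == 'yes'&lt;/code&gt; is rejected during validation because call and attribute nodes are not on the whitelist, so nothing in it is ever executed.&lt;/p&gt;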

&lt;h3&gt;
  
  
  Sabotage Mode: Demonstrating Failure Semantics
&lt;/h3&gt;

&lt;p&gt;To show how the runtime behaves under adversarial conditions, we built a 7-step fintech pipeline (PDF invoice -&amp;gt; Stripe test-mode adapter) with an integrated &lt;strong&gt;Sabotage Mode&lt;/strong&gt;. Instead of a happy-path demo, 5 injectors built directly into the UI trigger specific failures on demand. Three of them are described below.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. &lt;code&gt;tool_injection&lt;/code&gt; (Capability boundary violation)&lt;/strong&gt;&lt;br&gt;
Proposed tool invocations are treated as untrusted intent. If the LLM attempts to initiate an unauthorized &lt;code&gt;wire_transfer($50,000)&lt;/code&gt;, the &lt;code&gt;ExecutionVM&lt;/code&gt; resolves the request against a compile-time capability snapshot. The transition is rejected before any external side-effect layer becomes reachable. Zero side effects reach the network.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqbsbhyv16cp57d8bw34j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqbsbhyv16cp57d8bw34j.png" alt="(The ExecutionVM blocking an unauthorized tool injection at the capability boundary)." width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;
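
&lt;p&gt;A capability check of this kind can be sketched as a lookup against a frozen set that is fixed before the run starts. &lt;code&gt;ALLOWED_TOOLS&lt;/code&gt;, &lt;code&gt;dispatch&lt;/code&gt; and the tool names are hypothetical; the real &lt;code&gt;ExecutionVM&lt;/code&gt; interface is not shown in this post.&lt;/p&gt;

```python
# Hypothetical capability snapshot, frozen before the program runs.
ALLOWED_TOOLS = frozenset({"parse_invoice", "create_charge"})


class CapabilityError(Exception):
    """Raised when a proposed tool call is outside the snapshot."""


def dispatch(tool_name, registry, allowed=ALLOWED_TOOLS, **kwargs):
    """Run a tool only if the snapshot permits it."""
    # The check happens before the registry lookup, so a rejected
    # tool is never resolved, let alone called.
    if tool_name not in allowed:
        raise CapabilityError(f"tool {tool_name!r} is not permitted")
    return registry[tool_name](**kwargs)
```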

&lt;p&gt;&lt;strong&gt;2. &lt;code&gt;double_exec&lt;/code&gt; (Replay &amp;amp; Idempotency)&lt;/strong&gt;&lt;br&gt;
External side-effects are executed through idempotent adapters keyed by &lt;code&gt;execution_id&lt;/code&gt;, allowing deterministic replay of internal state recovery without duplicating external mutations. Once the FSM reaches a terminal state (&lt;code&gt;SUCCESS&lt;/code&gt; or &lt;code&gt;FAILED&lt;/code&gt;), it becomes an absorbing state (&lt;code&gt;δ(SUCCESS|FAILED, *) = NOP&lt;/code&gt;). Replays are silently dropped.&lt;/p&gt;
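
&lt;p&gt;Both halves of this guarantee fit in a few lines. The sketch below assumes an in-memory adapter and a plain dict of transitions; &lt;code&gt;IdempotentAdapter&lt;/code&gt;, &lt;code&gt;charge&lt;/code&gt; and &lt;code&gt;delta&lt;/code&gt; are illustrative names rather than the runtime's API.&lt;/p&gt;

```python
TERMINAL = frozenset({"SUCCESS", "FAILED"})


class IdempotentAdapter:
    """Toy adapter: performs a side effect at most once per execution_id."""

    def __init__(self):
        self.results = {}  # execution_id mapped to the recorded result
        self.calls = 0     # number of real side effects performed

    def charge(self, execution_id, amount):
        if execution_id in self.results:
            # Replay: return the recorded result, do not repeat the mutation.
            return self.results[execution_id]
        self.calls += 1
        self.results[execution_id] = f"charged:{amount}"
        return self.results[execution_id]


def delta(state, event, transitions):
    """Terminal states absorb every event; other states follow the table."""
    if state in TERMINAL:
        return state  # NOP
    return transitions[(state, event)]
```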

&lt;p&gt;&lt;strong&gt;3. &lt;code&gt;corrupt_hash&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
Tampering with the validation hash instantly throws the FSM into a &lt;code&gt;FAILED&lt;/code&gt; state, resulting in a zeroed envelope chain. The audit trail cannot be silently broken.&lt;/p&gt;

&lt;h3&gt;
  
  
  GDPR Art.17 vs. Immutable Audit Trails
&lt;/h3&gt;

&lt;p&gt;Handling the "Right to Erasure" without breaking cryptographic audit chains is a major headache in fintech.&lt;/p&gt;

&lt;p&gt;We implemented a &lt;code&gt;GDPR-erase&lt;/code&gt; mechanism that targets specific &lt;code&gt;vault://secret/ref&lt;/code&gt; pointers and replaces the PII with a &lt;code&gt;[REDACTED_TOMBSTONE]&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The PII becomes completely inaccessible.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;hash_chain&lt;/code&gt; and &lt;code&gt;canonical_hash&lt;/code&gt; survive.&lt;/li&gt;
&lt;li&gt;Cryptographic continuity is maintained.&lt;/li&gt;
&lt;li&gt;Referential integrity is preserved.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You delete the data, but you do not destroy the mathematical proof that the operation occurred safely.&lt;/p&gt;
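
&lt;p&gt;One way to get this property, sketched under the assumption that audit records store only vault references and never the PII itself: the chain is hashed over the references, so erasing the vault entry leaves every hash unchanged. The helper names and the &lt;code&gt;vault://secret/ref-1&lt;/code&gt; value are made up for the example.&lt;/p&gt;

```python
import hashlib
import json

TOMBSTONE = "[REDACTED_TOMBSTONE]"


def chain_hash(prev_hash, record):
    """Hash a record together with the previous hash in the chain."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + payload).hexdigest()


def build_chain(records):
    hashes, prev = [], ""
    for record in records:
        prev = chain_hash(prev, record)
        hashes.append(prev)
    return hashes


def project(record, vault):
    """Resolve vault refs for display; erased refs come back as the tombstone."""
    view = {}
    for key, value in record.items():
        is_ref = isinstance(value, str) and value.startswith("vault://")
        view[key] = vault.get(value, TOMBSTONE) if is_ref else value
    return view
```

&lt;p&gt;Deleting the entry from the vault changes what &lt;code&gt;project&lt;/code&gt; returns, while &lt;code&gt;build_chain&lt;/code&gt; over the same records yields the same hashes as before.&lt;/p&gt;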

&lt;h3&gt;
  
  
  Execution Authority vs. Model Quality
&lt;/h3&gt;

&lt;p&gt;LLMs are excellent planners. They are terrible sources of execution truth.&lt;/p&gt;

&lt;p&gt;The core design question for stateful AI systems may not be model quality.&lt;br&gt;
It may be execution authority.&lt;/p&gt;

&lt;p&gt;Should a probabilistic model be allowed to mutate state directly?&lt;br&gt;
Or should execution pass through a deterministic control layer first?&lt;/p&gt;

&lt;p&gt;If you want to try breaking the FSM yourself, the Sabotage Mode is live, and the core is open-source:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Core runtime:&lt;/strong&gt; &lt;a href="https://github.com/Ale007XD/nano_vm" rel="noopener noreferrer"&gt;github.com/Ale007XD/nano_vm&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP gateway layer:&lt;/strong&gt; &lt;a href="https://github.com/Ale007XD/nano-vm-mcp" rel="noopener noreferrer"&gt;github.com/Ale007XD/nano-vm-mcp&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live Sabotage Demo:&lt;/strong&gt; &lt;a href="http://demo.bannerbot.ru:8843" rel="noopener noreferrer"&gt;demo.bannerbot.ru:8843&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Curious how others here are approaching capability boundaries, replay resistance, and auditability in agent runtimes.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>security</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
