<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Shudipto Trafder</title>
    <description>The latest articles on Forem by Shudipto Trafder (@shudiptotrafder).</description>
    <link>https://forem.com/shudiptotrafder</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3681541%2Fd686de74-59f1-4097-9bf4-81b3837b0aba.jpg</url>
      <title>Forem: Shudipto Trafder</title>
      <link>https://forem.com/shudiptotrafder</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/shudiptotrafder"/>
    <language>en</language>
    <item>
      <title>AgentFlow — From Agent Code to Production API in Minutes</title>
      <dc:creator>Shudipto Trafder</dc:creator>
      <pubDate>Sun, 03 May 2026 17:09:58 +0000</pubDate>
      <link>https://forem.com/10xscale/agentflow-from-agent-code-to-production-api-in-minutes-p3e</link>
      <guid>https://forem.com/10xscale/agentflow-from-agent-code-to-production-api-in-minutes-p3e</guid>
      <description>&lt;h2&gt;
  
  
  AgentFlow — The Python Framework for Production AI Agents
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Stop rebuilding the same agent infrastructure. AgentFlow gives you auth, streaming, persistence, and a React frontend — out of the box.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;AgentFlow (&lt;code&gt;10xscale-agentflow&lt;/code&gt; on PyPI) is an open-source Python framework for building and deploying multi-agent AI systems. Write your agent graph once. Run it locally. Ship it to production without rewriting your backend.&lt;/p&gt;

&lt;p&gt;Built by &lt;a href="https://10xscale.ai/" rel="noopener noreferrer"&gt;10xScale&lt;/a&gt;. MIT licensed. No vendor lock-in.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why AgentFlow?
&lt;/h2&gt;

&lt;p&gt;Most agent frameworks stop at the prototype. You get a cute demo, then spend weeks bolting on auth, rate limiting, persistence, and a frontend. AgentFlow is built for what comes after the demo.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One framework. From first &lt;code&gt;pip install&lt;/code&gt; to production Docker deploy.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🔗 Links
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;URL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Core Python Library&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/10xHub/Agentflow" rel="noopener noreferrer"&gt;github.com/10xHub/Agentflow&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API &amp;amp; CLI&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/10xHub/agentflow-cli" rel="noopener noreferrer"&gt;github.com/10xHub/agentflow-cli&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Documentation&lt;/td&gt;
&lt;td&gt;&lt;a href="https://10xhub.github.io/agentflow-docs" rel="noopener noreferrer"&gt;10xhub.github.io/agentflow-docs&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PyPI — Core&lt;/td&gt;
&lt;td&gt;&lt;a href="https://pypi.org/project/10xscale-agentflow/" rel="noopener noreferrer"&gt;pypi.org/project/10xscale-agentflow&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PyPI — CLI&lt;/td&gt;
&lt;td&gt;&lt;a href="https://pypi.org/project/10xscale-agentflow-cli/" rel="noopener noreferrer"&gt;pypi.org/project/10xscale-agentflow-cli&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  The Full Stack
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;agentflow            →  Core Python orchestration engine
agentflow-cli        →  FastAPI server + CLI tooling
agentflow-client     →  TypeScript/React SDK (@10xscale/agentflow-client)
agentflow-playground →  Hosted UI for testing agents
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use any layer alone. Use them together for a complete AI product stack — from LLM call to browser UI — without stitching four different libraries together.&lt;/p&gt;




&lt;h2&gt;
  
  
  Get Running in 60 Seconds
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;10xscale-agentflow-cli

agentflow init   &lt;span class="c"&gt;# scaffold a new project&lt;/span&gt;
agentflow api    &lt;span class="c"&gt;# start the dev server&lt;/span&gt;
agentflow play   &lt;span class="c"&gt;# open the playground UI&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Your agent is running, streamed, and explorable in under a minute.&lt;/p&gt;




&lt;h2&gt;
  
  
  What You Get
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Graph-Based Agent Orchestration
&lt;/h3&gt;

&lt;p&gt;AgentFlow uses a &lt;code&gt;StateGraph&lt;/code&gt; — directed nodes, conditional edges, and full control over execution flow. No black boxes. No magic routing you can't debug.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agentflow.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ToolNode&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agentflow.state&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Message&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agentflow.utils.constants&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get weather for a location.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The weather in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; is sunny, 72°F&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MAIN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini/gemini-2.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;tool_node_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TOOL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TOOL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;ToolNode&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;tools_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TOOL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;

&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MAIN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TOOL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TOOL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TOOL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MAIN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_entry_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MAIN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Weather in NYC?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]},&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;thread_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Stateful. Tool-calling. Under 30 lines.&lt;/p&gt;




&lt;h3&gt;
  
  
  LLM-Agnostic
&lt;/h3&gt;

&lt;p&gt;Pass the model string. AgentFlow routes it.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Package&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI (GPT-4o, o3, etc.)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pip install openai&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Google Gemini + Vertex AI&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pip install google-genai&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anthropic Claude&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;pip install anthropic&lt;/code&gt; &lt;em&gt;(coming soon)&lt;/em&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;No provider-specific abstractions to learn. Swap models without touching your agent logic.&lt;/p&gt;
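&lt;p&gt;A minimal sketch of what a swap looks like, reusing the &lt;code&gt;Agent&lt;/code&gt; node from the example above. The &lt;code&gt;gemini/&lt;/code&gt; model string is confirmed earlier in this post; the alternative string is an assumption that other providers follow the same pattern:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from agentflow.graph import Agent

# Only the model string changes; the node definition stays identical.
# "gemini/gemini-2.5-flash" is the format shown earlier in this post.
agent = Agent(
    model="gemini/gemini-2.5-flash",   # swap for e.g. "openai/gpt-4o" (assumed format)
    system_prompt=[{"role": "system", "content": "You are a helpful assistant."}],
    tool_node_name="TOOL",
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;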




&lt;h3&gt;
  
  
  Parallel Tool Execution — Automatic
&lt;/h3&gt;

&lt;p&gt;When an LLM calls multiple tools at once, AgentFlow runs them concurrently. No config required.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Other frameworks:  1.0s + 1.5s + 0.8s = 3.3s
AgentFlow:         max(1.0s, 1.5s, 0.8s) = 1.5s  ⚡ 2.2x faster
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
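

&lt;p&gt;Conceptually it is the same win you get from &lt;code&gt;asyncio.gather&lt;/code&gt;. A plain-Python illustration of the pattern (not AgentFlow internals):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio

async def call_tool(name: str, seconds: float) -&gt; str:
    # Stand-in for a real tool call (API request, DB query, etc.).
    await asyncio.sleep(seconds)
    return f"{name} done"

async def main() -&gt; None:
    # Sequential: 1.0 + 1.5 + 0.8 = 3.3s. Concurrent: max(...) = 1.5s.
    results = await asyncio.gather(
        call_tool("weather", 1.0),
        call_tool("search", 1.5),
        call_tool("calendar", 0.8),
    )
    print(results)

asyncio.run(main())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;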






&lt;h3&gt;
  
  
  Production Memory — Three Layers
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Working Memory    →  Current execution state (AgentState)
Session Memory    →  Redis (hot) + PostgreSQL (durable) checkpointer
Knowledge Memory  →  Qdrant vector store + Mem0 semantic recall
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Redis keeps hot conversation state fast. PostgreSQL keeps it durable and horizontally scalable. Both run together — you don't pick one.&lt;/p&gt;




&lt;h3&gt;
  
  
  Streaming
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;stream_gen&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;astream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;inp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;response_granularity&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ResponseGranularity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LOW&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stream_gen&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_dump&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three granularity levels: token-by-token (ChatGPT-style), message-by-message, or node-by-node graph traces. Your frontend decides what to show.&lt;/p&gt;




&lt;h3&gt;
  
  
  Auth and Security — Built In, Not Bolted On
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Most frameworks leave auth as an exercise for the reader. AgentFlow ships it.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"auth"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"jwt"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"auth"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"auth"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"custom"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"auth.my_backend:MyAuth"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One line in &lt;code&gt;agentflow.json&lt;/code&gt;. Switch from dev to production auth without touching your graph code.&lt;/p&gt;
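&lt;p&gt;The &lt;code&gt;auth.my_backend:MyAuth&lt;/code&gt; path above points at your own backend class. A hypothetical sketch of the idea; the real base class and hook names are in the docs, and everything here is illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative only: the actual interface AgentFlow expects may differ.
class MyAuth:
    async def authenticate(self, headers: dict) -&gt; dict:
        token = headers.get("Authorization", "").removeprefix("Bearer ")
        if token != "expected-api-key":  # swap in OAuth2 / session checks here
            raise PermissionError("invalid credentials")
        return {"user_id": "demo", "roles": ["user"]}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;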

&lt;p&gt;Security features included:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;JWT authentication with configurable secrets&lt;/li&gt;
&lt;li&gt;Custom auth backends for OAuth2, API keys, and sessions&lt;/li&gt;
&lt;li&gt;Role-Based Access Control (RBAC)&lt;/li&gt;
&lt;li&gt;Sliding-window rate limiting (memory or Redis backends)&lt;/li&gt;
&lt;li&gt;Configurable request size limits (DoS protection, default 10 MB)&lt;/li&gt;
&lt;li&gt;Auto-redaction of tokens and secrets from logs&lt;/li&gt;
&lt;li&gt;Startup validation — warns about insecure CORS and debug mode before you accidentally deploy them&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Lifecycle Callbacks
&lt;/h3&gt;

&lt;p&gt;Hook into every layer of execution — before and after each LLM call, tool call, or MCP invocation. Hook into the graph itself for start, end, checkpoint, interrupt, resume, and error events.&lt;/p&gt;

&lt;p&gt;Use them for audit logs, billing meters, policy enforcement, prompt-injection checks, or any business logic that shouldn't live inside the prompt.&lt;/p&gt;
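&lt;p&gt;Here is the kind of logic those hooks carry, sketched as a plain-Python wrapper. AgentFlow's actual callback signatures may differ; this only illustrates the use case:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import logging
import time

logger = logging.getLogger("audit")

def audited(tool_fn):
    # Illustrative before/after hook: log the call and time it.
    # With AgentFlow you would register equivalent logic as lifecycle
    # callbacks instead of wrapping each tool by hand.
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        logger.info("tool start: %s", tool_fn.__name__)
        try:
            return tool_fn(*args, **kwargs)
        finally:
            logger.info("tool end: %s (%.2fs)", tool_fn.__name__, time.monotonic() - start)
    return wrapper
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;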




&lt;h3&gt;
  
  
  The CLI
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentflow init              &lt;span class="c"&gt;# scaffold project + config&lt;/span&gt;
agentflow api               &lt;span class="c"&gt;# dev server with auto-reload&lt;/span&gt;
agentflow play              &lt;span class="c"&gt;# open playground against local backend&lt;/span&gt;
agentflow build &lt;span class="nt"&gt;--docker-compose&lt;/span&gt;  &lt;span class="c"&gt;# generate Dockerfile + compose&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Auto-generated FastAPI endpoints:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Endpoint&lt;/th&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/invoke&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;POST — synchronous agent call&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/stream&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;POST — streaming agent call&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/threads&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;GET — list conversation threads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/threads/{id}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;GET — fetch thread history&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/threads/{id}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;DELETE — delete thread&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Your agent graph becomes a production API. No FastAPI boilerplate to write.&lt;/p&gt;
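&lt;p&gt;Calling the generated API then looks like any other HTTP service. A hedged example: the payload below mirrors the Python &lt;code&gt;invoke()&lt;/code&gt; call earlier and is an assumption, not a documented schema:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import requests

# Request body is assumed, mirroring app.invoke(...) from the graph example.
resp = requests.post(
    "http://localhost:8000/invoke",  # default host/port assumed
    json={
        "messages": [{"role": "user", "content": "Weather in NYC?"}],
        "config": {"thread_id": "1"},
    },
    headers={"Authorization": "Bearer &lt;your-jwt&gt;"},  # if JWT auth is enabled
)
print(resp.json())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;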




&lt;h3&gt;
  
  
  Dependency Injection with InjectQ
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agentflow.utils&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Inject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;UserService&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get weather for a location.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Weather for user &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: sunny&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Clean tools. Testable tools. Per-request context without global state.&lt;/p&gt;




&lt;h3&gt;
  
  
  Human-in-the-Loop
&lt;/h3&gt;

&lt;p&gt;Pause execution mid-graph. Inject a human decision. Resume with full state intact. No re-running prior steps.&lt;/p&gt;

&lt;p&gt;Approval workflows, moderation gates, interactive debugging — all supported without custom state management.&lt;/p&gt;
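&lt;p&gt;As a concept sketch in plain Python (not AgentFlow's actual API): the graph checkpoints its state, a human verdict comes in, and execution resumes from the same point:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Conceptual approval gate; field names and resume semantics are hypothetical.
def run_with_approval(app, state, config):
    result = app.invoke(state, config=config)        # runs until an interrupt
    if result.get("status") == "interrupted":        # hypothetical field
        if input("Approve this action? [y/N] ") == "y":
            result = app.invoke(None, config=config)  # resume from checkpoint
    return result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;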




&lt;h3&gt;
  
  
  Event Publishing
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Publisher&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Redis Pub/Sub&lt;/td&gt;
&lt;td&gt;Lightweight real-time distribution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kafka&lt;/td&gt;
&lt;td&gt;High-volume event streaming&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RabbitMQ&lt;/td&gt;
&lt;td&gt;Reliable queuing, distributed systems&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Console&lt;/td&gt;
&lt;td&gt;Local debugging&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom&lt;/td&gt;
&lt;td&gt;Any backend you want&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  React/TypeScript Client SDK
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;@10xscale/agentflow-client&lt;/code&gt; gives you React hooks (&lt;code&gt;useAgent&lt;/code&gt;, &lt;code&gt;useStream&lt;/code&gt;, &lt;code&gt;useThreads&lt;/code&gt;), token-level streaming for ChatGPT-style UIs, and client-side tool execution. The frontend talks to your AgentFlow API without custom integration code.&lt;/p&gt;




&lt;h2&gt;
  
  
  Feature Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;AgentFlow&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;LangGraph&lt;/th&gt;
&lt;th&gt;CrewAI&lt;/th&gt;
&lt;th&gt;AutoGen&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Architecture&lt;/td&gt;
&lt;td&gt;Graph&lt;/td&gt;
&lt;td&gt;Graph&lt;/td&gt;
&lt;td&gt;Role-Based&lt;/td&gt;
&lt;td&gt;Conversational&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Full Stack (Backend + Frontend SDK)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parallel Tool Execution&lt;/td&gt;
&lt;td&gt;✅ Auto&lt;/td&gt;
&lt;td&gt;⚠️ Config&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Persistence&lt;/td&gt;
&lt;td&gt;✅ Redis + Postgres&lt;/td&gt;
&lt;td&gt;⚠️ Postgres/SQLite&lt;/td&gt;
&lt;td&gt;⚠️ Local&lt;/td&gt;
&lt;td&gt;⚠️ Local&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dependency Injection&lt;/td&gt;
&lt;td&gt;✅ Native&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLI + Docker Deployment&lt;/td&gt;
&lt;td&gt;✅ One command&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auth Built-In&lt;/td&gt;
&lt;td&gt;✅ JWT + Custom&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rate Limiting&lt;/td&gt;
&lt;td&gt;✅ Memory + Redis&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lifecycle Callbacks&lt;/td&gt;
&lt;td&gt;✅ Full&lt;/td&gt;
&lt;td&gt;⚠️ Manual&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;⚠️ Manual&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP Support&lt;/td&gt;
&lt;td&gt;✅ Native&lt;/td&gt;
&lt;td&gt;⚠️ Partial&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Event Publishing&lt;/td&gt;
&lt;td&gt;✅ Kafka/Redis/AMQP&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Open Source (MIT)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Core library&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;10xscale-agentflow

&lt;span class="c"&gt;# Full CLI + API server&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;10xscale-agentflow-cli
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Optional extras:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;10xscale-agentflow[pg_checkpoint]   &lt;span class="c"&gt;# PostgreSQL + Redis persistence&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;10xscale-agentflow[mcp]             &lt;span class="c"&gt;# Model Context Protocol&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;10xscale-agentflow[google-genai]    &lt;span class="c"&gt;# Google GenAI adapter&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;10xscale-agentflow[kafka]           &lt;span class="c"&gt;# Kafka event publishing&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;10xscale-agentflow[redis]           &lt;span class="c"&gt;# Redis publisher + rate limiting&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Current Version
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Package&lt;/th&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;10xscale-agentflow&lt;/code&gt; (core)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;v0.7.4&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;10xscale-agentflow-cli&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;v0.3.2&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Added in v0.7.x:&lt;/strong&gt; multimodal support (images, audio, video), extended reasoning / chain-of-thought, 3-layer memory, callback and lifecycle hooks, agent skills, Vertex AI support, structured Pydantic outputs.&lt;/p&gt;




&lt;h2&gt;
  
  
  Roadmap
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;✅ Graph engine with nodes, edges, and conditional routing&lt;/li&gt;
&lt;li&gt;✅ Redis + PostgreSQL state checkpointing&lt;/li&gt;
&lt;li&gt;✅ Tool integration — local Python, MCP, optional adapters&lt;/li&gt;
&lt;li&gt;✅ Parallel tool execution&lt;/li&gt;
&lt;li&gt;✅ Lifecycle callbacks and graph hooks&lt;/li&gt;
&lt;li&gt;✅ Streaming + event publishing&lt;/li&gt;
&lt;li&gt;✅ Human-in-the-loop&lt;/li&gt;
&lt;li&gt;✅ Multimodal agents&lt;/li&gt;
&lt;li&gt;🚧 Remote node execution for distributed processing&lt;/li&gt;
&lt;li&gt;🚧 OpenTelemetry tracing&lt;/li&gt;
&lt;li&gt;🚧 More persistence backends (DynamoDB, etc.)&lt;/li&gt;
&lt;li&gt;🚧 Visual graph editor&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Privacy and Licensing
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MIT License&lt;/strong&gt; — use freely in commercial products&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No data collection&lt;/strong&gt; — your conversations and agent data stay on your infrastructure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No per-call billing&lt;/strong&gt; — you pay for your LLM API and infra, not our licensing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy anywhere&lt;/strong&gt; — Docker, Kubernetes, AWS ECS, Cloud Run, Azure, Heroku&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Core Library&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/10xHub/Agentflow" rel="noopener noreferrer"&gt;https://github.com/10xHub/Agentflow&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API &amp;amp; CLI&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/10xHub/agentflow-cli" rel="noopener noreferrer"&gt;https://github.com/10xHub/agentflow-cli&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Documentation&lt;/td&gt;
&lt;td&gt;&lt;a href="https://10xhub.github.io/agentflow-docs" rel="noopener noreferrer"&gt;https://10xhub.github.io/agentflow-docs&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PyPI Core&lt;/td&gt;
&lt;td&gt;&lt;a href="https://pypi.org/project/10xscale-agentflow/" rel="noopener noreferrer"&gt;https://pypi.org/project/10xscale-agentflow/&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PyPI CLI&lt;/td&gt;
&lt;td&gt;&lt;a href="https://pypi.org/project/10xscale-agentflow-cli/" rel="noopener noreferrer"&gt;https://pypi.org/project/10xscale-agentflow-cli/&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Issues &amp;amp; Requests&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/10xHub/Agentflow/issues" rel="noopener noreferrer"&gt;https://github.com/10xHub/Agentflow/issues&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Discussions&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/10xHub/Agentflow/discussions" rel="noopener noreferrer"&gt;https://github.com/10xHub/Agentflow/discussions&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;&lt;em&gt;Built by &lt;a href="https://10xscale.ai/" rel="noopener noreferrer"&gt;10xScale&lt;/a&gt; and the community. MIT licensed.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>opensource</category>
      <category>langgraph</category>
    </item>
    <item>
      <title>TOON for LLMs: A Benchmark Performance Analysis</title>
      <dc:creator>Shudipto Trafder</dc:creator>
      <pubDate>Sat, 27 Dec 2025 15:36:50 +0000</pubDate>
      <link>https://forem.com/shudiptotrafder/toon-for-llms-a-comparative-performance-analysis-against-json-52am</link>
      <guid>https://forem.com/shudiptotrafder/toon-for-llms-a-comparative-performance-analysis-against-json-52am</guid>
      <description>&lt;p&gt;Every API call you make with JSON is costing you more than you think.&lt;/p&gt;

&lt;p&gt;I ran real-world extractions using Gemini 2.5 Flash, and the results were startling: JSON consistently used 30–40% more output tokens than TOON format. In one test, JSON consumed 471 output tokens while TOON used just 227 — a 51% reduction.&lt;/p&gt;

&lt;p&gt;But here’s where it gets interesting: &lt;strong&gt;TOON initially failed 70% of the time.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After optimization, I achieved 100% parsing success and discovered something counterintuitive: the reliable version needs more prompt tokens, so whether TOON actually saves you money depends on the workload. When I tested structured outputs with Pydantic models, SDK-managed JSON needed only 389 output tokens, undercutting my optimized TOON prompts.&lt;/p&gt;

&lt;p&gt;The hidden goldmine? &lt;strong&gt;Tool/function calling.&lt;/strong&gt; That’s where TOON’s compact format shines brightest, slashing token costs in agentic workflows where responses become the next prompt.&lt;/p&gt;

&lt;p&gt;This isn’t theoretical. I’m sharing the actual prompts, parsing errors, token counts, and code that took TOON from a 70% failure rate to production-ready. Whether TOON beats JSON depends on your use case — and I have the data to prove exactly when.&lt;/p&gt;

&lt;p&gt;Let’s break down the numbers.&lt;/p&gt;




&lt;h2&gt;
  
  
  Experiment #1: The Initial TOON Failure (70% Success Rate)
&lt;/h2&gt;

&lt;p&gt;I started with what seemed like a straightforward test: extracting structured job description data using TOON instead of JSON.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Setup:
&lt;/h3&gt;

&lt;p&gt;My prompt was simple — ask Gemini 2.5 Flash to extract role, skills, experience, location, and responsibilities from a job posting. For the output format, I did what seemed logical: I showed TOON’s encoded structure as a bare template, essentially treating TOON as a drop-in replacement for JSON.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Extract Role, Primary Skills, Secondary Skills,
Minimum Experience, Maximum Experience,
Location, Employment Type, Summary, and Responsibilities

Job Description:
&amp;lt;JD Text&amp;gt;

Output in TOON format:

Role: ""
"Primary Skills"[2]: Python,JavaScript
"Secondary Skills"[2]: Responsibility,Communication
"Minimum Experience": ""
"Maximum Experience": ""
Location: ""
"Employment Type": ""
Summary: ""
Responsibilities[2]: Task A,Task B
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's what I suspected would work: By showing the encoded format with empty strings and generic placeholders, the model would understand the structure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reality check: 70% failure rate.&lt;/strong&gt;&lt;br&gt;
The errors were telling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Error parsing TOON format for JD#2: Expected 10 values, but got 16&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Error parsing TOON format for JD#5: Missing colon after key&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model was confused about arrays. Sometimes it output &lt;code&gt;Skills: Python, JavaScript, React&lt;/code&gt; as a flat string. Other times it attempted brackets but malformed the syntax.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The hypothesis:&lt;/strong&gt; Maybe showing encoded/empty examples was the problem. The model needed to see real data patterns, especially for arrays.&lt;/p&gt;
&lt;h3&gt;
  
  
  Token Usage (Failed Attempts, 70% Success Rate):
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Prompt:&lt;/strong&gt; 729 tokens&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Output:&lt;/strong&gt; 227 tokens&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Success Rate:&lt;/strong&gt; ~30% initially, improved to 70% after adding two real examples with populated arrays&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
JSON Token Usage:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Prompt:&lt;/strong&gt; 723 tokens&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Output:&lt;/strong&gt; 471 tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key Insight:&lt;/strong&gt;&lt;br&gt;
TOON's compact syntax is unforgiving. JSON has redundancy (&lt;code&gt;{"key": "value"}&lt;/code&gt;) that helps models self-correct. TOON's &lt;code&gt;Key: value&lt;/code&gt; format offers no such safety net. The model needed concrete examples, not abstract templates.&lt;/p&gt;
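&lt;p&gt;To make the redundancy concrete, here is the same record serialized both ways, with character counts as a rough proxy for tokens:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json

record = {"role": "Data Scientist", "skills": ["Python", "SQL"]}

as_json = json.dumps(record)
# Hand-written TOON-style rendering of the same record:
as_toon = 'role: "Data Scientist"\nskills[2]: Python,SQL'

print(len(as_json), as_json)  # braces, quotes, and commas add up
print(len(as_toon), as_toon)  # same data, less ceremony
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;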

&lt;p&gt;But 70% wasn't good enough for production. Time to fix this properly.&lt;/p&gt;


&lt;h2&gt;
  
  
  Experiment #2: Achieving 100% Parsing Success (And the Token Trade-off)
&lt;/h2&gt;

&lt;p&gt;I needed to fix the 70% success rate. The solution? Stop being minimalist with examples.&lt;/p&gt;

&lt;p&gt;Instead of showing encoded/empty structures, I gave the model a complete, realistic example with proper TOON formatting — especially for arrays.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Revised Prompt:
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Extract Role, Primary Skills, Secondary Skills,
Minimum Experience, Maximum Experience,
Location, Employment Type, Summary, and Responsibilities

Job Description:
&amp;lt;JD Text&amp;gt;

Output in TOON format. Example structure:

Role: "Senior Data Scientist"
Primary_Skills:
 [0]: "Machine Learning"
 [1]: "Statistical Analysis"
Secondary_Skills:
 [0]: "Big Data"
 [1]: "Cloud Platforms"
Minimum_Experience: "5 years"
Maximum_Experience: "10 years"
Location: "New York, NY or Remote"
Employment_Type: "Full-time"
Summary: "Lead data science initiatives"
Responsibilities:
 [0]: "Design ML models"
 [1]: "Analyze datasets"


Now provide the extraction in TOON format. Keep the format exactly
as shown above.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; 100% parsing. No more malformed arrays. No more missing colons.&lt;/p&gt;

&lt;p&gt;But here's the catch—the prompt got heavier.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Token Comparison: TOON vs JSON
&lt;/h3&gt;

&lt;p&gt;Let me show you the actual numbers across the same 10 job descriptions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;JSON Approach: Token Usage&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Prompt tokens:&lt;/strong&gt; 723&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Output tokens:&lt;/strong&gt; 471&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Success rate:&lt;/strong&gt; 100% (JSON is forgiving)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;TOON Approach (Initial — 70% success)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Prompt tokens:&lt;/strong&gt; 729&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Output tokens:&lt;/strong&gt; 227 ✅ (51.8% reduction vs JSON)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Total:&lt;/strong&gt; 956 tokens (238 fewer than JSON’s 1,194)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Success rate:&lt;/strong&gt; 70% ❌&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;TOON Approach (Optimized — 100% success)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Prompt tokens:&lt;/strong&gt; 802 ❌ (+11% vs JSON)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Output tokens:&lt;/strong&gt; 455 ✅ (3.4% reduction vs JSON)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Total:&lt;/strong&gt; 1,257 tokens (+5.3% vs JSON’s 1,194)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Success rate:&lt;/strong&gt; 100% ✅&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  The Uncomfortable Truth
&lt;/h2&gt;

&lt;p&gt;For basic extraction tasks, optimized TOON costs MORE than JSON.&lt;/p&gt;

&lt;p&gt;Yes, the output is slightly more compact (455 vs 471 tokens), but the verbose prompting needed to achieve 100% reliability completely erases any savings. In fact, you’re paying 5% more per request.&lt;/p&gt;

&lt;p&gt;So why am I still testing TOON?&lt;/p&gt;

&lt;p&gt;Because this experiment revealed something crucial: the baseline comparison is misleading. Real-world LLM applications don’t just extract data once — they use structured outputs for:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Pydantic model validation (native SDK support)&lt;/li&gt;
&lt;li&gt; Tool/function calling (where output becomes input)&lt;/li&gt;
&lt;li&gt; Multi-turn agentic workflows (repeated serialization)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That’s where the math changes completely. Let me show you.&lt;/p&gt;


&lt;h2&gt;
  
  
  Experiment #3: Pydantic Models — Where the SDK Does the Heavy Lifting
&lt;/h2&gt;

&lt;p&gt;Here’s where things get interesting. Modern LLM SDKs have first-class support for structured outputs using Pydantic models. Instead of prompt engineering, you define a schema and let the SDK handle formatting.&lt;/p&gt;

&lt;p&gt;The key difference: You don’t need to explain the output format in your prompt — the SDK extracts it from your Pydantic model automatically.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Setup: Google’s GenAI SDK
&lt;/h3&gt;

&lt;p&gt;I used the same job extraction task, but this time with a Pydantic model:&lt;br&gt;
&lt;/p&gt;
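&lt;p&gt;A sketch of what that model could look like (field names mirror the extraction task; the exact schema I used is in the gist linked at the end):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from pydantic import BaseModel

class JobModel(BaseModel):
    # Fields mirror the extraction prompt; exact names are illustrative.
    role: str
    primary_skills: list[str]
    secondary_skills: list[str]
    minimum_experience: str
    maximum_experience: str
    location: str
    employment_type: str
    summary: str
    responsibilities: list[str]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;And the call itself:&lt;/p&gt;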

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response_mime_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;JobModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice what’s missing: No output format instructions. No examples. No “Output as JSON with these exact keys.”&lt;/p&gt;

&lt;p&gt;The SDK injects the schema behind the scenes.&lt;/p&gt;
&lt;h3&gt;
  
  
  Token Comparison: Pydantic JSON vs Manual TOON
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pydantic + JSON (SDK-Managed)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Prompt tokens:&lt;/strong&gt; 647 ✅ (19.3% less than optimized TOON)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Output tokens:&lt;/strong&gt; 389 ✅ (14.5% less than optimized TOON)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Success rate:&lt;/strong&gt; 100% ✅&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Parsing:&lt;/strong&gt; Native (SDK returns typed Python objects)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Manual TOON (From Experiment #2)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Prompt tokens:&lt;/strong&gt; 802 ❌&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Output tokens:&lt;/strong&gt; 455 ❌&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Success rate:&lt;/strong&gt; 100% ✅&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Parsing:&lt;/strong&gt; Custom (you write the parser)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  The Brutal Takeaway
&lt;/h3&gt;

&lt;p&gt;For structured extraction with strong SDK support, Pydantic really shines. Native Pydantic integration delivers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  ✅ Cleaner prompts (~155 fewer prompt tokens)&lt;/li&gt;
&lt;li&gt;  ✅ Smaller outputs (~66 fewer output tokens)&lt;/li&gt;
&lt;li&gt;  ✅ No custom parsing logic&lt;/li&gt;
&lt;li&gt;  ✅ Built-in type validation&lt;/li&gt;
&lt;li&gt;  ✅ Parsed objects returned directly, ready to use&lt;/li&gt;
&lt;li&gt;  ✅ A much smoother developer experience&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because of this, I’ll increasingly rely on Pydantic and native parsing support for structured extraction. It’s simply more reliable and maintainable than handling parsing and validation manually.&lt;/p&gt;

&lt;p&gt;That said, there’s one scenario where JSON’s verbosity becomes a genuine liability: tool calling in agentic workflows.&lt;br&gt;
That’s where TOON finally proves its worth.&lt;/p&gt;


&lt;h2&gt;
  
  
  Experiment #4: Tool Calling — Where TOON Finally Wins
&lt;/h2&gt;

&lt;p&gt;This is where everything clicked.&lt;/p&gt;

&lt;p&gt;In agentic workflows, your LLM doesn’t just extract data once — it calls tools, receives results, and uses those results to reason further. The tool’s response becomes part of the next prompt. And if that response is bloated with JSON syntax, you’re paying for it twice: once as output, once as input.&lt;/p&gt;

&lt;p&gt;The insight: Tool results are pure token waste. The model doesn’t need &lt;code&gt;{"key": "value"}&lt;/code&gt; ceremony—it needs the data, efficiently encoded.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Setup: Weather Agent with Function Calling
&lt;/h3&gt;

&lt;p&gt;I built a simple agent that calls a &lt;code&gt;get_current_weather&lt;/code&gt; function. The user asks for weather, the model calls the tool, the function returns data, and the model synthesizes a response.&lt;/p&gt;

&lt;p&gt;The critical moment: What format should &lt;code&gt;get_current_weather&lt;/code&gt; return?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Version A: JSON Tool Response&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;current&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;72 F&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;condition&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sunny&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;forecast&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;forecast&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Returns JSON string
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Version B: TOON Tool Response&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;current&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;72 F&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;condition&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sunny&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;forecast&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;forecast&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Returns TOON-encoded string
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Main code&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the weather like in New York? Share next 15 days forecast as well.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerateContentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;get_current_weather&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
Resulting Token Usage (TOON Version):
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Initial prompt tokens:&lt;/strong&gt; 152 (user message + tool definition)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Tool response tokens (becomes input):&lt;/strong&gt; 480 ✅ (24% reduction vs JSON’s 632)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Model’s final output:&lt;/strong&gt; 384 (slightly longer, but reasonable)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Total tokens:&lt;/strong&gt; 1,016 ✅ (11.5% reduction overall)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why TOON Wins in Agentic Workflows
&lt;/h3&gt;

&lt;p&gt;Here’s the math that matters:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Single Tool Call&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  JSON approach: 632 tokens for tool result&lt;/li&gt;
&lt;li&gt;  TOON approach: 480 tokens for tool result&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Savings: 152 tokens per tool call (24%)&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Multi-Turn Agent (5 tool calls)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  JSON approach: 632 × 5 = 3,160 tokens in tool results&lt;/li&gt;
&lt;li&gt;  TOON approach: 480 × 5 = 2,400 tokens in tool results&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Savings: 760 tokens (24%)&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;
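
&lt;p&gt;A quick back-of-envelope calculator using these measured numbers:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;JSON_TOOL_RESULT = 632  # tokens per tool result, measured above
TOON_TOOL_RESULT = 480

def savings(tool_calls: int) -&gt; int:
    return (JSON_TOOL_RESULT - TOON_TOOL_RESULT) * tool_calls

for n in (1, 5, 20):
    print(f"{n} tool calls: {savings(n)} tokens saved")  # 152, 760, 3040
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;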

&lt;h3&gt;
  
  
  The Compounding Effect
&lt;/h3&gt;

&lt;p&gt;Why this matters more than single extractions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Tool results are pure input tokens&lt;/strong&gt; — You pay for them every single time&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Verbosity multiplies&lt;/strong&gt; — JSON’s &lt;code&gt;{}: ,&lt;/code&gt; syntax adds 20-30% overhead for nested data&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;No parsing penalty&lt;/strong&gt; — The model consumes TOON just as easily (we verified this in follow-up tests)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Scales with agent complexity&lt;/strong&gt; — More tools = more savings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The difference? Where the efficiency matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;After running these tests across four different scenarios, here’s what the data tells us:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TOON loses at single extractions.&lt;/strong&gt; Whether you’re doing manual prompting or using Pydantic models, JSON with SDK support is cleaner, cheaper, and more reliable. The 17.6% token savings from native schema integration beats TOON’s manual approach every time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But TOON wins where it counts for agents: tool calling workflows.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When your LLM’s output becomes the next prompt — when data cycles between model and functions repeatedly — TOON’s 24% reduction per tool call transforms from interesting to impactful. An agent making 20 tool calls saves 3,040 tokens per session.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The decision matrix is simple:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Building a chatbot that extracts structured data? &lt;strong&gt;Use JSON + Pydantic.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt; Building an agent that calls tools 10+ times per session? &lt;strong&gt;Test TOON.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt; Building anything else? &lt;strong&gt;Profile first, optimize later.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;I’ve open-sourced all the experiments, prompts, and token measurements: &lt;a href="https://gist.github.com/the-m-u-s-h-r-o-o-m/080c9e697843339946850d5353e9343c" rel="noopener noreferrer"&gt;View complete code and results on GitHub Gist&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The repository includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  ✅ All four experiment setups with actual prompts&lt;/li&gt;
&lt;li&gt;  ✅ Token usage logs for every test case&lt;/li&gt;
&lt;li&gt;  ✅ Side-by-side comparison scripts&lt;/li&gt;
&lt;li&gt;  ✅ The job descriptions I used for testing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;TOON isn’t magic — it’s math. And the math only works when token efficiency genuinely matters. For most applications, JSON’s ecosystem advantages outweigh the savings. But for token-heavy agentic workflows? TOON might just pay for itself.&lt;/p&gt;

&lt;p&gt;Now you have the data to decide.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>python</category>
      <category>gemini</category>
    </item>
  </channel>
</rss>
