<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Jahanzaib</title>
    <description>The latest articles on Forem by Jahanzaib (@jahanzaibai).</description>
    <link>https://forem.com/jahanzaibai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3860581%2F9503366d-3739-4d0f-98e3-56c0b5ed8466.jpeg</url>
      <title>Forem: Jahanzaib</title>
      <link>https://forem.com/jahanzaibai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/jahanzaibai"/>
    <language>en</language>
    <item>
      <title>How I Build Production Multi-Agent Systems With CrewAI Flows</title>
      <dc:creator>Jahanzaib</dc:creator>
      <pubDate>Mon, 06 Apr 2026 14:24:39 +0000</pubDate>
      <link>https://forem.com/jahanzaibai/how-i-build-production-multi-agent-systems-with-crewai-flows-1e25</link>
      <guid>https://forem.com/jahanzaibai/how-i-build-production-multi-agent-systems-with-crewai-flows-1e25</guid>
      <description>&lt;p&gt;Last year I rebuilt a client's entire content pipeline three times. The first version was a single LangGraph graph with 14 nodes, and every time we needed to add a new step, I spent two days re-threading state through the whole thing. The second version was a set of independent Python scripts that had no memory of each other. The third version, which is still running today, uses &lt;strong&gt;CrewAI Flows&lt;/strong&gt;. It took me four days to build what the LangGraph version took three weeks to produce, and my client went from reviewing every single output to running it fully autonomously after about 200 executions.&lt;/p&gt;

&lt;p&gt;CrewAI Flows is the production architecture for multi-agent systems that most tutorials skip past. Everyone shows you how to build a Crew. Almost nobody explains how to orchestrate multiple Crews together with state persistence, conditional routing, and gradual autonomy. This guide covers what I've learned across real deployments, not toy examples.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key Takeaways&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;CrewAI Flows wraps Crews in an event-driven orchestration layer that handles state, sequencing, and error recovery without the graph complexity of LangGraph&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;@start&lt;/code&gt;, &lt;code&gt;@listen&lt;/code&gt;, and &lt;code&gt;@router&lt;/code&gt; decorators are the three building blocks for almost every production workflow you'll need&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Structured state with Pydantic models is always worth the extra setup time because it makes debugging and persistence far simpler&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The gradual autonomy pattern (start at 100% human review, reduce as the system proves itself) is what separates successful production deployments from ones that get shut down after a week&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CrewAI Flows required roughly 14x less code than an equivalent LangGraph implementation, according to DocuSign's published case study&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;450 million agents run on CrewAI per month as of early 2026, with 60% of the US Fortune 500 using it in some form&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What CrewAI Flows Actually Are (and Why Crews Alone Aren't Enough)
&lt;/h2&gt;

&lt;p&gt;If you've used CrewAI before, you know what a Crew is: a team of agents, each with a role and tools, working through a set of tasks to produce an output. That's the intelligence layer. But a Crew is stateless by default. Run it today and run it tomorrow and it doesn't remember anything from the first run. It also has no concept of branching logic or error recovery at the orchestration level.&lt;/p&gt;

&lt;p&gt;Flows solve this. A Flow is a Python class that wraps your Crews and direct LLM calls inside an event-driven execution engine. You define methods, decorate them with &lt;code&gt;@start&lt;/code&gt;, &lt;code&gt;@listen&lt;/code&gt;, or &lt;code&gt;@router&lt;/code&gt;, and CrewAI handles the execution order, state threading, and persistence. Think of the Crew as the worker and the Flow as the manager who knows what order work gets done, what to do when something fails, and what happened last time.&lt;/p&gt;

&lt;p&gt;The distinction matters in practice. I've had clients who got halfway through building with Crews alone, hit the state management wall, and concluded that CrewAI wasn't production-ready. It is. They just stopped one layer short.&lt;/p&gt;
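&lt;p&gt;To make "event-driven execution engine" concrete, here's a dependency-free toy that mimics the &lt;code&gt;@start&lt;/code&gt;/&lt;code&gt;@listen&lt;/code&gt; cascade. It's a sketch of the concept only, not CrewAI's internals, and every name in it is mine:&lt;/p&gt;

```python
# Toy sketch of the event-driven idea behind Flows: decorators record what each
# method listens for, and a tiny engine cascades results from step to step.
def start():
    def wrap(fn):
        fn._trigger = "__start__"
        return fn
    return wrap

def listen(trigger):
    def wrap(fn):
        # accept either a method reference or its name, like CrewAI does
        fn._trigger = trigger if isinstance(trigger, str) else trigger.__name__
        return fn
    return wrap

class MiniFlow:
    def kickoff(self):
        # Index methods by the event they listen for, then cascade from __start__
        by_trigger = {}
        for name in dir(self):
            method = getattr(self, name)
            if callable(method) and hasattr(method, "_trigger"):
                by_trigger.setdefault(method._trigger, []).append(method)
        result, queue = None, ["__start__"]
        while queue:
            event = queue.pop(0)
            for method in by_trigger.get(event, []):
                result = method() if event == "__start__" else method(result)
                queue.append(method.__name__)  # finishing a step is itself an event
        return result

class DemoFlow(MiniFlow):
    @start()
    def begin(self):
        return "AI agents"

    @listen("begin")
    def research(self, topic):
        return f"notes on {topic}"
```

&lt;p&gt;&lt;code&gt;DemoFlow().kickoff()&lt;/code&gt; returns &lt;code&gt;"notes on AI agents"&lt;/code&gt;: the engine, not the caller, decides that &lt;code&gt;research&lt;/code&gt; runs after &lt;code&gt;begin&lt;/code&gt;. CrewAI's real engine layers state, routing, and persistence on top of the same dispatch idea.&lt;/p&gt;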

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1519389950473-47ba0277781c%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1519389950473-47ba0277781c%3Fw%3D1200%26q%3D80" alt="Software team working on multi-agent system workflow architecture" width="1200" height="800"&gt;&lt;/a&gt;&lt;em&gt;Production agent systems need orchestration, not just execution.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Four Building Blocks Before You Write Any Flow
&lt;/h2&gt;

&lt;p&gt;Before the Flow itself, you need to understand the four primitives that make it up. I've seen people skip this and spend days debugging things that become obvious once you understand what each piece owns.&lt;/p&gt;

&lt;h3&gt;
  
  
  Agents
&lt;/h3&gt;

&lt;p&gt;An Agent is an AI entity with a role, a goal, a backstory, and optionally a set of tools. The role and backstory aren't just documentation. They get injected into the system prompt and directly affect how the LLM behaves. I've found that more specific backstories produce more consistent outputs. "You are a financial analyst who has spent 15 years reading SEC filings" gets better results from GPT-4o than "You are a helpful AI assistant who analyzes financial data."&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;crewai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;crewai_tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SerperDevTool&lt;/span&gt;

&lt;span class="n"&gt;researcher&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Market Research Specialist&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Find accurate, recent data about {topic} from reliable sources&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;backstory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ve spent 12 years in market research, reading analyst reports
    and primary sources before writing a single sentence. You only cite data 
    you can trace back to a primary source.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;SerperDevTool&lt;/span&gt;&lt;span class="p"&gt;()],&lt;/span&gt;
    &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_iter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;  &lt;span class="c1"&gt;# prevent runaway tool calls in production
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;max_iter&lt;/code&gt; parameter is something I always set in production. Without it, agents can get into tool-call loops that burn through tokens and take minutes. I set it between 3 and 5 for most tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tasks
&lt;/h3&gt;

&lt;p&gt;A Task assigns specific work to an agent with a description, expected output, and an optional output file or Pydantic model for structured output. The &lt;code&gt;expected_output&lt;/code&gt; field matters more than most examples show. The more specific you are about what the output should look like, the fewer post-processing steps you need downstream.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;crewai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ResearchOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;key_stats&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;sources&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;

&lt;span class="n"&gt;research_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research the current state of {topic} and compile key findings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;expected_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A structured research report with summary, 5-10 key statistics with sources, and confidence score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;researcher&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;output_pydantic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ResearchOutput&lt;/span&gt;  &lt;span class="c1"&gt;# structured output for downstream tasks
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Crews
&lt;/h3&gt;

&lt;p&gt;A Crew ties agents and tasks together into an executable unit. The &lt;code&gt;process&lt;/code&gt; parameter controls how tasks execute. &lt;code&gt;Process.sequential&lt;/code&gt; runs tasks one at a time in order. &lt;code&gt;Process.hierarchical&lt;/code&gt; adds a manager agent that delegates and reviews work. I use sequential for most workflows because it's predictable and debuggable. Hierarchical is useful when you genuinely want the model to reason about task assignment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;crewai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Crew&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Process&lt;/span&gt;

&lt;span class="n"&gt;research_crew&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Crew&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;researcher&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;analyst&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;research_task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;analysis_task&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sequential&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;  &lt;span class="c1"&gt;# enables cross-run memory when combined with a Flow
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Flows
&lt;/h3&gt;

&lt;p&gt;The Flow class is where everything comes together. It's a Python class where methods are the workflow steps, decorators control execution order, and &lt;code&gt;self.state&lt;/code&gt; carries data between steps. The entire flow state gets a unique UUID automatically, which is essential for tracing and debugging in production.&lt;/p&gt;
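&lt;p&gt;If the auto-generated UUID sounds abstract, this stdlib-only sketch shows the idea. The class here is hypothetical, not part of the CrewAI API; it just mirrors what an auto-assigned per-run &lt;code&gt;id&lt;/code&gt; buys you:&lt;/p&gt;

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class FlowStateSketch:
    # An auto-generated id, assigned once per state object at creation time
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    topic: str = ""

run_a = FlowStateSketch(topic="AI agents")
run_b = FlowStateSketch(topic="AI agents")
# Same inputs, distinct ids: log lines tagged with the state id never
# collide across runs, which is what makes per-run tracing possible.
```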

&lt;h2&gt;
  
  
  Building Your First Production Flow
&lt;/h2&gt;

&lt;p&gt;Here's a real pattern I use for client content pipelines. The flow takes a topic, runs a research crew, evaluates whether the research is sufficient, and either proceeds to writing or loops back to deepen the research.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;crewai.flow.flow&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;listen&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;router&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ContentFlowState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;research&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;research_quality&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;  &lt;span class="c1"&gt;# "sufficient" or "insufficient"
&lt;/span&gt;    &lt;span class="n"&gt;draft&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;final_article&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;iteration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ContentProductionFlow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Flow&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ContentFlowState&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;

    &lt;span class="nd"&gt;@start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;initialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Entry point - set up the topic&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Starting content flow for: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;

    &lt;span class="nd"&gt;@listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;initialize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_research&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Run the research crew&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ResearchCrew&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;kickoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;research&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pydantic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_dump&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

    &lt;span class="nd"&gt;@router&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_research&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;evaluate_research&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;research_result&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Check if research quality is sufficient&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iteration&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

        &lt;span class="c1"&gt;# Simple quality gate - check if we have enough sources
&lt;/span&gt;        &lt;span class="n"&gt;sources&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;research&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sources&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
        &lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;research&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sources&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;research_quality&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sufficient&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sufficient&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iteration&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Don't loop forever - move forward after 2 tries
&lt;/span&gt;            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;research_quality&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sufficient&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sufficient&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;research_quality&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;insufficient&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;insufficient&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="nd"&gt;@listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sufficient&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;write_article&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Write the article based on research&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;WritingCrew&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;kickoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;research&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;draft&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;

    &lt;span class="nd"&gt;@listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;insufficient&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;deepen_research&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Research wasn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t good enough - try again with more focused query&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DeepResearchCrew&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;kickoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gaps&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;need more primary sources&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;research&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pydantic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_dump&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

    &lt;span class="nd"&gt;@listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;write_article&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;edit_and_finalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;draft&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Final edit pass&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;EditingCrew&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;kickoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;draft&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;draft&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;requirements&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;factual, specific, no buzzwords&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;final_article&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;

&lt;span class="c1"&gt;# Running the flow
&lt;/span&gt;&lt;span class="n"&gt;flow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ContentProductionFlow&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;kickoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AI agent deployment patterns&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This looks like more code than a simple Crew, and it is. But this code handles four things a plain Crew doesn't: quality gates that can loop back, state that persists across all steps, clear entry and exit points, and a clean audit trail through &lt;code&gt;self.state&lt;/code&gt;. When something goes wrong in production, you know exactly where.&lt;/p&gt;
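&lt;p&gt;That audit trail is easy to externalize. Here's a minimal sketch of the logging side — the &lt;code&gt;append_audit_record&lt;/code&gt; helper and file name are my own illustration, not a CrewAI API; in a real flow you'd pass it something like &lt;code&gt;flow.state.model_dump()&lt;/code&gt; after &lt;code&gt;kickoff&lt;/code&gt; returns:&lt;/p&gt;

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def append_audit_record(state_dict: dict, log_path: str = "demo_audit.jsonl") -> None:
    """Append one flow run's final state to a JSON-lines audit log."""
    record = {
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "state": state_dict,
    }
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

# In production this dict would come from flow.state.model_dump()
append_audit_record({"stage": "complete", "errors": []})
```

&lt;p&gt;One line per run, append-only, greppable. When a client asks why last Tuesday's output looked off, you pull the record instead of guessing.&lt;/p&gt;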

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1542831371-29b0f74f9713%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1542831371-29b0f74f9713%3Fw%3D1200%26q%3D80" alt="Python code on a developer screen building multi-agent orchestration workflow" width="1200" height="800"&gt;&lt;/a&gt;&lt;em&gt;The decorator pattern in Flows makes execution order readable without graph diagrams.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  State Management: Why Pydantic Models Are Always Worth It
&lt;/h2&gt;

&lt;p&gt;CrewAI Flows support two state modes. Unstructured state uses a plain dictionary. Structured state uses a Pydantic &lt;code&gt;BaseModel&lt;/code&gt;. I've tried both approaches across multiple client projects, and I always end up migrating dictionary-based flows to Pydantic models eventually.&lt;/p&gt;

&lt;p&gt;The difference shows up in three places. First, type errors. With dictionary state, you can write &lt;code&gt;self.state['reserach']&lt;/code&gt; (typo) and your flow happily continues with a missing key until something downstream breaks in a confusing way. With Pydantic, that's an error the moment the assignment runs. Second, persistence. When you add the &lt;code&gt;@persist&lt;/code&gt; decorator for SQLite-backed state recovery, Pydantic models serialize cleanly. Nested dicts sometimes don't. Third, IDE support. I use VS Code, and auto-complete on &lt;code&gt;self.state.research_quality&lt;/code&gt; saves real time.&lt;/p&gt;
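&lt;p&gt;You can see the typo protection with plain Pydantic, outside any flow. This sketch assumes Pydantic v2, which raises a &lt;code&gt;ValueError&lt;/code&gt; when you assign to a field the model doesn't define; the &lt;code&gt;ContentState&lt;/code&gt; model is a made-up minimal example:&lt;/p&gt;

```python
from pydantic import BaseModel

class ContentState(BaseModel):
    research: dict = {}

state = ContentState()
state.research = {"sources": 12}   # fine: the field exists

try:
    state.reserach = {"sources": 12}   # typo: Pydantic rejects it immediately
except ValueError as e:
    print(f"caught: {e}")

# The dict version fails silently instead:
dict_state = {"research": {}}
dict_state["reserach"] = {"sources": 12}   # no error; the bug surfaces downstream
print("dict now has keys:", sorted(dict_state))
```

&lt;p&gt;The dict version doesn't fail here. It fails three steps later when some other method reads the key that was never written, which is a much worse place to debug from.&lt;/p&gt;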

&lt;p&gt;Here's how I structure state for a typical client workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Field&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WorkflowState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Input
&lt;/span&gt;    &lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;input_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Progress tracking
&lt;/span&gt;    &lt;span class="n"&gt;stage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;init&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# init -&amp;gt; research -&amp;gt; processing -&amp;gt; review -&amp;gt; complete
&lt;/span&gt;    &lt;span class="n"&gt;started_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;iteration_count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="c1"&gt;# Outputs per stage
&lt;/span&gt;    &lt;span class="n"&gt;research_output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;processed_output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;review_notes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;final_output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="c1"&gt;# Error handling
&lt;/span&gt;    &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;requires_human_review&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;stage&lt;/code&gt; field is useful for monitoring in production dashboards. The &lt;code&gt;errors&lt;/code&gt; list and &lt;code&gt;requires_human_review&lt;/code&gt; flag support the gradual autonomy pattern I cover below. Every time a step fails or produces low-confidence output, you append to &lt;code&gt;errors&lt;/code&gt; instead of raising an exception, and set &lt;code&gt;requires_human_review&lt;/code&gt; to &lt;code&gt;True&lt;/code&gt;.&lt;/p&gt;
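&lt;p&gt;The append-don't-raise pattern looks like this in practice. A sketch against a trimmed-down version of the state model above — the &lt;code&gt;safe_step&lt;/code&gt; wrapper is my own convention, not a CrewAI API:&lt;/p&gt;

```python
from typing import Callable, List, Optional
from pydantic import BaseModel, Field

class WorkflowState(BaseModel):
    errors: List[str] = Field(default_factory=list)
    requires_human_review: bool = False
    research_output: Optional[dict] = None

def safe_step(state: WorkflowState, name: str, fn: Callable[[], dict]) -> Optional[dict]:
    """Run a step; on failure, record the error and flag for review instead of crashing."""
    try:
        return fn()
    except Exception as exc:
        state.errors.append(f"{name}: {exc}")
        state.requires_human_review = True
        return None

state = WorkflowState()

def flaky_research() -> dict:
    raise TimeoutError("search API did not respond")

state.research_output = safe_step(state, "research", flaky_research)
print(state.errors)                  # ['research: search API did not respond']
print(state.requires_human_review)   # True
```

&lt;p&gt;The run completes, the failure is recorded, and a human sees it in the review queue instead of a stack trace in the logs.&lt;/p&gt;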

&lt;h2&gt;
  
  
  Conditional Routing: Making Flows Actually Smart
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;@router&lt;/code&gt; decorator is where Flows go from linear sequences to intelligent pipelines. A router method returns a string label, and execution jumps to whichever &lt;code&gt;@listen&lt;/code&gt; method is registered for that label. This is how you implement approval gates, quality checks, and decision branches.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;InvoiceProcessingFlow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Flow&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;InvoiceState&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;

    &lt;span class="nd"&gt;@start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_invoice_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ExtractionCrew&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;kickoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;invoice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw_invoice&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;extracted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pydantic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_dump&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

    &lt;span class="nd"&gt;@router&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;extract_invoice_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validate_extraction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;extracted&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;extracted&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.85&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;low_confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;50000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# high-value invoices always need human review
&lt;/span&gt;            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high_value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;standard&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="nd"&gt;@listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;standard&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;auto_approve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;approval_status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;approved&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requires_human_review&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto_approved&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="nd"&gt;@listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high_value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;flag_for_human&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;approval_status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pending_human&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requires_human_review&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;queued_for_review&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="nd"&gt;@listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;low_confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;re_extract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Try a different extraction approach
&lt;/span&gt;        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FallbackExtractionCrew&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;kickoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;invoice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw_invoice&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;use_ocr&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;extracted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pydantic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_dump&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iteration_count&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern maps exactly to how a finance team already works. Low-confidence extractions go to a different queue. High-value invoices always get a human. Standard cases process automatically. I built a version of this for a logistics client last quarter, and it got to 73% fully autonomous processing within the first month.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1633356122544-f134324a6cee%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1633356122544-f134324a6cee%3Fw%3D1200%26q%3D80" alt="Decision flow diagram representing conditional routing in AI agent systems" width="1200" height="800"&gt;&lt;/a&gt;&lt;em&gt;Conditional routing maps directly to how human workflows already branch on decisions.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Production Pattern Most Tutorials Skip: Gradual Autonomy
&lt;/h2&gt;

&lt;p&gt;Here's the thing that actually makes or breaks production AI deployments. It's not the framework. It's how you transition from human-supervised to autonomous operation.&lt;/p&gt;

&lt;p&gt;CrewAI published findings from 2 billion workflow executions, and the pattern that consistently produced the best outcomes was what they call gradual autonomy. You start with every output going through human review. You track the accuracy rate per output type. As you reach acceptable accuracy thresholds, you remove human review from those specific branches. You never flip a switch from 0% to 100% autonomous overnight.&lt;/p&gt;
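&lt;p&gt;The per-type accuracy bookkeeping doesn't need anything fancy. A stdlib-only sketch — the &lt;code&gt;AccuracyTracker&lt;/code&gt; class, sample counts, and target rate are all illustrative numbers, not values from CrewAI's findings:&lt;/p&gt;

```python
from collections import defaultdict

class AccuracyTracker:
    """Track human-review verdicts per output type to decide when a branch can go autonomous."""

    def __init__(self, min_samples: int = 50, target_rate: float = 0.97):
        self.min_samples = min_samples
        self.target_rate = target_rate
        self.stats = defaultdict(lambda: {"total": 0, "correct": 0})

    def record(self, output_type: str, approved: bool) -> None:
        self.stats[output_type]["total"] += 1
        if approved:
            self.stats[output_type]["correct"] += 1

    def ready_for_autonomy(self, output_type: str) -> bool:
        s = self.stats[output_type]
        if s["total"] < self.min_samples:
            return False   # not enough evidence yet
        return s["correct"] / s["total"] >= self.target_rate

tracker = AccuracyTracker(min_samples=5, target_rate=0.8)
for verdict in [True, True, True, True, False]:
    tracker.record("summary", verdict)
print(tracker.ready_for_autonomy("summary"))  # True: 4/5 = 0.8 over 5 samples
```

&lt;p&gt;The &lt;code&gt;min_samples&lt;/code&gt; floor matters: a branch that's 3-for-3 hasn't earned anything yet. You remove review from a branch only after it has both volume and accuracy.&lt;/p&gt;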

&lt;p&gt;I build this directly into the Flow state. Every output has a &lt;code&gt;requires_human_review&lt;/code&gt; flag. The router checks a confidence threshold that starts high (nearly everything gets reviewed) and is lowered as the deployment proves itself. Here's the pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ProductionFlow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Flow&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;WorkflowState&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;

    &lt;span class="c1"&gt;# This threshold starts at 0.95 and decreases over time as 
&lt;/span&gt;    &lt;span class="c1"&gt;# you verify accuracy in your monitoring dashboard
&lt;/span&gt;    &lt;span class="n"&gt;AUTONOMY_THRESHOLD&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AUTONOMY_THRESHOLD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.95&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="nd"&gt;@router&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;process_output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;route_for_review&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;output_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;standard&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Always review certain high-stakes outputs regardless of confidence
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;output_type&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;contract&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;financial_report&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer_communication&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AUTONOMY_THRESHOLD&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;human_review&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="c1"&gt;# Standard outputs at high confidence go autonomous
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AUTONOMY_THRESHOLD&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto_process&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;human_review&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I expose &lt;code&gt;AUTONOMY_THRESHOLD&lt;/code&gt; as an environment variable so the client or their ops team can adjust it without code changes. When they're comfortable with the output quality, they lower the threshold. If they see a regression, they raise it. This gives non-technical stakeholders meaningful control over the system's autonomy level.&lt;/p&gt;
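&lt;p&gt;Reading the threshold from the environment is one line, but clamping it protects against a bad value someone types into the ops console. A stdlib-only sketch — the variable name matches the snippet above; the &lt;code&gt;load_autonomy_threshold&lt;/code&gt; helper and the [0.5, 1.0] clamp range are my own additions:&lt;/p&gt;

```python
import os

def load_autonomy_threshold(default: float = 0.95) -> float:
    """Read AUTONOMY_THRESHOLD from the environment, clamped to a sane [0.5, 1.0] range."""
    raw = os.getenv("AUTONOMY_THRESHOLD", str(default))
    try:
        value = float(raw)
    except ValueError:
        return default   # garbage input: fall back rather than crash the flow
    return min(max(value, 0.5), 1.0)

os.environ["AUTONOMY_THRESHOLD"] = "0.85"
print(load_autonomy_threshold())   # 0.85

os.environ["AUTONOMY_THRESHOLD"] = "not-a-number"
print(load_autonomy_threshold())   # 0.95 (fallback)
```

&lt;p&gt;A typo in an env var should never be the reason every invoice suddenly auto-approves, which is exactly what an unclamped &lt;code&gt;float(os.getenv(...))&lt;/code&gt; set to &lt;code&gt;0&lt;/code&gt; would do.&lt;/p&gt;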

&lt;h2&gt;
  
  
  Memory: Making Each Run Smarter Than the Last
&lt;/h2&gt;

&lt;p&gt;CrewAI rebuilt its memory system in 2025, replacing separate short-term, long-term, and entity memory types with a unified Memory class. The distinction that matters in production is between &lt;em&gt;state&lt;/em&gt; (ephemeral within a run) and &lt;em&gt;memory&lt;/em&gt; (persistent across runs).&lt;/p&gt;

&lt;p&gt;State carries data from step to step within one execution. Memory carries knowledge across executions. For most client workflows I build, both matter. The state tracks what happened in this run. The memory tracks what the system has learned across all runs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SalesOutreachFlow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Flow&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;OutreachState&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;

    &lt;span class="nd"&gt;@start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;research_prospect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Recall what we know about this company from previous interactions
&lt;/span&gt;        &lt;span class="n"&gt;previous_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;company_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ProspectResearchCrew&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;kickoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;company_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;previous_context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;previous_context&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No prior contact&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Save new research to memory for future runs
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remember&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;company_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prospect_research&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

    &lt;span class="nd"&gt;@listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;research_prospect&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;personalize_outreach&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;research&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Previous email responses also inform this step via memory
&lt;/span&gt;        &lt;span class="n"&gt;response_history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;responses:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;company_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PersonalizationCrew&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;kickoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prospect_research&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response_history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response_history&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No prior responses&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prospect_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;contact_name&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;personalized_email&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The memory system uses an LLM to analyze content when saving, which means it infers context and categories without you having to manage a taxonomy manually. It also supports adaptive recall that blends semantic similarity with recency and importance scores. In practice this means the system gets better at retrieving relevant context over time without any manual tuning.&lt;/p&gt;
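&lt;p&gt;To make that blending concrete, here is a toy, self-contained sketch of how a recall score &lt;em&gt;could&lt;/em&gt; combine those three signals. The weights and half-life are hypothetical knobs chosen for illustration, not CrewAI parameters; the library handles this scoring internally.&lt;/p&gt;

```python
import math
from datetime import datetime, timedelta

# Illustrative only: a toy version of blended recall scoring.
# weights and half_life_days are hypothetical knobs, not library parameters.
def recall_score(similarity: float, saved_at: datetime, importance: float,
                 now: datetime, half_life_days: float = 30.0,
                 weights: tuple = (0.6, 0.2, 0.2)) -> float:
    """Blend semantic similarity with recency and importance."""
    age_days = (now - saved_at).total_seconds() / 86400
    # Exponential decay: a memory loses half its recency every half_life_days
    recency = math.exp(-math.log(2) * age_days / half_life_days)
    w_sim, w_rec, w_imp = weights
    return w_sim * similarity + w_rec * recency + w_imp * importance

now = datetime(2026, 4, 1)
fresh = recall_score(0.7, now - timedelta(days=1), 0.5, now)
stale = recall_score(0.7, now - timedelta(days=180), 0.5, now)
print(fresh > stale)  # prints True
```

&lt;p&gt;The useful intuition: with a decaying recency term, two equally similar memories are ranked by freshness, which is exactly the behavior you want in a long-running outreach pipeline.&lt;/p&gt;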

&lt;h2&gt;
  
  
  State Persistence: Surviving Crashes and Cold Starts
&lt;/h2&gt;

&lt;p&gt;One of my non-negotiables for production flows is state persistence. If a flow crashes mid-execution, you want to be able to resume from the last successful step, not restart from scratch. CrewAI supports this via the &lt;code&gt;@persist&lt;/code&gt; decorator and SQLite by default, with PostgreSQL available for multi-instance deployments.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;crewai.flow.persistence&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;persist&lt;/span&gt;

&lt;span class="nd"&gt;@persist&lt;/span&gt;  &lt;span class="c1"&gt;# adds SQLite-backed state recovery automatically
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ReportGenerationFlow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Flow&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ReportState&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
    &lt;span class="k"&gt;pass&lt;/span&gt;

&lt;span class="c1"&gt;# For PostgreSQL in production:
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;crewai.flow.persistence.postgres&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PostgresPersistence&lt;/span&gt;

&lt;span class="nd"&gt;@persist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;PostgresPersistence&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;connection_string&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;flow_states&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ReportGenerationFlow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Flow&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ReportState&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
    &lt;span class="k"&gt;pass&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With persistence enabled, each flow instance gets written to the database after every step completes. If the process restarts, you can resume from the last checkpoint by passing the flow's UUID. I store these UUIDs in my client's job queue system so resumption is automatic.&lt;/p&gt;
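&lt;p&gt;The mechanics are easier to see in a stripped-down sketch. This is not CrewAI's actual schema, just the checkpoint-and-resume pattern that &lt;code&gt;@persist&lt;/code&gt; gives you: state keyed by flow UUID, written after each step, reloaded on restart.&lt;/p&gt;

```python
import json
import sqlite3
import uuid

# A minimal sketch of the checkpoint-resume pattern, NOT CrewAI's schema.
# It shows why storing the flow UUID in a job queue makes resumption automatic.
class CheckpointStore:
    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS flow_states (id TEXT PRIMARY KEY, state TEXT)"
        )

    def save(self, flow_id: str, state: dict) -> None:
        # Overwrite the checkpoint after every completed step
        self.db.execute(
            "INSERT OR REPLACE INTO flow_states VALUES (?, ?)",
            (flow_id, json.dumps(state)),
        )
        self.db.commit()

    def load(self, flow_id: str):
        row = self.db.execute(
            "SELECT state FROM flow_states WHERE id = ?", (flow_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None

store = CheckpointStore()
flow_id = str(uuid.uuid4())          # this UUID goes into the job queue
store.save(flow_id, {"stage": "research_complete", "company": "Acme"})

# After a crash, the worker pulls the UUID back off the queue and resumes:
resumed = store.load(flow_id)
print(resumed["stage"])  # research_complete
```

&lt;p&gt;In CrewAI itself, resumption amounts to passing the saved UUID back to &lt;code&gt;kickoff&lt;/code&gt; as the &lt;code&gt;id&lt;/code&gt; input, which is why stashing it in the job queue is all the bookkeeping you need.&lt;/p&gt;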

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1547394765-185e1e68f34e%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1547394765-185e1e68f34e%3Fw%3D1200%26q%3D80" alt="Server infrastructure and deployment environment for AI agent production systems" width="1200" height="800"&gt;&lt;/a&gt;&lt;em&gt;State persistence is what separates demo-quality agents from production-ready ones.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  CrewAI vs LangGraph: The Honest Comparison
&lt;/h2&gt;

&lt;p&gt;I've written elsewhere about &lt;a href="https://www.jahanzaib.ai/blog/langgraph-tutorial-build-production-ai-agents" rel="noopener noreferrer"&gt;building production AI agents with LangGraph&lt;/a&gt;, so I'll give you the unfiltered comparison rather than a sales pitch for either.&lt;/p&gt;

&lt;p&gt;LangGraph is the better choice when you have highly complex conditional logic with many parallel branches, need fine-grained control over individual state transitions, or are building systems where the graph structure itself is the core abstraction. Its checkpointing and streaming support are more mature, and its Cytoscape-style graph visualization makes debugging intricate topologies noticeably easier.&lt;/p&gt;

&lt;p&gt;CrewAI Flows is the better choice when your workflow maps naturally to roles and tasks, you want to move quickly from prototype to production, and you don't want to spend half your development time on framework boilerplate. DocuSign reported using 14x less code than their previous graph-based implementation when they switched. I've had similar experiences. The Flows decorator syntax is also more readable to non-engineers, which matters when you're explaining the system to a client or a product manager.&lt;/p&gt;

&lt;p&gt;The honest framing is this: most production enterprise workflows are not actually that complex at the graph level. They have 4-7 stages with 2-3 branching conditions. CrewAI Flows handles this elegantly. LangGraph shows its value when you're building something closer to a general-purpose agent runtime than a specific business workflow. If you're unsure which fits your use case, read my post on &lt;a href="https://www.jahanzaib.ai/blog/when-to-use-ai-agents-vs-automation" rel="noopener noreferrer"&gt;when to use AI agents vs automation&lt;/a&gt;. Sometimes neither is the right tool.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Citation Capsule:&lt;/strong&gt; CrewAI runs approximately 450 million agents per month and is used by 60% of the U.S. Fortune 500 as of early 2026. Their 2 billion workflow execution study found that teams starting with 100% human review and gradually reducing it consistently outperformed teams who deployed autonomously from day one. &lt;a href="https://blog.crewai.com/lessons-from-2-billion-agentic-workflows/" rel="noopener noreferrer"&gt;CrewAI Blog&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  A Real Deployment Pattern: The Lead Enrichment Pipeline
&lt;/h2&gt;

&lt;p&gt;Here's a condensed version of a lead enrichment pipeline I've deployed for a few B2B clients. It takes a company name and contact email, researches the company, enriches the contact record, scores the lead, and routes high-value leads to an immediate follow-up queue while queuing others for standard sequences.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LeadEnrichmentFlow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Flow&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;LeadState&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;

    &lt;span class="nd"&gt;@start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validate_lead&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Basic validation before burning API credits
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;company_name&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;contact_email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Missing required fields&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;invalid&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;valid&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="nd"&gt;@listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;valid&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;enrich_company&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CompanyResearchCrew&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;kickoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;company_name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;company_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pydantic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_dump&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

    &lt;span class="nd"&gt;@listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enrich_company&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;enrich_contact&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;company_result&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ContactResearchCrew&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;kickoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;contact_email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company_context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;company_data&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;contact_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pydantic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_dump&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

    &lt;span class="nd"&gt;@listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enrich_contact&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;score_lead&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;contact_result&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Score based on company and contact signals
&lt;/span&gt;        &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_calculate_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;company_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;contact_data&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lead_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;

    &lt;span class="nd"&gt;@router&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;score_lead&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;route_by_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hot_lead&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;warm_lead&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cold_lead&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="nd"&gt;@listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hot_lead&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fast_track_outreach&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Generate personalized email and add to priority queue
&lt;/span&gt;        &lt;span class="nc"&gt;PersonalizedOutreachCrew&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;kickoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lead_data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_dict&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;priority&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="nd"&gt;@listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;warm_lead&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;standard_sequence&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Add to standard nurture sequence
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nurture_queue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="nd"&gt;@listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cold_lead&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;archive_lead&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Store for future review but don't take immediate action
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;archived&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_calculate_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;company_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;contact_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Your scoring logic here
&lt;/span&gt;        &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;company_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;employee_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;company_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;annual_revenue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;5_000_000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;contact_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;is_decision_maker&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;contact_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tech_stack_match&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This runs as a scheduled flow triggered by new CRM entries. It handles 200 to 500 leads per day with no human involvement for cold and warm leads. Hot leads get a human-reviewed email draft queued within minutes of the prospect entering the CRM. If you want to see what kinds of businesses benefit most from this type of automation, the &lt;a href="https://www.jahanzaib.ai/ai-readiness" rel="noopener noreferrer"&gt;AI readiness assessment&lt;/a&gt; is a good starting point.&lt;/p&gt;
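&lt;p&gt;The scheduler glue around the flow is deliberately boring. Here's a minimal, self-contained sketch of the polling loop, with &lt;code&gt;fetch_new_leads&lt;/code&gt; and the &lt;code&gt;kickoff&lt;/code&gt; callable as stand-ins for your CRM client and &lt;code&gt;LeadEnrichmentFlow().kickoff&lt;/code&gt;; none of these names come from a real API.&lt;/p&gt;

```python
# Illustrative scheduler glue; fetch_new_leads, run_batch, and fake_kickoff
# are stand-ins, not CrewAI or CRM APIs.
def fetch_new_leads(crm_rows, seen_ids):
    """Return CRM rows we haven't processed yet."""
    return [row for row in crm_rows if row["id"] not in seen_ids]

def run_batch(crm_rows, seen_ids, kickoff):
    """Kick off one flow per new lead; record processed ids."""
    results = []
    for lead in fetch_new_leads(crm_rows, seen_ids):
        results.append(kickoff(company_name=lead["company"],
                               contact_email=lead["email"]))
        seen_ids.add(lead["id"])
    return results

# Stand-in for LeadEnrichmentFlow().kickoff(inputs=...)
def fake_kickoff(company_name, contact_email):
    return {"company": company_name, "stage": "complete"}

seen = {"lead-1"}  # already processed on a previous run
crm = [
    {"id": "lead-1", "company": "Acme", "email": "a@acme.com"},
    {"id": "lead-2", "company": "Globex", "email": "b@globex.com"},
]
print(len(run_batch(crm, seen, fake_kickoff)))  # 1
```

&lt;p&gt;Deduplicating on CRM record IDs before kicking off is what keeps a restarted scheduler from re-enriching the same lead and double-spending API credits.&lt;/p&gt;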

&lt;h2&gt;
  
  
  Debugging and Monitoring in Production
&lt;/h2&gt;

&lt;p&gt;The debugging workflow I use for CrewAI Flows has three layers. First, local visualization. Call &lt;code&gt;flow.plot()&lt;/code&gt; before running anything in production and you get an interactive HTML diagram of the entire execution graph. This catches routing logic errors before they hit your API budget.&lt;/p&gt;

&lt;p&gt;Second, state logging. I add a simple decorator to every step method that logs the state to a structured JSON log. This gives me a complete audit trail of every flow execution without needing a dedicated observability tool.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;functools&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;log_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nd"&gt;@functools.wraps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wrapper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;flow_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;step&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;errors&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;errors&lt;/span&gt;
        &lt;span class="p"&gt;}))&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;flow_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;step&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;_complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stage&lt;/span&gt;
        &lt;span class="p"&gt;}))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;wrapper&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Third, CrewAI Enterprise observability. If you're running high-volume production workloads, CrewAI's cloud platform gives you trace-level visibility into every agent decision, token usage per step, and per-crew latency breakdowns. I don't use it for every client, but for workflows handling significant volumes or high-value decisions, the visibility is worth the cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Mistakes and How to Avoid Them
&lt;/h2&gt;

&lt;p&gt;After deploying these systems for clients in ecommerce, B2B SaaS, and logistics, here are the mistakes I see most often and how to avoid them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not setting max_iter on agents.&lt;/strong&gt; An agent in a tool-call loop can run for five minutes and cost $15 before anything catches it. Set &lt;code&gt;max_iter=3&lt;/code&gt; unless you have a specific reason for more.&lt;/p&gt;
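
&lt;p&gt;The runaway case is easy to see in plain Python. The sketch below is a conceptual stand-in for the tool-call loop, not CrewAI's internals; in CrewAI the same budget is set with the &lt;code&gt;max_iter&lt;/code&gt; parameter on the agent:&lt;/p&gt;

```python
def run_tool_loop(call_model, max_iter=3):
    """Run a model/tool loop, but never more than max_iter rounds.

    call_model is any callable returning ("tool", payload) to continue
    or ("final", answer) to finish. The names are illustrative, not
    CrewAI's API; CrewAI applies the same cap via Agent(max_iter=...).
    """
    for _ in range(max_iter):
        kind, value = call_model()
        if kind == "final":
            return value
    # Budget exhausted: fail loudly instead of burning tokens forever.
    return "max_iter reached: returning best effort"

# A model stub that never produces a final answer: the runaway case.
result = run_tool_loop(lambda: ("tool", "search again"), max_iter=3)
print(result)
```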

&lt;p&gt;&lt;strong&gt;Using dict state instead of Pydantic models.&lt;/strong&gt; Dictionary state is faster to write and painful to maintain. You'll spend more time debugging key errors than the Pydantic setup saves you.&lt;/p&gt;
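
&lt;p&gt;A typed state model catches those key errors at construction time instead of three steps into a run. This sketch assumes Pydantic is installed; the field names are illustrative, and in CrewAI you would declare the flow as &lt;code&gt;Flow[ContentState]&lt;/code&gt; to get this checking:&lt;/p&gt;

```python
from pydantic import BaseModel, ValidationError

class ContentState(BaseModel):
    """Typed flow state. Fields are illustrative; with CrewAI you would
    declare the flow as Flow[ContentState] to validate every write."""
    job_id: str = ""
    stage: str = "draft"
    retry_count: int = 0
    errors: list[str] = []

state = ContentState(job_id="job-42")
state.retry_count = state.retry_count + 1   # fine: declared and typed

# A dict would accept this silently; the model rejects it immediately.
try:
    ContentState(retry_count="three")
except ValidationError as exc:
    print(f"rejected at construction: {len(exc.errors())} validation error")
```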

&lt;p&gt;&lt;strong&gt;Forgetting to handle the "invalid" branch.&lt;/strong&gt; Every router needs to handle the failure case. I've seen flows where the "invalid" listener was never defined, causing silent failures when validation checks returned "invalid".&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not adding iteration limits to loop-back routes.&lt;/strong&gt; The &lt;code&gt;deepen_research&lt;/code&gt; example earlier shows this: always increment a counter and break out of the loop after a maximum number of retries. Infinite loops are a real risk with router-based recursion.&lt;/p&gt;
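
&lt;p&gt;The guard itself is only a few lines. This is a plain-Python sketch of the routing logic; in a real flow it would live inside an &lt;code&gt;@router&lt;/code&gt;-decorated method, and the state keys and route labels here are hypothetical:&lt;/p&gt;

```python
MAX_DEEPEN_ROUNDS = 3

def route_research(state):
    """Routing logic for a loop-back edge. In CrewAI this body would sit
    inside an @router method; the dict state and route labels here are
    illustrative stand-ins."""
    if state["quality_ok"]:
        return "write_article"
    if state["deepen_rounds"] >= MAX_DEEPEN_ROUNDS:
        # Hard stop: route to a fallback instead of recursing forever.
        return "flag_for_human_review"
    state["deepen_rounds"] = state["deepen_rounds"] + 1
    return "deepen_research"

state = {"quality_ok": False, "deepen_rounds": 0}
labels = [route_research(state) for _ in range(5)]
print(labels)  # three deepen rounds, then the fallback route
```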

&lt;p&gt;&lt;strong&gt;Blocking the thread with synchronous crew kickoffs.&lt;/strong&gt; If you're processing multiple leads or documents concurrently, use &lt;code&gt;kickoff_async()&lt;/code&gt; and asyncio to run flows in parallel. Synchronous execution is fine for single items but becomes a throughput bottleneck at any meaningful scale.&lt;/p&gt;
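
&lt;p&gt;The fan-out pattern looks like this. The &lt;code&gt;process_lead&lt;/code&gt; coroutine below is a stand-in for an actual &lt;code&gt;flow.kickoff_async()&lt;/code&gt; call so the sketch runs without CrewAI installed:&lt;/p&gt;

```python
import asyncio

async def process_lead(lead_id):
    """Stand-in for flow.kickoff_async(inputs={...}); sleeps to simulate
    crew latency so the concurrency is visible."""
    await asyncio.sleep(0.1)
    return f"{lead_id}: scored"

async def main(lead_ids):
    # All flows run concurrently; total wall time is roughly one flow,
    # not len(lead_ids) flows back to back. gather preserves input order.
    return await asyncio.gather(*(process_lead(l) for l in lead_ids))

results = asyncio.run(main(["lead-1", "lead-2", "lead-3"]))
print(results)
```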

&lt;p&gt;If you're evaluating whether a system like this fits your business, I'd start with the &lt;a href="https://www.jahanzaib.ai/ai-readiness" rel="noopener noreferrer"&gt;AI agent readiness assessment&lt;/a&gt; to understand where agent complexity is actually warranted versus where simpler automation would serve you better. And if you want to talk through a specific deployment, the &lt;a href="https://www.jahanzaib.ai/contact" rel="noopener noreferrer"&gt;contact page&lt;/a&gt; is the right starting point.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1542744173-8e7e53415bb0%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1542744173-8e7e53415bb0%3Fw%3D1200%26q%3D80" alt="Monitoring dashboard for production AI agent system performance metrics" width="1200" height="800"&gt;&lt;/a&gt;&lt;em&gt;Production deployments need monitoring, not just execution. Logging state at every step makes debugging tractable.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is the difference between CrewAI Flows and CrewAI Crews?
&lt;/h3&gt;

&lt;p&gt;A Crew is a team of agents that executes a set of tasks. It's stateless by default and runs once to completion. A Flow wraps Crews inside an event-driven orchestration layer that handles state persistence, conditional routing, and multi-crew coordination. Think of the Crew as the worker and the Flow as the process manager.&lt;/p&gt;

&lt;h3&gt;
  
  
  When should I use CrewAI Flows instead of LangGraph?
&lt;/h3&gt;

&lt;p&gt;Use CrewAI Flows when your workflow maps naturally to roles and tasks, you want to move quickly, and your conditional logic isn't extremely complex. LangGraph is worth the additional setup when you need fine-grained control over individual state transitions, have many parallel branches, or are building something that functions more like a general-purpose agent runtime than a specific business process.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does CrewAI Flows support human-in-the-loop?
&lt;/h3&gt;

&lt;p&gt;Yes. You can use the &lt;code&gt;@human_feedback&lt;/code&gt; decorator to pause flow execution and wait for human input before proceeding. The flow state persists while waiting, so you can implement approval workflows, quality review gates, and exception handling that involves a human decision.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does state persistence work in CrewAI Flows?
&lt;/h3&gt;

&lt;p&gt;Applying the &lt;code&gt;@persist&lt;/code&gt; decorator to your Flow class enables automatic state recovery. By default it uses SQLite, which is fine for single-instance deployments. For multi-instance production deployments, you can configure PostgreSQL persistence. State is written after each step completes, so a crash resumes from the last successful checkpoint.&lt;/p&gt;
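
&lt;p&gt;The checkpoint-on-write idea behind &lt;code&gt;@persist&lt;/code&gt; can be sketched with stdlib SQLite. The table name and schema below are made up for illustration; they are not CrewAI's actual persistence layer:&lt;/p&gt;

```python
import json
import sqlite3

# Toy checkpoint store illustrating the idea behind @persist: write the
# state after every step so a crash resumes from the last good point.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE checkpoints (flow_id TEXT PRIMARY KEY, state TEXT)")

def save_checkpoint(flow_id, state):
    # Each step's write replaces the previous checkpoint for this flow.
    db.execute(
        "INSERT INTO checkpoints VALUES (?, ?) "
        "ON CONFLICT(flow_id) DO UPDATE SET state = excluded.state",
        (flow_id, json.dumps(state)),
    )

def load_checkpoint(flow_id):
    row = db.execute(
        "SELECT state FROM checkpoints WHERE flow_id = ?", (flow_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None

save_checkpoint("job-42", {"stage": "research"})
save_checkpoint("job-42", {"stage": "write"})   # step 2 overwrites step 1
print(load_checkpoint("job-42"))
```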

&lt;h3&gt;
  
  
  Can CrewAI Flows run asynchronously?
&lt;/h3&gt;

&lt;p&gt;Yes. Use &lt;code&gt;kickoff_async()&lt;/code&gt; instead of &lt;code&gt;kickoff()&lt;/code&gt; to run flows asynchronously. This is important for high-throughput workloads where you need to process many items concurrently. You can use &lt;code&gt;asyncio.gather()&lt;/code&gt; to run multiple flow instances in parallel.&lt;/p&gt;

&lt;h3&gt;
  
  
  What LLM providers work with CrewAI?
&lt;/h3&gt;

&lt;p&gt;CrewAI supports all major providers through its litellm integration: OpenAI, Anthropic (Claude), Google Gemini, AWS Bedrock, Azure OpenAI, Groq, Ollama for local models, and more. You configure the model per agent using the &lt;code&gt;llm&lt;/code&gt; parameter or set a default at the Crew level.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I visualize a CrewAI Flow before running it?
&lt;/h3&gt;

&lt;p&gt;Call &lt;code&gt;flow.plot()&lt;/code&gt; on your flow instance to generate an interactive HTML diagram showing all steps, listeners, and routing logic. This is extremely useful for catching logical errors before they hit your API budget. Re-run &lt;code&gt;plot()&lt;/code&gt; after you change the flow code to regenerate the diagram.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is CrewAI suitable for enterprise production deployments?
&lt;/h3&gt;

&lt;p&gt;Yes. CrewAI is used by 60% of the US Fortune 500 and runs approximately 450 million agents per month as of early 2026. Their enterprise platform adds observability, access controls, deployment management, and dedicated support. That said, the open-source framework is production-capable on its own for most use cases when you follow the patterns around state persistence, error handling, and gradual autonomy.&lt;/p&gt;

</description>
      <category>crewai</category>
      <category>multiagent</category>
      <category>aiagents</category>
      <category>production</category>
    </item>
    <item>
      <title>OpenClaw's Security Crisis: What 346,000 Stars and 135,000 Exposed Instances Teach Us About AI Agent Security</title>
      <dc:creator>Jahanzaib</dc:creator>
      <pubDate>Mon, 06 Apr 2026 07:24:25 +0000</pubDate>
      <link>https://forem.com/jahanzaibai/openclaws-security-crisis-what-346000-stars-and-135000-exposed-instances-teach-us-about-ai-fpb</link>
      <guid>https://forem.com/jahanzaibai/openclaws-security-crisis-what-346000-stars-and-135000-exposed-instances-teach-us-about-ai-fpb</guid>
      <description>&lt;p&gt;Two weeks ago I got a message from a client asking whether OpenClaw was still safe to run. Their DevOps lead had seen the headlines about 135,000 exposed instances and nine CVEs published in four days, and they wanted to know if the system I helped them deploy was one of them. I ran a quick check, confirmed they were fine because we had set it up correctly from day one, and then spent the next hour reading every security advisory and CVE detail that had dropped in the past three months.&lt;/p&gt;

&lt;p&gt;OpenClaw, the open source AI agent from Peter Steinberger that hit 346,000 GitHub stars faster than any project in GitHub's history, is at the center of the first major AI agent security crisis of 2026. And the technical details are not abstract. They are specific, reproducible, and relevant to anyone running AI agents in production right now. Whether you use OpenClaw, NanoClaw, or any other autonomous agent framework, this story contains things you need to know before your next deployment.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key Takeaways&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;CVE-2026-25253 (CVSS 8.8) allows one-click remote code execution by exploiting OpenClaw's WebSocket origin validation gap. A victim visiting a single malicious webpage is enough to trigger full system compromise.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;135,000+ OpenClaw instances were found exposed on the public internet across 82 countries. More than 15,000 were directly vulnerable to remote execution.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Nine CVEs were disclosed in four days, including command injection, path traversal, and server-side request forgery flaws. Eight vulnerabilities were classified as critical.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;341 of 2,857 skills in the ClawHub marketplace were found to be malicious at time of audit. That is 12% of the entire plugin registry.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This is not an OpenClaw-specific problem. Any AI agent with persistent credentials, autonomous execution, and integrations into your digital life carries the same category of risk. The architecture itself is the attack surface.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I am still deploying OpenClaw for clients. The difference between a safe deployment and an exposed one is about four configuration choices, and I will walk through all of them.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Actually Happened With OpenClaw
&lt;/h2&gt;

&lt;p&gt;OpenClaw launched in November 2025 as an open source personal AI agent. Within 24 hours of going viral in January 2026 it had 20,000 GitHub stars. By early April it sits at 346,000, making it the fastest-growing open source project in GitHub history. That growth attracted something else too: security researchers who started looking very carefully at what the tool actually does.&lt;/p&gt;

&lt;p&gt;The first major finding was exposure. By the time CVE-2026-25253 was publicly disclosed on February 3, 2026, security researchers had already found over 135,000 OpenClaw instances running on publicly accessible IP addresses across 82 countries. More than 15,000 of those were directly exploitable via the RCE vulnerability. Most of the rest were accessible over unencrypted HTTP.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1639762681485-074b7f938ba0%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1639762681485-074b7f938ba0%3Fw%3D1200%26q%3D80" alt="Security monitoring dashboard showing network traffic and alert systems for AI agent infrastructure" width="1200" height="675"&gt;&lt;/a&gt;&lt;em&gt;Security researchers found over 135,000 OpenClaw instances exposed on the public internet — many running unencrypted over HTTP.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;CVE-2026-25253 is the one getting the most attention, and it deserves it. The vulnerability stems from a single design decision: OpenClaw's control UI reads a &lt;code&gt;gatewayUrl&lt;/code&gt; parameter from the query string without validating it, and auto-connects on page load. When it connects, it sends the stored gateway authentication token in the WebSocket payload. An attacker can host a malicious webpage, trick a user into visiting it, and receive that token within milliseconds. The WebSocket server does not validate the origin header, so any website can trigger this connection.&lt;/p&gt;

&lt;p&gt;Once an attacker has the gateway token, the blast radius is enormous. They can disable user confirmation prompts by setting &lt;code&gt;exec.approvals.set&lt;/code&gt; to &lt;code&gt;off&lt;/code&gt;. They can escape container restrictions by switching &lt;code&gt;tools.exec.host&lt;/code&gt; to &lt;code&gt;gateway&lt;/code&gt;. Then they have arbitrary code execution on the host machine. The entire attack chain runs in milliseconds according to Oasis Security's disclosure.&lt;/p&gt;

&lt;p&gt;OpenClaw patched this in version 2026.1.29, released January 30. But the issue was already in the wild, and eight more CVEs followed over the next four days, bringing the total to nine. These included command injection (CVE-2026-24763), SSRF in the gateway (CVE-2026-26322, CVSS 7.6), and path traversal in the browser upload component (CVE-2026-26329). In total, the initial audit turned up 512 vulnerabilities with eight classified as critical.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why AI Agents Are a Different Security Problem
&lt;/h2&gt;

&lt;p&gt;I have been building production AI systems for a few years now and have shipped 109 of them across industries. In that time, I have seen a lot of organizations treat AI agent security the same way they treat web application security. That framing misses something important.&lt;/p&gt;

&lt;p&gt;A web application has a defined interface. It accepts specific inputs, performs specific operations, and returns outputs within a bounded scope. Its attack surface is relatively static and auditable. An AI agent is different in a fundamental way: the instructions that control its behavior arrive at runtime, from untrusted sources, through the same channel as ordinary content.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1550751827-4bd374c3f58b%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1550751827-4bd374c3f58b%3Fw%3D1200%26q%3D80" alt="Cybersecurity terminal showing code execution and vulnerability scanning interface" width="1200" height="801"&gt;&lt;/a&gt;&lt;em&gt;OpenClaw's runtime can ingest untrusted text, download and execute skills from external sources, and perform actions using the credentials assigned to it — without equivalent controls to static application code.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;CrowdStrike published analysis of OpenClaw that put this clearly: "Indirect prompt injection attacks targeting OpenClaw have already been seen in the wild, such as an injection attempt to drain crypto wallets." The attack method involves embedding malicious instructions in data the agent ingests: emails, webpages, documents. The agent reads the content and the malicious instructions look identical to legitimate data from the model's perspective.&lt;/p&gt;

&lt;p&gt;This is a property of how language models process information. User data and control instructions occupy the same token space. There is no hardware-level separation between what the model is told to do and what it reads in the environment. This means prompt injection is not a bug you can patch once and forget. It is an architectural reality of the current generation of AI agents.&lt;/p&gt;

&lt;p&gt;OpenClaw specifically amplifies this risk because of its integration footprint. A single instance connects to WhatsApp, Telegram, Slack, Discord, and iMessage, while also managing email, calendars, files, and shell commands. CrowdStrike described this as "prompt injection transforming from a content manipulation issue into a full-scale breach enabler, where the blast radius extends to every system and tool the agent can reach."&lt;/p&gt;

&lt;p&gt;If you are evaluating whether your business is ready to deploy AI agents, the &lt;a href="https://www.jahanzaib.ai/ai-readiness" rel="noopener noreferrer"&gt;AI readiness assessment&lt;/a&gt; on this site includes a technical readiness dimension specifically designed to surface these kinds of architectural concerns before you commit to a deployment path.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Supply Chain Problem Nobody Was Talking About
&lt;/h2&gt;

&lt;p&gt;The CVEs got the headlines, but the ClawHub marketplace finding might be the more systemic issue. Security researchers auditing the OpenClaw skill registry found 341 malicious skills out of 2,857 total. That is 12% of the entire plugin ecosystem.&lt;/p&gt;

&lt;p&gt;These are not obviously malicious tools. They appear as useful utilities: productivity helpers, calendar integrations, file management shortcuts. Once installed, a malicious skill runs with the same permissions as OpenClaw itself, which means it can read files, execute shell commands, exfiltrate credentials, and make outbound network requests. The user has no way to distinguish a legitimate skill from a compromised one without auditing the source code.&lt;/p&gt;

&lt;p&gt;Reco.ai's analysis of the marketplace found what they called "shadow AI with elevated privileges" — third-party code running inside an agent runtime that has persistent access to everything the agent can touch. This is a supply chain problem that mirrors what we saw with npm malware campaigns, except the blast radius per compromised package is considerably larger because the agent runtime has system-level access rather than just code-level access.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1526374965328-7f61d4dc18c5%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1526374965328-7f61d4dc18c5%3Fw%3D1200%26q%3D80" alt="Dark code terminal screen showing automated scripts and security vulnerability detection output" width="1200" height="800"&gt;&lt;/a&gt;&lt;em&gt;341 of 2,857 skills in the ClawHub marketplace were found to be malicious — roughly 12% of the entire plugin registry. Many appeared as ordinary productivity tools.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;OpenClaw's maintainers responded by accelerating their skill review process and removing the flagged entries. But the underlying dynamic remains: any marketplace-based extension model for an AI agent creates a continuous supply chain risk that requires ongoing vigilance, not just a one-time audit.&lt;/p&gt;

&lt;p&gt;For the clients I have deployed OpenClaw with, the rule is simple: no skills from the marketplace without code review. We use a curated allowlist of skills that I have reviewed manually or built in-house. It adds friction to the deployment process, and that friction is the point.&lt;/p&gt;

&lt;h2&gt;
  
  
  Microsoft and CrowdStrike Weigh In
&lt;/h2&gt;

&lt;p&gt;Two weeks after the initial CVE disclosures, Microsoft published a security blog titled "Running OpenClaw safely: identity, isolation, and runtime risk." It is worth reading in full if you deploy AI agents at any scale. The core framework they articulate is three-layer: identity first, scope second, model last.&lt;/p&gt;

&lt;p&gt;Identity first means deciding who can talk to the agent before anything else. This includes implementing DM pairing, allowlists, and explicit open access controls. The reasoning is that if you do not control who can send instructions to the agent, you cannot control what the agent does.&lt;/p&gt;

&lt;p&gt;Scope second means deciding where the agent is allowed to act. This includes group allowlists, mention gating, tool restrictions, sandboxing, and device-level permissions. The principle of least privilege applies here exactly as it does in traditional infrastructure security: the agent should have access to exactly what it needs for the task at hand and nothing more.&lt;/p&gt;

&lt;p&gt;Model last is the most counterintuitive piece. Microsoft's guidance is to design your deployment under the assumption that the model can be manipulated. Not might be, can be. Build your system so that successful manipulation has a limited blast radius regardless of how clever the attack is. This means isolation, not trust, is the primary defense.&lt;/p&gt;

&lt;p&gt;CrowdStrike's analysis added an enterprise-specific dimension. They found a "growing number of internet-exposed OpenClaw instances, many accessible over unencrypted HTTP rather than HTTPS." Their recommendation for security teams: deploy Falcon Exposure Management to identify internal and external OpenClaw deployments, monitor DNS requests to openclaw.ai domains, and implement runtime guardrails to detect prompt injection attempts before execution.&lt;/p&gt;

&lt;p&gt;The Cisco security team published a post calling personal AI agents "a security nightmare" in enterprise contexts, pointing specifically to the risk when "employees deploy OpenClaw on corporate machines and connect it to enterprise systems without IT oversight." This is the shadow AI problem in its sharpest form: the productivity tool that arrived before the governance policy.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Actual Deployment Configuration
&lt;/h2&gt;

&lt;p&gt;I am not going to stop deploying OpenClaw because of this. The tool is genuinely useful for the right use cases, and the vulnerabilities I described above can, with the right configuration, be mitigated. Here is what I actually do when I set up OpenClaw for a client.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1516321318423-f06f85e504b3%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1516321318423-f06f85e504b3%3Fw%3D1200%26q%3D80" alt="Server infrastructure and network security hardware in data center environment for secure AI agent deployment" width="1200" height="800"&gt;&lt;/a&gt;&lt;em&gt;Secure AI agent deployment starts at the infrastructure layer: network isolation, dedicated credentials, and container boundaries before any configuration of the agent itself.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The first thing is network binding. OpenClaw's gateway defaults to &lt;code&gt;0.0.0.0:18789&lt;/code&gt;, which binds to all network interfaces including public ones. I always change this to &lt;code&gt;127.0.0.1&lt;/code&gt; as the first configuration step. If remote access is required, it goes behind a VPN, never directly to the internet. This single change eliminates the primary exposure vector for CVE-2026-25253 and the mass exposure issue the researchers identified.&lt;/p&gt;
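
&lt;p&gt;The difference between the two bindings is easy to verify with a few lines of stdlib Python. The helper name below is mine, for illustration; the addresses are the ones that matter:&lt;/p&gt;

```python
import socket

def bound_interface(host, port=0):
    """Bind a throwaway TCP socket and report the address it landed on.
    127.0.0.1 is reachable only from the machine itself; 0.0.0.0
    answers on every interface, including public ones."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return sock.getsockname()[0]
    finally:
        sock.close()

# Safe default for an OpenClaw-style gateway: loopback only.
print(bound_interface("127.0.0.1"))   # 127.0.0.1
print(bound_interface("0.0.0.0"))     # 0.0.0.0, i.e. every interface
```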

&lt;p&gt;Second is credential isolation. The agent gets its own dedicated accounts for every integration: a dedicated email account rather than a shared corporate inbox, a dedicated calendar, dedicated messaging credentials. These accounts have the minimum permissions required. When the agent makes a mistake or gets compromised, the blast radius is contained to those accounts rather than to an executive's email archive.&lt;/p&gt;

&lt;p&gt;Third is a containerized runtime. OpenClaw runs inside a Docker container with a non-root user, no privileged flags, restricted outbound network access (using a blocklist of ranges the agent has no legitimate reason to reach), and no host path mounts beyond what the specific use case requires. This is standard practice for any code running with elevated privileges.&lt;/p&gt;

&lt;p&gt;Fourth is the skill allowlist. No marketplace skills without review. If a client needs a specific integration, I either review the skill's source code in detail or build a minimal version in-house. The effort is worth it given that 12% of the ClawHub registry was compromised at peak.&lt;/p&gt;

&lt;p&gt;Fifth is OpenClaw's built-in audit command: &lt;code&gt;openclaw security audit&lt;/code&gt; and &lt;code&gt;openclaw security audit --fix&lt;/code&gt;. I run this after any configuration change and before any deployment that exposes the gateway to additional network surfaces. The command checks for gateway auth exposure, browser control exposure, overly permissive allowlists, and filesystem permission issues. It is not a complete security audit but it catches the most common misconfigurations quickly.&lt;/p&gt;

&lt;p&gt;If you are working through whether to adopt AI agents at all, whether to deploy OpenClaw versus a managed alternative, or whether your current setup has exposures you are not aware of, the &lt;a href="https://www.jahanzaib.ai/contact" rel="noopener noreferrer"&gt;contact page&lt;/a&gt; is the right starting point. This is exactly the kind of evaluation I do before any deployment.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for Businesses Adopting AI Agents
&lt;/h2&gt;

&lt;p&gt;The OpenClaw story is not really about OpenClaw. It is about what happens when autonomous systems with persistent credentials and broad integration access reach mainstream adoption before the security practices catch up. OpenClaw is the first case study because it grew faster than anything else, but the same dynamics apply to any AI agent deployment.&lt;/p&gt;

&lt;p&gt;I see this pattern with clients regularly. A team discovers that an AI agent can automate a meaningful chunk of their operational work. They deploy it quickly because the productivity gain is real and immediate. The security review happens later, if at all. And later means after the agent has already connected to production systems, processed sensitive data, and accumulated credentials that are hard to rotate without breaking workflows.&lt;/p&gt;

&lt;p&gt;The 77% of security professionals who told Fortune's survey they were comfortable with autonomous AI systems operating without human oversight have probably not done a detailed threat model for what "without human oversight" means when the agent can read all email, execute shell commands, and send messages on behalf of real people. Comfort without analysis is the vulnerability.&lt;/p&gt;

&lt;p&gt;There are three questions I ask every client before we touch OpenClaw or any other AI agent framework.&lt;/p&gt;

&lt;p&gt;First: what are the actual credentials this agent will hold, and what can someone do with them if they get compromised? Walk through the worst-case scenario for every integration before deployment. If the answer is "access to our entire customer database" or "the ability to send emails as our CEO," the deployment needs more isolation work before it goes live.&lt;/p&gt;

&lt;p&gt;Second: what external content will this agent ingest? If the agent reads emails, web pages, or third-party documents, it is consuming untrusted content through the same channel as its operating instructions. Every piece of external content is a potential prompt injection surface. This does not mean the agent cannot read external content. It means you need explicit sandboxing and output filtering between what the agent reads and what it can do.&lt;/p&gt;

&lt;p&gt;Third: what does the governance process look like when this agent misbehaves? Not if, when. At some point the agent will take an action you did not intend. The question is how fast you can detect it, how fast you can stop it, and how much damage it can do in the time between the error and the intervention. If the answer to any of those is "we do not know," that is the gap to close before deployment.&lt;/p&gt;

&lt;p&gt;I cover these evaluation dimensions in depth across some of my existing work: the &lt;a href="https://www.jahanzaib.ai/blog/what-is-openclaw-open-source-ai-agent-explained" rel="noopener noreferrer"&gt;OpenClaw overview post&lt;/a&gt; explains what the tool does, and the &lt;a href="https://www.jahanzaib.ai/blog/how-to-install-openclaw-complete-setup-guide" rel="noopener noreferrer"&gt;setup guide&lt;/a&gt; covers the installation process. But neither of those pieces goes deep on security hardening, which is why this post exists.&lt;/p&gt;

&lt;p&gt;If you want to understand whether your business is at a stage where AI agents make sense at all, versus simpler automation alternatives, the &lt;a href="https://www.jahanzaib.ai/ai-readiness" rel="noopener noreferrer"&gt;AI readiness assessment&lt;/a&gt; takes about 10 minutes and gives you a scored breakdown across eight dimensions including technical readiness and data security posture. It is the starting point I recommend before any conversation about agent deployment.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Broader Signal: AI Agent Security Is a Real Discipline Now
&lt;/h2&gt;

&lt;p&gt;One positive thing the OpenClaw crisis has done is accelerate the formalization of AI agent security as a distinct field. Microsoft published an enterprise security framework. CrowdStrike added OpenClaw-specific detection to Falcon. Kaspersky labeled current versions "unsafe for general use." The OWASP AI Security working group has been expanding its guidance specifically for agentic systems.&lt;/p&gt;

&lt;p&gt;This is good. It means the ecosystem is treating AI agent security with the seriousness it requires rather than treating agents as just another application type. The specific risks (prompt injection, supply chain compromise, credential amplification, and autonomous execution without approval gates) are real, and they require real tooling.&lt;/p&gt;

&lt;p&gt;The tools I use and recommend for securing agent deployments now include: Docker container isolation as baseline, Pangea for runtime authorization and audit logging, Prompt Security's ClawSec suite for OpenClaw specifically (the GitHub repo is at &lt;code&gt;prompt-security/clawsec&lt;/code&gt;), and Microsoft Defender for Cloud for enterprise deployments that need centralized visibility across multiple agent instances.&lt;/p&gt;

&lt;p&gt;What I do not use are "AI safety" tools that add prompts telling the model to "be safe" or "don't do anything harmful." Those are not security controls. They are suggestions. Security comes from architectural boundaries, not model instructions. You can combine both, but if you are relying on the latter without the former, your deployment is not secure regardless of how detailed the system prompt is.&lt;/p&gt;

&lt;p&gt;I have spent the last few months building agent security hardening into every deployment I do through &lt;a href="https://www.jahanzaib.ai/solutions" rel="noopener noreferrer"&gt;AgenticMode AI&lt;/a&gt;, specifically because the OpenClaw story made clear that this is not optional work anymore. The clients who came to me with existing OpenClaw setups that needed audit work all had variations of the same problem: the gateway was not locked down, the credentials were too broad, and the skill allowlist was off by default. Three configuration changes, none of them complex. But none of them happen automatically, which is why they did not happen.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1555949963-aa79dcee981c%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1555949963-aa79dcee981c%3Fw%3D1200%26q%3D80" alt="Developer reviewing security code and configuration settings for AI agent hardening" width="1200" height="800"&gt;&lt;/a&gt;&lt;em&gt;AI agent security hardening comes down to four configuration choices: bind to localhost, isolate credentials, containerize the runtime, and allowlist skills manually.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The broader lesson is that AI agent adoption is moving faster than AI agent operations practice. The security infrastructure, governance frameworks, and deployment standards are being built while people are already running agents in production. That gap creates risk, and the OpenClaw crisis is the first major public demonstration of what that risk looks like when it materializes.&lt;/p&gt;

&lt;p&gt;The right response is not to avoid AI agents. For the right use cases, they deliver real and measurable operational leverage that simpler automation cannot match. I have documented this across the &lt;a href="https://www.jahanzaib.ai/work" rel="noopener noreferrer"&gt;case studies on this site&lt;/a&gt;, including production systems that handle thousands of operations per day without human intervention. The right response is to deploy them with the same engineering discipline you would apply to any system that holds credentials and executes actions on behalf of real people.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Citation Capsule:&lt;/strong&gt; OpenClaw reached 346K GitHub stars and 3.2 million users by April 2026 (&lt;a href="https://openclawvps.io/blog/openclaw-statistics" rel="noopener noreferrer"&gt;OpenClaw Statistics April 2026&lt;/a&gt;). CVE-2026-25253 disclosed at CVSS 8.8 with patch in v2026.1.29 (&lt;a href="https://thehackernews.com/2026/02/openclaw-bug-enables-one-click-remote.html" rel="noopener noreferrer"&gt;The Hacker News, Feb 2026&lt;/a&gt;). 135,000+ exposed instances across 82 countries, 15,000+ directly vulnerable (&lt;a href="https://pbxscience.com/openclaw-2026s-first-major-ai-agent-security-crisis-explained/" rel="noopener noreferrer"&gt;PBX Science, 2026&lt;/a&gt;). 341 of 2,857 marketplace skills found malicious (&lt;a href="https://www.crowdstrike.com/en-us/blog/what-security-teams-need-to-know-about-openclaw-ai-super-agent/" rel="noopener noreferrer"&gt;CrowdStrike, Feb 2026&lt;/a&gt;). Microsoft enterprise security framework for AI agents (&lt;a href="https://www.microsoft.com/en-us/security/blog/2026/02/19/running-openclaw-safely-identity-isolation-runtime-risk/" rel="noopener noreferrer"&gt;Microsoft Security Blog, Feb 2026&lt;/a&gt;). 77% of security professionals comfortable with autonomous AI without oversight (&lt;a href="https://fortune.com/2026/02/12/openclaw-ai-agents-security-risks-beware/" rel="noopener noreferrer"&gt;Fortune, Feb 2026&lt;/a&gt;).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Is OpenClaw safe to use in 2026?
&lt;/h3&gt;

&lt;p&gt;OpenClaw can be deployed safely with the right configuration. The key steps are binding the gateway to localhost rather than all interfaces, running the agent in an isolated container with a non-root user, using dedicated credentials with minimum permissions for each integration, and maintaining a manually reviewed skill allowlist rather than installing marketplace skills freely. OpenClaw's own built-in audit command catches the most common misconfigurations. Versions from v2026.1.29 onward include patches for the critical CVEs. The tool is not safe with default settings in any environment that has public internet exposure.&lt;/p&gt;
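&lt;p&gt;Most of those steps can be expressed in a single container launch. This is a hedged sketch: the image name, port, and volume path are placeholders I chose for illustration, not official OpenClaw artifacts, so adapt them to your actual deployment.&lt;/p&gt;

```shell
# Hypothetical hardened launch; image name, port, and volume path are placeholders.
# --user: run as a non-root user inside the container.
# -p 127.0.0.1:...: publish the gateway on localhost only, never 0.0.0.0.
# --read-only plus one writable volume: limit what a compromised agent can touch.
docker run -d \
  --name openclaw \
  --user 1000:1000 \
  -p 127.0.0.1:18789:18789 \
  --read-only \
  -v "$PWD/openclaw-data:/data" \
  your-openclaw-image:latest
```

&lt;p&gt;Credential scoping and the skill allowlist still have to be handled separately, but the container boundary and the localhost bind close off the two exposure paths that drove the bulk of the incidents.&lt;/p&gt;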

&lt;h3&gt;
  
  
  What is CVE-2026-25253 and how serious is it?
&lt;/h3&gt;

&lt;p&gt;CVE-2026-25253 is a cross-site WebSocket hijacking vulnerability in OpenClaw with a CVSS score of 8.8. It allows a remote attacker to steal a user's gateway authentication token simply by getting them to visit a malicious webpage. The attack takes milliseconds. With the stolen token, the attacker can disable confirmation prompts, escape container restrictions, and execute arbitrary commands on the host machine. A patch was released in OpenClaw v2026.1.29 on January 30, 2026. If you are running an earlier version, update immediately.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is prompt injection and why does it matter for AI agents?
&lt;/h3&gt;

&lt;p&gt;Prompt injection is an attack where malicious instructions are embedded in content the AI agent reads: emails, web pages, documents, API responses. Because language models process instructions and data through the same token stream, the model cannot inherently distinguish between a legitimate instruction from you and a malicious instruction embedded in a webpage it reads. For an agent like OpenClaw that connects to email, messaging apps, and the web, any piece of external content is a potential injection surface. CrowdStrike has already observed prompt injection attacks against OpenClaw in the wild, including attempts to drain crypto wallets via injected instructions in web content.&lt;/p&gt;

&lt;h3&gt;
  
  
  How many OpenClaw instances were exposed to the internet?
&lt;/h3&gt;

&lt;p&gt;Security researchers found over 135,000 OpenClaw instances publicly accessible on the internet across 82 countries by the time CVE-2026-25253 was disclosed in February 2026. More than 15,000 of those instances were directly vulnerable to remote code execution. Most of the rest were accessible over unencrypted HTTP. The exposure happened because OpenClaw's gateway defaults to binding on all network interfaces (0.0.0.0) rather than localhost only, and most users did not change this default before connecting the agent to the internet.&lt;/p&gt;
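&lt;p&gt;The difference between the two bind addresses is easy to demonstrate with a plain socket: a listener bound to 127.0.0.1 is only reachable from the machine itself, while one bound to 0.0.0.0 accepts connections on every network interface.&lt;/p&gt;

```python
import socket

# Bind one listener to the loopback interface and one to all interfaces.
loopback = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
loopback.bind(("127.0.0.1", 0))                # reachable from this machine only
loopback_addr = loopback.getsockname()[0]      # '127.0.0.1'

everywhere = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
everywhere.bind(("0.0.0.0", 0))                # reachable from ANY interface
everywhere_addr = everywhere.getsockname()[0]  # '0.0.0.0'

print(loopback_addr, everywhere_addr)
loopback.close()
everywhere.close()
```

&lt;p&gt;Any service that defaults to the second form is one port-forward or cloud firewall mistake away from public exposure, which is exactly what happened at scale here.&lt;/p&gt;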

&lt;h3&gt;
  
  
  What is the ClawHub marketplace risk?
&lt;/h3&gt;

&lt;p&gt;ClawHub is OpenClaw's skill marketplace, similar to an app store for the agent. A security audit found 341 malicious skills out of 2,857 total, roughly 12% of the entire registry at the time of the audit. These malicious skills run with the same permissions as OpenClaw itself, meaning they can read files, execute shell commands, and exfiltrate credentials. They are often disguised as productivity utilities. The recommendation for production deployments is to avoid the public marketplace entirely and use only manually reviewed skills or custom-built integrations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should businesses avoid AI agents because of the OpenClaw security issues?
&lt;/h3&gt;

&lt;p&gt;No. The OpenClaw security crisis is a lesson about deployment practices, not a reason to avoid AI agents entirely. AI agents deliver real operational value for the right use cases. The answer is to deploy them with proper security controls: network isolation, credential scoping, container-based runtime isolation, and supply chain controls for any third-party plugins. The same engineering discipline that applies to any system holding credentials and executing actions on behalf of real users applies here. Businesses that want an objective assessment of whether AI agents are the right fit for their current technical readiness should start with the &lt;a href="https://www.jahanzaib.ai/ai-readiness" rel="noopener noreferrer"&gt;AI readiness assessment&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I check if my OpenClaw instance is vulnerable?
&lt;/h3&gt;

&lt;p&gt;Run &lt;code&gt;openclaw security audit&lt;/code&gt; from your OpenClaw installation. This built-in command checks for the most common vulnerabilities: gateway auth exposure, browser control exposure, overly permissive allowlists, and filesystem permission issues. If your gateway is binding to 0.0.0.0 instead of 127.0.0.1, that is the first thing to change. Also verify you are running v2026.1.29 or later, which includes patches for CVE-2026-25253 and related vulnerabilities. If your instance is accessible from the public internet, restrict that access immediately while you complete the rest of the hardening steps.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the difference between OpenClaw and NanoClaw for security purposes?
&lt;/h3&gt;

&lt;p&gt;NanoClaw, built by Lazer and Gavriel Cohen, is a lightweight containerized alternative designed with isolation as a first-class concern from the start. It has a smaller integration footprint than OpenClaw and fewer surface areas for both CVE-style vulnerabilities and prompt injection. For use cases where you do not need OpenClaw's full 100+ built-in skill library, NanoClaw is often the more secure choice by default. OpenClaw has more skills and broader integrations but requires more intentional hardening to reach an equivalent security posture. For high-sensitivity deployments, NanoClaw's reduced attack surface is a meaningful advantage.&lt;/p&gt;

</description>
      <category>openclaw</category>
      <category>aisecurity</category>
      <category>aiagents</category>
      <category>cybersecurity</category>
    </item>
    <item>
      <title>Pydantic AI Tutorial: How I Build Type-Safe AI Agents That Actually Work in Production</title>
      <dc:creator>Jahanzaib</dc:creator>
      <pubDate>Mon, 06 Apr 2026 01:22:43 +0000</pubDate>
      <link>https://forem.com/jahanzaibai/pydantic-ai-tutorial-how-i-build-type-safe-ai-agents-that-actually-work-in-production-3bcp</link>
      <guid>https://forem.com/jahanzaibai/pydantic-ai-tutorial-how-i-build-type-safe-ai-agents-that-actually-work-in-production-3bcp</guid>
      <description>&lt;p&gt;The fourth time I had to debug a LangChain agent that silently returned malformed JSON and crashed a client's order processing pipeline, I decided I was done patching type errors at midnight. That was eight months ago. Since then I've built 14 production systems on Pydantic AI, and not one of them has broken in the same way.&lt;/p&gt;

&lt;p&gt;Pydantic AI is a Python agent framework built by the Pydantic team — the same people behind the library that OpenAI, Google, and Anthropic use for data validation inside their own SDKs. It launched in late 2024, hit 16,000 GitHub stars by early 2026, and releases new versions almost weekly. The core idea is simple: if you're going to build agents that run real business logic, they need the same type safety and validation guarantees you'd expect from any other production Python code.&lt;/p&gt;

&lt;p&gt;This isn't a beginner's hello-world guide. I'm going to walk you through the patterns I actually use across client deployments — structured outputs, dependency injection, async agents, Bedrock integration, and how to test all of it without burning through API credits. I'll also tell you exactly when I reach for LangGraph instead, because the two aren't competitors so much as complements.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key Takeaways&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Pydantic AI brings FastAPI-style type safety to AI agent development: your agent's output is a validated Pydantic model, not a string you hope parses correctly&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dependency injection lets you pass database connections, API clients, and user context into tools without global state or environment variable hacks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;@agent.tool&lt;/code&gt; decorator is all you need for function calling — Pydantic validates arguments automatically before your tool code even runs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Async support is first-class: &lt;code&gt;agent.run()&lt;/code&gt; is async and handles concurrent requests without blocking your event loop&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Pydantic AI works natively with AWS Bedrock via &lt;code&gt;BedrockConverseModel&lt;/code&gt; — though structured streaming has a known limitation with Claude models (data arrives as one chunk)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use Pydantic AI when you need type-safe single agents or small agent graphs. Add LangGraph when you need complex conditional branching, checkpointing, or human-in-the-loop across many steps&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Is Pydantic AI and Why I Started Using It
&lt;/h2&gt;

&lt;p&gt;Most agent frameworks treat the LLM's output as a string you then parse. You prompt engineer your way to something that looks like JSON, then write a parser, then add error handling for when the JSON is broken, then add retry logic for when the retry produces equally broken JSON. I've done this. It's terrible.&lt;/p&gt;

&lt;p&gt;Pydantic AI flips this. You define what you want the agent to return — a Pydantic model, a list, a typed dict, even a primitive — and the framework handles the validation loop automatically. If the model returns something that doesn't match your schema, Pydantic AI sends the validation error back to the LLM as feedback and retries. You get a validated result or an exception. No silent failures.&lt;/p&gt;

&lt;p&gt;The second reason I use it is the developer experience. Because everything is typed, your IDE gives you autocomplete on &lt;code&gt;result.data&lt;/code&gt;, catches type errors before runtime, and makes refactoring safe. After spending too much time hunting down attribute access bugs in dynamically typed agent chains, this matters more than any benchmark number.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1555066931-bf19f8fd1085%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1555066931-bf19f8fd1085%3Fw%3D1200%26q%3D80" alt="Python code on a dark monitor screen showing type annotations and validation logic"&gt;&lt;/a&gt;&lt;em&gt;Type-annotated Python code is the foundation of Pydantic AI's safety model&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing Pydantic AI and Building Your First Agent
&lt;/h2&gt;

&lt;p&gt;Installation is a single command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;pydantic-ai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For AWS Bedrock specifically, you need the extras:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;"pydantic-ai[bedrock]"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A minimal agent looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;anthropic:claude-haiku-4-5-20251001&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;You are a concise assistant. Answer in 2 sentences max.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_sync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;What is dependency injection?&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Dependency injection is a pattern where an object receives its dependencies 
# from outside rather than creating them itself.
&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. The &lt;code&gt;result.data&lt;/code&gt; field holds the output. For a plain string output type (the default), it's just a string. But the real power comes when you tell the agent what shape you want back.&lt;/p&gt;

&lt;h2&gt;
  
  
  Structured Output: The Feature That Changes Everything
&lt;/h2&gt;

&lt;p&gt;Here's where Pydantic AI earns its name. Instead of parsing strings, you define a Pydantic model and pass it as &lt;code&gt;result_type&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CompetitorAnalysis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;company_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;main_strength&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;main_weakness&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;pricing_tier&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;  &lt;span class="c1"&gt;# 'budget', 'mid', 'enterprise'
&lt;/span&gt;    &lt;span class="n"&gt;verdict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;anthropic:claude-haiku-4-5-20251001&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;result_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;CompetitorAnalysis&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Analyze the company described. Be specific and honest.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_sync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Analyze Zapier as a competitor to a custom n8n deployment for enterprise clients.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;analysis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;  &lt;span class="c1"&gt;# type: CompetitorAnalysis
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pricing_tier&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 'enterprise'
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;verdict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;       &lt;span class="c1"&gt;# Full string, validated
&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I used this exact pattern for a B2B SaaS client building a competitive intelligence tool. Their previous implementation used GPT-4 with a JSON prompt and a custom parser. It worked about 80% of the time. With Pydantic AI it works 100% of the time or raises a clear exception you can handle explicitly.&lt;/p&gt;

&lt;p&gt;The validation loop is automatic. If Claude returns &lt;code&gt;pricing_tier: "mid-market"&lt;/code&gt; instead of one of your allowed values, Pydantic raises a &lt;code&gt;ValidationError&lt;/code&gt;, Pydantic AI sends that error message back to the LLM as a correction prompt, and the LLM tries again. You can configure &lt;code&gt;retries&lt;/code&gt; on the agent to control how many times this happens before raising to the caller.&lt;/p&gt;

&lt;h3&gt;
  
  
  Complex Nested Models
&lt;/h3&gt;

&lt;p&gt;You're not limited to flat models. Nested structures work exactly as you'd expect:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Action&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;owner&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;  &lt;span class="c1"&gt;# 'human', 'agent', 'system'
&lt;/span&gt;    &lt;span class="n"&gt;estimated_minutes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ProjectPlan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;total_hours&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;risk_level&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;  &lt;span class="c1"&gt;# 'low', 'medium', 'high'
&lt;/span&gt;    &lt;span class="n"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Action&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;blockers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I've deployed this kind of multi-level output for project scoping agents where clients input a brief description of what they want to build and the agent returns a structured work breakdown. The type safety means the downstream code that reads &lt;code&gt;plan.actions&lt;/code&gt; never has to guess whether it's a list or a string.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dependency Injection: Production-Grade Context Passing
&lt;/h2&gt;

&lt;p&gt;This is the feature most tutorials gloss over, and it's the one that makes Pydantic AI actually usable in real systems. The problem it solves: your tools need context. They need database connections, API clients, the current user's ID, rate limiter instances. The wrong way to handle this is global variables or environment lookups inside tool functions. The right way is dependency injection.&lt;/p&gt;

&lt;p&gt;Pydantic AI's DI system uses a &lt;code&gt;Deps&lt;/code&gt; dataclass (any Python dataclass or TypedDict) that you pass into &lt;code&gt;agent.run()&lt;/code&gt;. Tools receive it via &lt;code&gt;RunContext&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataclasses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dataclass&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RunContext&lt;/span&gt;

&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Deps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;db_client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;object&lt;/span&gt;  &lt;span class="c1"&gt;# your actual DB client
&lt;/span&gt;    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;anthropic:claude-haiku-4-5-20251001&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;deps_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Deps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Help users look up their account information.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@agent.tool&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_account_balance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;RunContext&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Deps&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;account_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get the account balance for a specific account type.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;SELECT balance FROM accounts WHERE user_id = ? AND type = ?&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;account_type&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;balance&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;currency&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;USD&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;deps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Deps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;usr_abc123&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;db_client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;your_db_client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sk-...&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;What is my checking account balance?&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;deps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;deps&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1558494949-ef010cbdcc31%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1558494949-ef010cbdcc31%3Fw%3D1200%26q%3D80" alt="Server infrastructure showing database connections and API integrations in a production environment"&gt;&lt;/a&gt;&lt;em&gt;Dependency injection keeps database connections and API clients out of global state&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The reason this matters in production: you can create &lt;code&gt;Deps&lt;/code&gt; from your request context. User ID from JWT. DB connection from your connection pool. Rate limiter for that specific user. The agent and its tools get exactly what they need with no global state, no threading issues, and no test pollution. In unit tests, you swap in mock clients without monkey patching anything.&lt;/p&gt;
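&lt;p&gt;Here is a rough sketch of what that test-time swap looks like. The &lt;code&gt;FakeDB&lt;/code&gt; class and the standalone &lt;code&gt;get_balance_logic&lt;/code&gt; helper are hypothetical stand-ins I'm using for illustration, not Pydantic AI APIs — the point is only that a mock client slots into &lt;code&gt;Deps&lt;/code&gt; with no monkey patching:&lt;/p&gt;

```python
# Sketch of swapping a mock client into Deps for a unit test.
# FakeDB and get_balance_logic are hypothetical stand-ins, not library APIs.
import asyncio
from dataclasses import dataclass


@dataclass
class Deps:
    user_id: str
    db_client: object
    api_key: str


class FakeDB:
    """Mock client: returns a canned balance instead of hitting a database."""

    async def query(self, sql: str, *params) -> float:
        return 1250.0


async def get_balance_logic(deps: Deps, account_type: str) -> dict:
    # Same shape as the tool body above, minus the agent wiring
    balance = await deps.db_client.query(
        'SELECT balance FROM accounts WHERE user_id = ? AND type = ?',
        deps.user_id,
        account_type,
    )
    return {'balance': balance, 'currency': 'USD'}


deps = Deps(user_id='usr_test', db_client=FakeDB(), api_key='test-key')
result = asyncio.run(get_balance_logic(deps, 'checking'))
print(result)  # {'balance': 1250.0, 'currency': 'USD'}
```

&lt;p&gt;The production code path and the test path differ by exactly one constructor argument, which is the whole appeal of injecting dependencies this way.&lt;/p&gt;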

&lt;h3&gt;
  
  
  Real Client Example: CRM Enrichment Agent
&lt;/h3&gt;

&lt;p&gt;One of my real estate clients needed an agent that looks up a lead in their CRM, enriches it with publicly available property data, and writes a personalized follow-up draft. The &lt;code&gt;Deps&lt;/code&gt; object carries the CRM client, the property data API client, and the user's email for tone calibration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CRMDeps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;crm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;CRMClient&lt;/span&gt;
    &lt;span class="n"&gt;property_api&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PropertyAPIClient&lt;/span&gt;
    &lt;span class="n"&gt;agent_email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;agent_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="nd"&gt;@crm_agent.tool&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lookup_lead&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;RunContext&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;CRMDeps&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;lead_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Look up lead details from the CRM.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;lead&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;crm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_lead&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lead_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;lead&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;full_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;property_interest&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;lead&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;property_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;budget&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;lead&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;budget_range&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;last_contact&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;lead&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last_contact_date&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;@crm_agent.tool&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_market_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;RunContext&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;CRMDeps&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;zip_code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;property_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get current market data for a specific location and property type.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;property_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;market_summary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;zip_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;property_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;median_price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;median&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;days_on_market&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;avg_dom&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;inventory&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;active_listings&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This agent runs on every new CRM lead. The output is a &lt;code&gt;FollowUpDraft&lt;/code&gt; Pydantic model with subject, body, and recommended_call_time. Zero global state, fully testable, and the type system means nobody on the team can accidentally pass the wrong client type.&lt;/p&gt;
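&lt;p&gt;The article doesn't show the result model itself, so here is a minimal sketch of what a &lt;code&gt;FollowUpDraft&lt;/code&gt; could look like. The field names come from the paragraph above; the constraints and example values are my assumptions:&lt;/p&gt;

```python
# Hypothetical sketch of the FollowUpDraft result model named above.
# Field names follow the text; the constraints are assumptions.
from pydantic import BaseModel, Field


class FollowUpDraft(BaseModel):
    subject: str = Field(max_length=120, description='Email subject line')
    body: str = Field(description='Personalized follow-up email body')
    recommended_call_time: str = Field(
        description='Suggested call window, e.g. "Tuesday 2-4pm"'
    )


draft = FollowUpDraft(
    subject='Three new condo matches this week',
    body='Hi Jordan, three new condos matched your criteria this week...',
    recommended_call_time='Tuesday 2-4pm',
)
print(draft.subject)
```

&lt;p&gt;Passing a model like this as the agent's result type is what lets the pipeline trust the output shape without any post-hoc parsing.&lt;/p&gt;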

&lt;h2&gt;
  
  
  Tool Definition: Giving Your Agent Real Capabilities
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;@agent.tool&lt;/code&gt; decorator turns a regular Python function into an LLM-callable tool. Pydantic validates the arguments the LLM passes before your function code ever runs, so type-level argument validation disappears from your tools; you keep only business-level checks, like capping result counts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@agent.tool&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_knowledge_base&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;RunContext&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Deps&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Search the internal knowledge base for relevant articles.

    Args:
        query: The search query string
        max_results: Maximum number of results to return (1-20)
        category: Optional category filter (&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;support&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;billing&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;technical&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;)
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;search_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# still good to cap on our side
&lt;/span&gt;        &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;category&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;excerpt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;excerpt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The docstring is automatically extracted and sent to the LLM as the tool description, and the per-parameter descriptions in the &lt;code&gt;Args&lt;/code&gt; section become the JSON schema descriptions. Your documentation and your tool contract are the same thing: change the docstring and the LLM's understanding changes with it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Error Handling Inside Tools
&lt;/h3&gt;

&lt;p&gt;When a tool raises &lt;code&gt;ModelRetry&lt;/code&gt;, Pydantic AI passes the message back to the LLM as tool output, and the agent can retry with different parameters or explain the problem to the user. Any other exception propagates to your calling code, which is usually what you want for failures the model cannot fix on its own:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ModelRetry&lt;/span&gt;

&lt;span class="nd"&gt;@agent.tool&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;RunContext&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Deps&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get current weather for a city.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;weather_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;current&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;temp_c&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;conditions&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conditions&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;CityNotFoundError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ModelRetry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;City &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; not found. Try a more specific name or add the country code.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;RateLimitError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Don't retry on rate limits — surface the real error
&lt;/span&gt;        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Weather API rate limit reached. Please wait before requesting more data.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Async Agents: Handling Production Load
&lt;/h2&gt;

&lt;p&gt;In production, you're almost never running a single agent synchronously. You're handling concurrent user requests, batch processing, or parallel sub-agent calls. Pydantic AI is async-first — &lt;code&gt;agent.run()&lt;/code&gt; returns a coroutine, and &lt;code&gt;agent.run_sync()&lt;/code&gt; is just a thin wrapper for scripts and REPL use.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_leads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lead_ids&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;CRMDeps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;FollowUpDraft&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Process multiple leads concurrently.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;tasks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;crm_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Generate a follow-up for lead &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;lead_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;deps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;deps&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;lead_id&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;lead_ids&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_exceptions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;drafts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;lead_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lead_ids&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Lead &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;lead_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="n"&gt;drafts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;drafts&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a logistics client I work with, this pattern processes 40 to 60 inbound shipment notifications concurrently. Each agent call checks carrier APIs, validates delivery windows, and generates exception reports. The whole batch runs in under 8 seconds because the I/O waits overlap instead of stacking.&lt;/p&gt;
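&lt;p&gt;At that scale, provider rate limits can become the bottleneck. A bounded variant of the same pattern keeps only N calls in flight at once; in this sketch, &lt;code&gt;process_one&lt;/code&gt; is a hypothetical stub standing in for the agent call, since the semaphore pattern is the point:&lt;/p&gt;

```python
# Bounded-concurrency sketch: process_one is a stub standing in for
# crm_agent.run(); only the asyncio.Semaphore pattern is the point.
import asyncio


async def process_one(lead_id: str) -> str:
    await asyncio.sleep(0.01)  # simulate an I/O-bound agent call
    return f'draft-for-{lead_id}'


async def process_bounded(lead_ids: list[str], max_concurrent: int = 10) -> list[str]:
    sem = asyncio.Semaphore(max_concurrent)

    async def run_with_limit(lead_id: str) -> str:
        async with sem:  # at most max_concurrent calls in flight
            return await process_one(lead_id)

    # gather preserves input order even though completion order varies
    return await asyncio.gather(*(run_with_limit(i) for i in lead_ids))


drafts = asyncio.run(process_bounded([f'lead_{n}' for n in range(25)]))
print(len(drafts))  # 25
```

&lt;p&gt;Swap the stub for the real agent call and the structure stays identical; the semaphore just caps how many model requests overlap.&lt;/p&gt;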

&lt;h3&gt;
  
  
  Message History and Multi-Turn Conversations
&lt;/h3&gt;

&lt;p&gt;If you need conversational context across turns — a support chat, an interview flow, a wizard-style form — you pass the previous run's messages into the next call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chat_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Deps&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;user_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;quit&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;

        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;message_history&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;deps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;deps&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Extend message history with this turn
&lt;/span&gt;        &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all_messages&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;result.all_messages()&lt;/code&gt; call returns the full conversation including tool calls and results, formatted correctly for the next run. No manual message formatting needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pydantic AI with AWS Bedrock
&lt;/h2&gt;

&lt;p&gt;Most tutorials use OpenAI or Google Gemini. My production deployments almost all run on AWS Bedrock because my clients are already in AWS and the spend goes against existing Enterprise Discount Program commitments. Pydantic AI's Bedrock support works well, with one important gotcha.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1451187580459-43490279c0fa%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1451187580459-43490279c0fa%3Fw%3D1200%26q%3D80" alt="AWS cloud computing infrastructure showing distributed system architecture"&gt;&lt;/a&gt;&lt;em&gt;AWS Bedrock handles IAM, region routing, and enterprise compliance automatically&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting Up BedrockConverseModel
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai.models.bedrock&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BedrockConverseModel&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai.providers.bedrock&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BedrockProvider&lt;/span&gt;

&lt;span class="c1"&gt;# Option 1: String shorthand (uses default credentials)
&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bedrock:anthropic.claude-haiku-4-5-20251001&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Option 2: Explicit model with region (my standard setup)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BedrockConverseModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;anthropic.claude-haiku-4-5-20251001&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;BedrockProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Credentials come from the standard boto3 chain: environment variables, instance profile, assumed role. In Lambda or ECS, this just works. Locally you need &lt;code&gt;AWS_ACCESS_KEY_ID&lt;/code&gt;, &lt;code&gt;AWS_SECRET_ACCESS_KEY&lt;/code&gt;, and &lt;code&gt;AWS_REGION&lt;/code&gt; in your environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Streaming Gotcha
&lt;/h3&gt;

&lt;p&gt;Structured streaming (getting partial validated output as the model generates) does not work with Claude models via Bedrock. When you use &lt;code&gt;agent.run_stream()&lt;/code&gt; with a Bedrock Claude model and a structured result type, the data still arrives as a single chunk at the end rather than progressively. For text output (no &lt;code&gt;result_type&lt;/code&gt;), streaming works fine.&lt;/p&gt;

&lt;p&gt;In practice this hasn't blocked any of my deployments. Structured output tasks are usually fast enough that streaming the structure itself isn't useful — you want the complete validated object, not partial JSON. For cases where I need real-time feedback, I separate the streaming UI from the structured processing step.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cross-Region Inference
&lt;/h3&gt;

&lt;p&gt;If you want to use cross-region inference profiles (for higher throughput limits), pass the inference profile ID (or its full ARN) as the model name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BedrockConverseModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;us.anthropic.claude-haiku-4-5-20251001&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# US cross-region profile
&lt;/span&gt;    &lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;BedrockProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Pydantic AI vs LangGraph: When to Use Which
&lt;/h2&gt;

&lt;p&gt;I use both. The choice isn't about which is better — it's about what your workflow actually needs.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Need&lt;/th&gt;
&lt;th&gt;Use Pydantic AI&lt;/th&gt;
&lt;th&gt;Use LangGraph&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Single agent with tools&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Overkill&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Structured validated output&lt;/td&gt;
&lt;td&gt;Yes, native&lt;/td&gt;
&lt;td&gt;Needs extra wiring&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dependency injection&lt;/td&gt;
&lt;td&gt;First-class&lt;/td&gt;
&lt;td&gt;Via LangChain context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complex branching logic&lt;/td&gt;
&lt;td&gt;Gets messy&lt;/td&gt;
&lt;td&gt;Yes, this is its strength&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Checkpoint and resume&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (core feature)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Human-in-the-loop approval&lt;/td&gt;
&lt;td&gt;Basic support&lt;/td&gt;
&lt;td&gt;Robust &lt;code&gt;interrupt_before&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-agent orchestration at scale&lt;/td&gt;
&lt;td&gt;Use as node inside LangGraph&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Type safety throughout&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Learning curve&lt;/td&gt;
&lt;td&gt;Low (like FastAPI)&lt;/td&gt;
&lt;td&gt;Medium (graph concepts)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The pattern I reach for most: use Pydantic AI agents as the "worker" nodes inside a LangGraph graph. LangGraph handles the orchestration, routing, and state persistence. Each node calls a Pydantic AI agent with typed inputs and outputs. You get the best of both: LangGraph's workflow control with Pydantic AI's type safety at the task level.&lt;/p&gt;
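&lt;p&gt;The pattern is easier to see in miniature. Here is a dependency-free sketch of the shape: the worker function stands in for a Pydantic AI agent (the real call would be something like &lt;code&gt;agent.run_sync(text).data&lt;/code&gt;), and the node loop stands in for LangGraph's state passing. Every name here is illustrative, not a real API.&lt;/p&gt;

```python
from dataclasses import dataclass

# Stand-in for a Pydantic AI worker: typed input, validated typed output
@dataclass
class Classification:
    label: str
    confidence: float

def classify_agent(text: str) -> Classification:
    # A real worker would be a Pydantic AI agent call returning result.data
    label = 'refund' if 'refund' in text.lower() else 'other'
    return Classification(label=label, confidence=0.9)

# Stand-ins for LangGraph nodes: each reads and writes a shared state dict
def classify_node(state: dict) -> dict:
    state['classification'] = classify_agent(state['ticket'])
    return state

def route_node(state: dict) -> dict:
    label = state['classification'].label
    state['queue'] = 'billing' if label == 'refund' else 'general'
    return state

def run_graph(ticket: str) -> dict:
    # LangGraph would manage edges, persistence, and interrupts here
    state = {'ticket': ticket}
    for node in (classify_node, route_node):
        state = node(state)
    return state

print(run_graph('I want a refund for my order')['queue'])  # billing
```

&lt;p&gt;LangGraph replaces the hand-rolled loop with a real graph; the point is that each node stays a small, typed, independently testable unit.&lt;/p&gt;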

&lt;h2&gt;
  
  
  Testing Pydantic AI Agents Without Burning API Credits
&lt;/h2&gt;

&lt;p&gt;This section doesn't exist in most tutorials. Testing agents is different from testing regular functions because the LLM response is non-deterministic. Pydantic AI has a built-in solution: &lt;code&gt;FunctionModel&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;FunctionModel&lt;/code&gt; lets you replace the real LLM with a function that returns whatever you want. Your tests run instantly, cost nothing, and are deterministic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai.models.function&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FunctionModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ModelContext&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai.messages&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ModelResponse&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TextPart&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ToolCallPart&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;mock_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ModelContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ModelResponse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Return a mock response that calls the search tool.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Simulate the LLM deciding to call a tool
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ModelResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nc"&gt;ToolCallPart&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;search_knowledge_base&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;refund policy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;max_results&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_support_agent_calls_search&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;test_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;support_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;override&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;FunctionModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mock_model&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="n"&gt;deps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SupportDeps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;search_client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;MockSearchClient&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;test_user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;test_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_sync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;What is your refund policy?&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;deps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Verify the tool was called and output was validated
&lt;/span&gt;    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SupportResponse&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1504639725590-34d0984388bd%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1504639725590-34d0984388bd%3Fw%3D1200%26q%3D80" alt="Code testing and validation pipeline running automated tests on AI agent logic"&gt;&lt;/a&gt;&lt;em&gt;FunctionModel makes agent testing deterministic and free — no API calls required&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;For integration tests where you want to hit the real model but verify behavior at a higher level, Pydantic AI's built-in evaluation tools let you run test cases against your agent and check outputs against assertions. The &lt;a href="https://ai.pydantic.dev/" rel="noopener noreferrer"&gt;official docs&lt;/a&gt; have examples of this under "Evals".&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Optimization in Production
&lt;/h2&gt;

&lt;p&gt;Three things eat your token budget with Pydantic AI agents:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Retry loops.&lt;/strong&gt; Every validation failure triggers a retry. If your result type is too strict or your prompt is ambiguous, you can end up paying for 3 to 5 model calls per request. Track your &lt;code&gt;result.usage()&lt;/code&gt; across a sample of real calls. Anything averaging over 1.2 calls is a warning sign that your schema or prompt needs work.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;deps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;usage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Requests: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, Input tokens: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;input_tokens&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
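&lt;p&gt;To turn that per-call check into a monitoring signal, aggregate request counts over a sample of runs. A minimal sketch, assuming you collect &lt;code&gt;result.usage().requests&lt;/code&gt; from production traffic (the 1.2 threshold is the rule of thumb above):&lt;/p&gt;

```python
def retry_health(request_counts: list) -> dict:
    """Average model calls per agent run; a sustained average above
    ~1.2 means validation retries are eating the budget."""
    avg = sum(request_counts) / len(request_counts)
    return {'avg_requests': round(avg, 2), 'needs_attention': avg > 1.2}

# e.g. ten runs: most succeed first try, two needed one retry each
sample = [1, 1, 1, 1, 2, 1, 1, 2, 1, 1]
print(retry_health(sample))  # {'avg_requests': 1.2, 'needs_attention': False}
```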



&lt;p&gt;&lt;strong&gt;2. Tool descriptions.&lt;/strong&gt; The docstring of every tool is sent with every request as part of the tool schema. If you have 12 tools each with 200-word docstrings, you're paying roughly 3,000 tokens of tool-description overhead on every single call. Be ruthless: keep docstrings under 50 words and use the parameter descriptions only for non-obvious fields.&lt;/p&gt;
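&lt;p&gt;A quick way to audit that overhead, assuming a rough 1.3 tokens per English word (the exact count depends on the tokenizer):&lt;/p&gt;

```python
def tool_description_overhead(docstrings: list, tokens_per_word: float = 1.3) -> int:
    """Estimate prompt tokens spent on tool descriptions per request."""
    words = sum(len(doc.split()) for doc in docstrings)
    return int(words * tokens_per_word)

# 12 tools with ~200-word docstrings each
docs = ['word ' * 200] * 12
print(tool_description_overhead(docs))  # 3120
```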

&lt;p&gt;&lt;strong&gt;3. Message history growth.&lt;/strong&gt; If you're passing multi-turn history, tokens grow linearly with conversation length. For most business workflows, conversations beyond 10 turns are rare. Add a hard limit or summarization step at the 8-turn mark.&lt;/p&gt;
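&lt;p&gt;The hard-cap version is a few lines. This sketch keeps the first turn (the original request usually carries context you don't want to lose) plus the most recent turns; a summarization step would replace the dropped middle with one summary message. The list of strings stands in for the real message history:&lt;/p&gt;

```python
MAX_TURNS = 8

def cap_history(turns: list, max_turns: int = MAX_TURNS) -> list:
    """Keep the first turn plus the most recent (max_turns - 1) turns."""
    if len(turns) > max_turns:
        return [turns[0]] + turns[-(max_turns - 1):]
    return turns

history = [f'turn {i}' for i in range(12)]
print(cap_history(history)[0], cap_history(history)[1])  # turn 0 turn 5
```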

&lt;h2&gt;
  
  
  Three Real Client Deployments
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Legal Document Classifier (Legal Tech Client)
&lt;/h3&gt;

&lt;p&gt;A legal tech startup needed to classify inbound contract documents by type, jurisdiction, and risk flags before routing to the right attorney. Previous approach: keyword matching with 200 rules. Accuracy: 67%. My implementation: a Pydantic AI agent with a &lt;code&gt;DocumentClassification&lt;/code&gt; result type (type enum, jurisdiction string, risk_flags list, confidence float). Running on Bedrock Claude Haiku 4.5. Accuracy: 94%. Processing time: under 2 seconds per document.&lt;/p&gt;

&lt;p&gt;The key was the confidence field. When the agent returns &lt;code&gt;confidence &amp;lt; 0.8&lt;/code&gt;, the document goes to a human review queue instead of auto-routing. Before Pydantic AI, getting a reliable confidence score out of an LLM took 3 layers of prompt engineering. With structured output it's just a float field in the model.&lt;/p&gt;
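&lt;p&gt;The routing code around that confidence field is almost trivially small. A sketch, with a stand-in dataclass for the &lt;code&gt;DocumentClassification&lt;/code&gt; result type (field names here are illustrative):&lt;/p&gt;

```python
from dataclasses import dataclass, field

@dataclass
class DocumentClassification:  # stand-in for the real result_type
    doc_type: str
    jurisdiction: str
    risk_flags: list = field(default_factory=list)
    confidence: float = 0.0

def route_document(c: DocumentClassification) -> str:
    # Auto-route only when the model is confident; everything else gets a human
    if c.confidence >= 0.8:
        return f'attorney_queue:{c.jurisdiction}'
    return 'human_review'

print(route_document(DocumentClassification('NDA', 'US-NY', [], 0.93)))  # attorney_queue:US-NY
print(route_document(DocumentClassification('lease', 'US-CA', ['odd terms'], 0.61)))  # human_review
```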

&lt;h3&gt;
  
  
  2. E-Commerce Customer Support Triage (DTC Brand)
&lt;/h3&gt;

&lt;p&gt;An e-commerce client had 800 to 1,200 support tickets per day. They wanted to auto-resolve the 40% of tickets that were standard order status inquiries. I built a Pydantic AI agent with tools for order lookup, shipping carrier API calls, and CRM history. The &lt;code&gt;TriageResult&lt;/code&gt; model includes action (auto-resolve, escalate, or needs-info), response_draft, confidence, and escalation_reason.&lt;/p&gt;

&lt;p&gt;The dependency injection pattern meant the agent gets the customer's order history and past tickets injected from the request context. No separate retrieval step. The agent resolves 38% of tickets automatically (slightly under the 40% target due to some edge cases) with a 96% customer satisfaction rate on auto-resolved tickets. At their volume, that is roughly 300 to 450 fewer tickets per day for the support team.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. CRM Lead Scoring (Real Estate Agency)
&lt;/h3&gt;

&lt;p&gt;A real estate agency wanted AI-powered lead scoring that integrates with their custom CRM. The agent takes a lead profile, calls property interest lookup and local market data tools, and returns a &lt;code&gt;LeadScore&lt;/code&gt; object with a numeric score (0 to 100), a tier (hot, warm, cold), a one-paragraph reasoning, and a list of recommended next actions. The scoring runs automatically on every new lead and on weekly rescores of the existing pipeline. Injecting each sales agent's own contact info into &lt;code&gt;Deps&lt;/code&gt; means the same agent code generates recommendations personalized to every agent on the team.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Mistakes I See
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Using run_sync() everywhere.&lt;/strong&gt; It's fine for scripts. In a FastAPI app or Lambda handler, you want &lt;code&gt;await agent.run()&lt;/code&gt;. The sync wrapper blocks your event loop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Putting business logic inside result validators.&lt;/strong&gt; Pydantic validators in your result type run on every retry. If a validator makes a database call, it runs 3 times on a failed validation. Put database calls in tools, not validators.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Over-specifying the system prompt.&lt;/strong&gt; LLMs are good at inference. You don't need to explain JSON format, tell the model not to apologize, or add 500 words of rules. Your result_type specification already constrains the output format. Trust the validation loop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not setting a retries limit.&lt;/strong&gt; The default retry count is generous. In production, set &lt;code&gt;retries=2&lt;/code&gt; on your agent and handle &lt;code&gt;UnexpectedModelBehavior&lt;/code&gt; explicitly instead of letting the framework burn tokens on an agent that consistently can't satisfy your schema.&lt;/p&gt;
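&lt;p&gt;The shape of that explicit handling, sketched with a placeholder exception class so the snippet stays dependency-free (in Pydantic AI you would import the real &lt;code&gt;UnexpectedModelBehavior&lt;/code&gt; instead):&lt;/p&gt;

```python
class UnexpectedModelBehavior(Exception):
    """Placeholder for the exception Pydantic AI raises when retries are exhausted."""

def run_with_fallback(run_agent, fallback):
    """Run the agent once; on persistent schema failure, degrade gracefully
    instead of letting the framework keep burning tokens."""
    try:
        return run_agent()
    except UnexpectedModelBehavior as exc:
        return fallback(exc)

def flaky_agent():
    raise UnexpectedModelBehavior('exceeded retries: output did not match schema')

result = run_with_fallback(
    flaky_agent,
    lambda exc: {'action': 'escalate', 'reason': str(exc)},
)
print(result['action'])  # escalate
```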

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Citation Capsule:&lt;/strong&gt; Pydantic AI hit 16,000 GitHub stars by April 2026 with the parent Pydantic library surpassing 10 billion downloads across all Python projects. According to the &lt;a href="https://github.com/pydantic/pydantic-ai" rel="noopener noreferrer"&gt;Pydantic AI GitHub repository&lt;/a&gt;, the latest release came on April 3, 2026. Amazon Web Services supports Pydantic AI in its &lt;a href="https://aws.amazon.com/blogs/machine-learning/build-reliable-ai-agents-with-amazon-bedrock-agentcore-evaluations/" rel="noopener noreferrer"&gt;Bedrock AgentCore documentation&lt;/a&gt;. For the parent Pydantic download milestone, see &lt;a href="https://pydantic.dev/articles/pydantic-validation-10-billion-downloads" rel="noopener noreferrer"&gt;Pydantic's official blog post&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is Pydantic AI production-ready in 2026?
&lt;/h3&gt;

&lt;p&gt;Yes. The framework has been in active production use since early 2025, reached its 1.x stable API in late 2025, and as of April 2026 is used by companies including those building on Amazon Bedrock AgentCore. The weekly release cadence means bugs get fixed fast, but the stable API means your code doesn't break between updates. I've been running it in production for 8 months without a breaking change.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Pydantic AI work with AWS Bedrock?
&lt;/h3&gt;

&lt;p&gt;Yes, via the &lt;code&gt;BedrockConverseModel&lt;/code&gt; class. Install with &lt;code&gt;pip install "pydantic-ai[bedrock]"&lt;/code&gt;, then initialize your agent with &lt;code&gt;'bedrock:anthropic.claude-haiku-4-5-20251001'&lt;/code&gt; or an explicit &lt;code&gt;BedrockConverseModel&lt;/code&gt; instance. Credentials come from the standard boto3 chain. One caveat: structured output streaming does not work with Claude models on Bedrock — data arrives as a single chunk rather than progressively.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the difference between Pydantic AI and LangChain?
&lt;/h3&gt;

&lt;p&gt;LangChain is a broad ecosystem covering everything from prompt templates to vector store integrations to agent frameworks. Pydantic AI is narrowly focused on one thing: type-safe agents with validated outputs. Pydantic AI has less surface area, a cleaner API, and first-class type checking. LangChain has more integrations and a larger community. For new projects I start with Pydantic AI and add LangChain integrations only when I need something specific it provides.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does Pydantic AI handle tool errors?
&lt;/h3&gt;

&lt;p&gt;Tool exceptions are caught automatically and passed back to the LLM as tool output, giving the model a chance to recover or try a different approach. You can also raise &lt;code&gt;ModelRetry&lt;/code&gt; from inside a tool to explicitly signal the LLM should try different parameters. For errors you don't want the LLM to retry — like rate limit errors — raise a standard exception and it bubbles up to your caller.&lt;/p&gt;
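&lt;p&gt;A dependency-free simulation shows why this loop works: the error text becomes tool output the model can read and react to. The bounded retry and the parameter adjustment below are illustrative, not Pydantic AI internals:&lt;/p&gt;

```python
def search_tool(query: str, max_results: int) -> list:
    if max_results > 5:
        # Recoverable error: the framework would hand this text back to the LLM
        raise ValueError('max_results must be 5 or fewer')
    return [f'result {i} for {query}' for i in range(max_results)]

def simulated_agent_loop(query: str) -> list:
    max_results = 10  # the "model's" first, wrong guess
    for _ in range(3):  # bounded retries, like retries=2 plus the first attempt
        try:
            return search_tool(query, max_results)
        except ValueError:
            max_results = 5  # the model "reads" the error and adjusts
    raise RuntimeError('tool kept failing after retries')

print(len(simulated_agent_loop('refund policy')))  # 5
```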

&lt;h3&gt;
  
  
  Can Pydantic AI agents call other Pydantic AI agents?
&lt;/h3&gt;

&lt;p&gt;Yes. You can call an agent from inside another agent's tool function, or use Pydantic AI agents as node functions inside a LangGraph graph. The nested agent gets its own dependency context. This pattern works well for orchestrator-worker setups where a top-level agent decides what sub-task to delegate and sub-agents handle the specialized work.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I test Pydantic AI agents without making real API calls?
&lt;/h3&gt;

&lt;p&gt;Use &lt;code&gt;FunctionModel&lt;/code&gt; from &lt;code&gt;pydantic_ai.models.function&lt;/code&gt;. It lets you replace the LLM with a Python function that returns a &lt;code&gt;ModelResponse&lt;/code&gt;. Your tests run instantly, are deterministic, and cost nothing. For tool-specific tests, mock the dependency objects in your &lt;code&gt;Deps&lt;/code&gt; dataclass. The official docs also include an eval framework for higher-level behavioral testing against real models.&lt;/p&gt;

&lt;h3&gt;
  
  
  What models does Pydantic AI support?
&lt;/h3&gt;

&lt;p&gt;Pydantic AI supports 20+ providers including OpenAI, Anthropic (direct and via Bedrock), Google Gemini, Cohere, Mistral, Groq, and others. You can also implement a custom model class for any provider with an HTTP API. The model-agnostic design means you can switch providers in one line without changing any agent or tool code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Pydantic AI free to use?
&lt;/h3&gt;

&lt;p&gt;Yes. Pydantic AI is open source (MIT license) and free to use. You pay only for the underlying LLM API calls — Bedrock, OpenAI, Anthropic, or whatever provider you configure. There is no hosted version or subscription fee from the Pydantic team. The optional Pydantic Logfire integration for observability has a free tier and paid plans.&lt;/p&gt;

&lt;p&gt;If you want to see how I decide between Pydantic AI and full agent orchestration with state management, read my &lt;a href="https://www.jahanzaib.ai/blog/langgraph-tutorial-build-production-ai-agents" rel="noopener noreferrer"&gt;LangGraph tutorial&lt;/a&gt;. For understanding when you even need agents versus simpler automation, the &lt;a href="https://www.jahanzaib.ai/blog/when-to-use-ai-agents-vs-automation" rel="noopener noreferrer"&gt;agents vs automation guide&lt;/a&gt; covers my full decision framework. If you are ready to build something and want a second opinion on your architecture, the &lt;a href="https://www.jahanzaib.ai/ai-readiness" rel="noopener noreferrer"&gt;AI readiness assessment&lt;/a&gt; is a good starting point, or just &lt;a href="https://www.jahanzaib.ai/contact" rel="noopener noreferrer"&gt;reach out directly&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>pydanticai</category>
      <category>python</category>
      <category>aiagents</category>
      <category>awsbedrock</category>
    </item>
    <item>
      <title>Most Clients Come to Me Wanting AI Agents. Most Leave With Zapier Instead.</title>
      <dc:creator>Jahanzaib</dc:creator>
      <pubDate>Sun, 05 Apr 2026 13:54:32 +0000</pubDate>
      <link>https://forem.com/jahanzaibai/most-clients-come-to-me-wanting-ai-agents-most-leave-with-zapier-instead-3kji</link>
      <guid>https://forem.com/jahanzaibai/most-clients-come-to-me-wanting-ai-agents-most-leave-with-zapier-instead-3kji</guid>
      <description>&lt;p&gt;I build AI agents for a living. Custom multistep orchestration systems, retrieval pipelines, tool calling architectures. The whole thing. I run &lt;a href="https://www.jahanzaib.ai/services" rel="noopener noreferrer"&gt;AgenticMode&lt;/a&gt; and spend most of my working hours designing systems that automate complex business decisions using large language models.&lt;/p&gt;

&lt;p&gt;And I spend a significant portion of those same hours talking clients out of building AI agents.&lt;/p&gt;

&lt;p&gt;Not because agents are not powerful. They are. But the question "should I build an AI agent for this?" is almost never the right starting question. The right question is: does this task require reasoning, or does it just require rules? Those two things look the same from the outside. The cost of getting the answer wrong is enormous.&lt;/p&gt;

&lt;p&gt;Gartner put a number on it in June 2025: &lt;a href="https://www.gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027" rel="noopener noreferrer"&gt;over 40% of agentic AI projects will be canceled by end of 2027&lt;/a&gt; due to escalating costs, unclear business value, or inadequate risk controls. That is not a fringe prediction. That is Gartner's base case. And in my experience building these systems, the number feels conservative.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key Takeaways&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Most business processes are more deterministic than they appear, and deterministic tasks belong in Zapier, Make, or n8n, not AI agents&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;MIT's 2025 NANDA study found 95% of generative AI pilots fail to deliver measurable P&amp;amp;L impact&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A five question framework tells you quickly whether your use case genuinely needs agentic AI or just better automation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;AI agents earn their cost in exactly three scenarios: unstructured data at the core, multistep reasoning with feedback loops, and dynamic tool selection at runtime&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The right path is almost always: automate first, add AI precisely where the automation breaks down&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Most real production systems are hybrid: automation handles the predictable 70% to 80%, agents handle the rest&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Expensive Mistake Everyone Is Making
&lt;/h2&gt;

&lt;p&gt;Here is how the overengineering pattern plays out. A founder reads about AI agents. They see a demo. They feel behind. They commission someone to build an agent for their order processing, customer support, or lead qualification pipeline. The agent gets built. It works in demos. Then it hits production and starts doing things nobody expected: hallucinating context, failing silently, costing $8 in API calls for tasks that used to cost $0.003 in a Zapier workflow.&lt;/p&gt;

&lt;p&gt;MIT's NANDA Initiative published a study in August 2025 based on 150 executive interviews, 350 employee surveys, and 300 AI deployments. Their headline finding: &lt;a href="https://fortune.com/2025/08/18/mit-report-95-percent-generative-ai-pilots-at-companies-failing-cfo/" rel="noopener noreferrer"&gt;95% of generative AI pilots fail to deliver measurable returns on the P&amp;amp;L&lt;/a&gt;. More than half of corporate AI budgets were directed at sales and marketing use cases, despite the strongest returns consistently coming from back office process automation.&lt;/p&gt;

&lt;p&gt;I have seen this pattern in almost every industry. An e-commerce company spending $4,000 per month routing product questions through an LLM when a decision tree in n8n would have handled 80% of them at a fraction of the cost. A SaaS company building an agent to qualify leads when the qualification criteria were already well defined and a Make workflow would have worked fine. A healthcare practice building an "intelligent scheduling agent" for appointment types with exactly three decision variables, which is a job for a rules engine, not a language model.&lt;/p&gt;

&lt;p&gt;The problem is not the technology. The problem is a fundamental mismatch between what businesses need and what they think they need.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The sharp drop from exploring to real impact is where most AI agent projects are lost.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft5ls50akxy1bd3xrgu98.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft5ls50akxy1bd3xrgu98.png" alt="Zapier workflow automation platform homepage" width="800" height="420"&gt;&lt;/a&gt;&lt;em&gt;Zapier workflow automation platform homepage&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What an AI Agent Actually Is (and Is Not)
&lt;/h2&gt;

&lt;p&gt;Before you can make a good decision about whether you need one, you need a clear technical definition, not the marketing version.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;workflow automation tool&lt;/strong&gt; like Zapier, Make, or n8n executes a predefined sequence of steps. It moves data from A to B when a trigger fires. It does not decide what to do next based on context. It cannot handle inputs it wasn't programmed for. It fails predictably and loudly when something goes wrong. That is a feature, not a limitation.&lt;/p&gt;

&lt;p&gt;An &lt;strong&gt;AI agent&lt;/strong&gt; uses a language model to make decisions during execution. It can reason about ambiguous inputs, choose between tools dynamically, handle edge cases that weren't anticipated, and chain multiple steps in response to changing context. It is also unpredictable, meaning it can produce different outputs for the same input depending on factors you don't control.&lt;/p&gt;

&lt;p&gt;That unpredictability is the part that gets people into trouble. In a Zapier flow, if step 3 fails, you know exactly what happened and why. In a multistep agent chain, a hallucinated intermediate result propagates forward and compounds with every subsequent step. By the time the error surfaces, tracing it back to the source is a serious debugging investment.&lt;/p&gt;

&lt;p&gt;Gartner's June 2025 report noted that only around 130 of the thousands of companies calling themselves "agentic AI" vendors are genuinely building agentic systems. The rest are &lt;a href="https://www.gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027" rel="noopener noreferrer"&gt;rebranding RPA, chatbots, and workflow tools&lt;/a&gt; as agents. That is not a coincidence. The label carries funding implications and price premiums that "workflow automation" does not.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Workflow Automation (Zapier / Make / n8n)&lt;/th&gt;
&lt;th&gt;AI Agent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Task type&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Deterministic, rules based&lt;/td&gt;
&lt;td&gt;Ambiguous, judgment required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Failure mode&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Loud, logged, traceable&lt;/td&gt;
&lt;td&gt;Silent, propagating, hard to trace&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost per 1,000 tasks&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.01 to $0.50&lt;/td&gt;
&lt;td&gt;$5 to $100+ (model dependent)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Time to deploy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hours to days&lt;/td&gt;
&lt;td&gt;Weeks to months&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Handles novel inputs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No (breaks or skips)&lt;/td&gt;
&lt;td&gt;Yes (reasons about unexpected inputs)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Maintenance burden&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;High (prompt drift, model updates, evals)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Debugging difficulty&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Easy (step logs, clear error messages)&lt;/td&gt;
&lt;td&gt;Hard (trace multi-step reasoning chains)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;When it wins&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Structured inputs, known decision paths&lt;/td&gt;
&lt;td&gt;Unstructured data, dynamic tool selection&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  The Decision Framework: Five Questions
&lt;/h2&gt;

&lt;p&gt;I use a five question diagnostic before recommending an architecture to any client. Answer these in order. The first answer that points decisively in one direction is usually all you need.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Is the task deterministic?
&lt;/h3&gt;

&lt;p&gt;Can you write down every possible input, every possible decision, and every possible output in advance? If yes, you don't need an AI agent. A rules based system will be cheaper, faster, more reliable, and easier to maintain. If no, meaning the range of inputs is genuinely open ended and the right response requires judgment, you might need an agent.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. What happens when it goes wrong?
&lt;/h3&gt;

&lt;p&gt;Every system fails eventually. Workflow tool failures are usually loud and logged: a webhook returns a 4xx, a step errors out, a Zap pauses. AI agent failures can be silent. A hallucinated fact gets written to your CRM. An incorrect summary gets sent to a client. A decision branch takes the wrong path and nobody notices for three days. If the cost of a silent failure is high, weight this heavily before choosing an agent architecture.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Does the task require dynamic tool selection at runtime?
&lt;/h3&gt;

&lt;p&gt;An AI agent's real value is choosing, at runtime, which tools to use based on context. If your process always uses the same tools in the same order, you don't need dynamic selection. You need a workflow. Dynamic tool selection is justified when the same goal requires meaningfully different paths depending on inputs. A support agent that might need to check an order status, look up a contract, calculate a refund, and draft a personalized response based on what the customer actually said: that is a real agent job. A form that captures a lead and sends it to your CRM is not.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. What is the token cost at your production volume?
&lt;/h3&gt;

&lt;p&gt;This is the question almost nobody asks upfront. Take your expected monthly volume, multiply by the average input and output tokens per task, and price the result at current model rates. Then compare that to what the equivalent Zapier or Make plan would cost. I have seen teams build agentic pipelines that cost $2,000 to $8,000 per month in tokens for tasks that would cost $50 per month in workflow automation. If that number is not acceptable, build the automation first and add AI exactly where the workflow breaks down.&lt;/p&gt;
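&lt;p&gt;That arithmetic fits in a few lines. The rates, volumes, and the workflow plan figure below are illustrative placeholders, not quotes of any provider's current pricing; plug in your real numbers before drawing conclusions:&lt;/p&gt;

```python
# Back-of-envelope token cost check for question 4. All rates and volumes
# here are illustrative placeholders; substitute your provider's current
# pricing and your actual traffic.

def monthly_llm_cost(tasks, in_tokens, out_tokens, rate_in, rate_out):
    """Raw API spend per month; rates are dollars per million tokens."""
    return (tasks * in_tokens * rate_in + tasks * out_tokens * rate_out) / 1e6

# 2,000 tasks/month at 1,000 input and 500 output tokens per task,
# priced at assumed rates of $1 and $5 per million tokens.
agent_cost = monthly_llm_cost(2_000, 1_000, 500, 1.00, 5.00)
workflow_cost = 69.00  # a typical mid-tier workflow-automation plan

print(f"LLM pipeline: ${agent_cost:.2f}/mo vs workflow plan: ${workflow_cost:.2f}/mo")
```

&lt;p&gt;Run it against your own volume and the comparison usually settles itself before any architecture discussion starts.&lt;/p&gt;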

&lt;h3&gt;
  
  
  5. Is there genuine business value in handling edge cases programmatically?
&lt;/h3&gt;

&lt;p&gt;Automation handles the 80% of cases that follow a predictable pattern. AI agents shine in the 20% that don't. But not every business needs to handle that 20% programmatically. Many businesses handle edge cases perfectly well with a human in the loop. Ask honestly: what is the dollar value of automating the edge cases versus the cost and complexity of building and maintaining an agent? If the math doesn't close, you are optimizing for technical elegance, not business outcomes.&lt;/p&gt;
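&lt;p&gt;To make the sequencing concrete, the five questions can be compressed into a rough screening function. This is my sketch of the logic above, not a formal scoring model; the thresholds are illustrative:&lt;/p&gt;

```python
# A rough screen over the five diagnostic questions. Each answer is a
# boolean; answer them in order, as in the text, and the first decisive
# signal wins. The thresholds are illustrative, not a formal rule.

def recommend_architecture(deterministic, silent_failure_costly,
                           needs_dynamic_tools, token_cost_acceptable,
                           edge_case_value_clear):
    """Return 'workflow', 'hybrid', or 'agent' from the five answers."""
    if deterministic:
        return "workflow"      # Q1 settles it: rules win on every axis
    if not token_cost_acceptable:
        return "workflow"      # Q4: build automation first, add AI at the gaps
    agent_signals = sum([needs_dynamic_tools, edge_case_value_clear])
    if silent_failure_costly and agent_signals == 0:
        return "workflow"      # Q2: no upside to justify silent-failure risk
    if agent_signals == 2:
        return "agent"         # Q3 and Q5 both point the same way
    return "hybrid"

print(recommend_architecture(False, True, True, True, True))    # agent
print(recommend_architecture(True, False, False, True, False))  # workflow
```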

&lt;h2&gt;
  
  
  The Automation Tool Landscape in 2026: Which Platform for What
&lt;/h2&gt;

&lt;p&gt;If your process turns out to be more deterministic than you thought, and most processes do once you map them carefully, you have a genuinely strong set of tools to choose from. Here is how the main platforms stack up, and where each one earns its place.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zapier&lt;/strong&gt; is the most integration rich platform on the market, with connections to over 8,000 apps. If your stack includes less common or niche SaaS tools, Zapier probably connects them. The tradeoff is cost: at scale, Zapier's task based pricing adds up fast. The Team plan at $99 per month gives you 50,000 tasks, which sounds like a lot until a high volume process is running through it. Best suited for businesses that need maximum app coverage and can absorb the per task pricing model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Make&lt;/strong&gt; (formerly Integromat) balances power and price better than anything else in the market. Its scenario based pricing, rather than per task pricing, means complex multistep flows do not cost exponentially more than simple ones. The visual canvas is excellent for building workflows that non technical team members can own and maintain. This is where I send most clients who want to run their automations without a developer on call.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;n8n&lt;/strong&gt; is the right tool for technical teams that want serious automation power without vendor lock in. It is open source, self hostable, and has a JavaScript execution node that lets you run arbitrary code inside workflows. &lt;a href="https://flowlyn.com/blog/n8n-user-count-statistics-growth" rel="noopener noreferrer"&gt;n8n has surpassed 230,000 active users&lt;/a&gt; and runs at over 3,000 enterprise companies, backed by Nvidia's investment arm and Accel. For teams comfortable with self hosting, it is also the cheapest option by a large margin.&lt;/p&gt;

&lt;p&gt;And then there are the &lt;strong&gt;AI agent frameworks&lt;/strong&gt;: LangGraph, AutoGen, CrewAI, and the growing ecosystem of managed agent platforms. These are powerful, genuinely necessary for real agentic use cases, and significantly more complex to build, deploy, monitor, and maintain than anything in the automation space. If you are evaluating one of these, read &lt;a href="https://www.jahanzaib.ai/blog/ai-agents-production" rel="noopener noreferrer"&gt;The Complete Guide to Building AI Agents That Actually Work in Production&lt;/a&gt; before you start scoping the build.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;th&gt;2026 Pricing&lt;/th&gt;
&lt;th&gt;Technical Level&lt;/th&gt;
&lt;th&gt;Integrations&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Zapier&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Maximum app coverage, non-technical teams&lt;/td&gt;
&lt;td&gt;Free to $799/mo&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;8,000+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Make&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Complex flows at low cost, visual builders&lt;/td&gt;
&lt;td&gt;Free to $29/mo (Core)&lt;/td&gt;
&lt;td&gt;Low to Medium&lt;/td&gt;
&lt;td&gt;1,000+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;n8n&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Technical teams, self-hosted, open source&lt;/td&gt;
&lt;td&gt;Free (self-hosted) / $20/mo cloud&lt;/td&gt;
&lt;td&gt;Medium to High&lt;/td&gt;
&lt;td&gt;400+ native, unlimited via HTTP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LangGraph / LangChain&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Custom multi-step agent orchestration&lt;/td&gt;
&lt;td&gt;Free (OSS) + LLM API costs&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Custom built&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CrewAI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Multi-agent role-based systems&lt;/td&gt;
&lt;td&gt;Free (OSS) + LLM API costs&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Custom built&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Managed Agent Platforms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Non-technical teams needing agent capabilities&lt;/td&gt;
&lt;td&gt;$500 to $5,000+/mo&lt;/td&gt;
&lt;td&gt;Low (but constrained)&lt;/td&gt;
&lt;td&gt;Platform-dependent&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  The Real Cost Comparison
&lt;/h2&gt;

&lt;p&gt;Let me put real numbers to this, because the handwaving in most "AI vs automation" posts is genuinely frustrating.&lt;/p&gt;

&lt;p&gt;Zapier's most popular plan costs $69 per month for 2,000 tasks. Make's Core plan is $9 per month for 10,000 operations. n8n self hosted is free, and their cloud plan starts at $20 per month. These are serious, production grade tools used by millions of businesses.&lt;/p&gt;

&lt;p&gt;An AI agent processing 2,000 tasks per month using Claude Haiku 4.5 at 1,000 input and 500 output tokens per task costs approximately $90 per month in API fees alone, before infrastructure, engineering time, debugging, and ongoing maintenance. That is for the cheapest capable model. Step up to a reasoning model for complex tasks and the number can multiply by 20.&lt;/p&gt;

&lt;p&gt;A client came to me last year wanting to build an AI agent to qualify inbound leads. The agent would read each submission, research the company, score the lead, write a personalized first touch email, and log everything to their CRM. They were processing about 400 leads per month.&lt;/p&gt;

&lt;p&gt;I built them a tiered system instead. A Make workflow handled the 70% of leads that matched clean criteria: company size in range, industry on the list, budget field completed. Those got routed immediately with a templated sequence. The remaining 30% that needed judgment got a single lightweight LLM call with structured output. Total monthly cost: $34. A full agentic pipeline for all 400 leads would have cost $400 to $600 per month in API fees alone, with a significantly higher maintenance burden on top.&lt;/p&gt;

&lt;p&gt;Here is a second example on the other side of the equation. A logistics company needed to process inbound freight quote requests. Each one was a PDF or email with carrier names, routes, weights, and special handling notes: variable structure, inconsistent formatting, genuinely ambiguous. That is a legitimate agent use case. Unstructured inputs, semantic understanding required, dynamic routing based on content. Their agent cost $180 per month in API fees to process 1,200 quotes. Without it, they had a full time data entry person at $3,200 per month doing the same work. The math was obvious and the agent paid for itself in week one.&lt;/p&gt;

&lt;p&gt;The pattern holds across industries. Automation wins when inputs are structured. Agents win when they are not. Most inputs are more structured than they look once you sit down and actually map the process.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Citation Capsule:&lt;/strong&gt; Gartner (June 2025) predicts that over 40% of agentic AI projects will be canceled by end of 2027 due to cost escalation and unclear business value. MIT's NANDA Initiative found that 95% of generative AI pilots fail to deliver measurable P&amp;amp;L impact, with the strongest ROI consistently coming from back office process automation. Sources: &lt;a href="https://www.gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027" rel="noopener noreferrer"&gt;Gartner 2025&lt;/a&gt;, &lt;a href="https://fortune.com/2025/08/18/mit-report-95-percent-generative-ai-pilots-at-companies-failing-cfo/" rel="noopener noreferrer"&gt;MIT NANDA via Fortune 2025&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Hybrid Architecture: Where Most Businesses Should Actually Be
&lt;/h2&gt;

&lt;p&gt;Here is what nobody explains clearly in the "automation vs agents" conversation: the best production systems are almost always hybrid. A workflow tool handles the predictable 70% to 80% of volume at near zero cost per task. An AI layer handles the remainder that genuinely needs judgment. The two parts operate independently, handing off based on clear criteria, and the overall system is cheaper and more reliable than a pure agent approach.&lt;/p&gt;

&lt;p&gt;I see three patterns that work consistently in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern one: Pre-filter, then reason.&lt;/strong&gt; A workflow tool categorizes incoming data first using field values and basic conditions. Clean, structured cases get handled directly. Only ambiguous or complex cases pass through to an LLM. This alone reduces token costs by 60% to 80% in most real deployments.&lt;/p&gt;
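&lt;p&gt;A minimal sketch of pattern one, with invented field names and thresholds standing in for whatever your intake form actually captures:&lt;/p&gt;

```python
# Pattern one sketch: a deterministic pre-filter in front of the LLM.
# Field names, the industry list, and the size range are invented for
# illustration; only ambiguous cases ever reach the model.

APPROVED_INDUSTRIES = {"saas", "ecommerce", "logistics"}

def route_lead(lead):
    """Return ('workflow', reason) for clean cases, ('llm', reason) otherwise."""
    size = lead.get("company_size")
    if size is None or lead.get("budget") is None:
        return ("llm", "missing fields need judgment")
    if lead.get("industry") not in APPROVED_INDUSTRIES:
        return ("llm", "off-list industry needs review")
    if 10 > size or size > 500:
        return ("workflow", "out of range, auto-disqualify")
    return ("workflow", "clean match, templated sequence")

print(route_lead({"company_size": 120, "industry": "saas", "budget": 5000}))
```

&lt;p&gt;The same shape works as a filter step in Make or n8n; the point is that the cheap deterministic check runs first and the model only sees what falls through.&lt;/p&gt;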

&lt;p&gt;&lt;strong&gt;Pattern two: Agent for extraction, automation for routing.&lt;/strong&gt; When inputs are unstructured (emails, PDFs, call notes), an LLM extracts structured fields with high accuracy. Once the data is structured, a workflow tool handles all routing, integrations, and notifications. The LLM does only what it is actually good at: reading messy text. Everything else stays in the deterministic layer where it is cheaper and easier to debug.&lt;/p&gt;
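&lt;p&gt;Pattern two hinges on validating the model's output before the deterministic layer trusts it. A sketch, assuming the LLM has been prompted to return JSON matching a fixed schema; the schema and field names here are invented:&lt;/p&gt;

```python
# Pattern two sketch: the LLM's only job is to turn messy text into this
# fixed schema; everything downstream stays deterministic. The raw_json
# argument stands in for whatever your provider client returns.
import json

REQUIRED_FIELDS = {"carrier": str, "origin": str, "destination": str,
                   "weight_kg": (int, float)}

def parse_extraction(raw_json):
    """Validate the model's JSON before it touches the workflow layer."""
    data = json.loads(raw_json)
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"bad type for field: {field}")
    return data

quote = parse_extraction(
    '{"carrier": "ACME", "origin": "LHR", "destination": "JFK", "weight_kg": 840}'
)
print(quote["carrier"])
```

&lt;p&gt;A validation failure here is a loud, traceable workflow error rather than a silent hallucination propagating into your routing logic, which is exactly the failure-mode tradeoff from the table above.&lt;/p&gt;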

&lt;p&gt;&lt;strong&gt;Pattern three: Automation as backbone, agents as escalation handlers.&lt;/strong&gt; A workflow runs your entire standard process. When it encounters a case that doesn't match any existing rule, instead of failing or routing to a human, it passes the case to an agent with full context. The agent handles the edge case and, if it needs human review, prepares a summary and routes accordingly. This is the pattern I used for the lead qualification client above, and the one I recommend to most businesses that want AI in their operations without replacing what is already working.&lt;/p&gt;
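&lt;p&gt;Pattern three in miniature: rules run first, and only unmatched cases reach the agent. The &lt;code&gt;handle_with_agent&lt;/code&gt; function is a hypothetical stand-in for your agent entry point, and the rules are invented examples:&lt;/p&gt;

```python
# Pattern three sketch: automation as backbone, agent as escalation handler.
# RULES and the queue names are illustrative; handle_with_agent is a
# hypothetical stand-in for a real agent call.

RULES = [
    (lambda t: t["type"] == "refund" and t["amount"] > 0, "refund_queue"),
    (lambda t: t["type"] == "shipping", "logistics_queue"),
]

def handle_with_agent(ticket):
    # In production this would invoke the agent with the ticket plus full
    # context and return its routing decision or a human-review summary.
    return "agent_review"

def route_ticket(ticket):
    for matches, queue in RULES:
        if matches(ticket):
            return queue              # backbone: deterministic routing
    return handle_with_agent(ticket)  # escalation: no rule matched

print(route_ticket({"type": "shipping"}))        # logistics_queue
print(route_ticket({"type": "complaint_poem"}))  # agent_review
```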

&lt;p&gt;If you want to see what the automation side of this hybrid looks like in practice for a small business, &lt;a href="https://www.jahanzaib.ai/blog/ai-automations-small-business" rel="noopener noreferrer"&gt;5 AI Automations Every Small Business Should Deploy&lt;/a&gt; covers five specific workflows with real ROI numbers and clear implementation steps.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Agents Are Genuinely the Right Answer
&lt;/h2&gt;

&lt;p&gt;I want to be specific here, because the "don't overbuild" message can tip into "never build agents" if you're not careful. There are real use cases where agents are not just appropriate but necessary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unstructured data at the center of the process.&lt;/strong&gt; If your business processes live in emails, PDFs, call transcripts, or legal documents (inputs that resist schema), an agent is doing real work that a workflow tool physically cannot do. Parsing a contract for specific clause types, extracting intent from support conversations, summarizing research across 40 documents: these are genuine agent jobs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multistep reasoning with feedback loops.&lt;/strong&gt; When a task requires the system to evaluate its own output, retry with a different approach, or ask clarifying questions before proceeding, you need an agent. A workflow tool executes steps. It cannot evaluate whether a step's output is good enough to continue or whether it needs to loop back and try again.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dynamic tool selection at runtime.&lt;/strong&gt; If the right action depends on what the data actually says (sometimes update Salesforce, sometimes escalate to Jira, sometimes flag for human review based on sentiment) and you cannot predict the distribution in advance, an agent's tool selection capability earns its cost.&lt;/p&gt;

&lt;p&gt;I have shipped &lt;a href="https://www.jahanzaib.ai/work" rel="noopener noreferrer"&gt;11 production AI systems&lt;/a&gt; in the last two years that meet at least two of these criteria. They work, they deliver measurable ROI, and I am genuinely proud of them. They are also a small fraction of the AI projects I have been pitched. The honest answer, most of the time, is: start with n8n or Make, see exactly where it breaks down, and add AI there. If you want to understand where a RAG knowledge layer fits into this picture, the explainer in &lt;a href="https://www.jahanzaib.ai/blog/what-is-rag-business-guide" rel="noopener noreferrer"&gt;What Is RAG? The Business Owner's Guide&lt;/a&gt; covers the basics without the technical jargon.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fze47v8nutdkcralxs0va.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fze47v8nutdkcralxs0va.png" alt="n8n open source workflow automation platform showing visual workflow builder" width="800" height="420"&gt;&lt;/a&gt;&lt;em&gt;n8n open source workflow automation platform showing visual workflow builder&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Implementation Path That Actually Works
&lt;/h2&gt;

&lt;p&gt;Here is the approach I give every client who wants AI in their operations, regardless of starting point.&lt;/p&gt;

&lt;p&gt;Map your process in full first. Every input type, every decision point, every output. Do this before evaluating any tools. You will find that most of the process is already deterministic. It just doesn't feel that way because it lives in someone's head.&lt;/p&gt;

&lt;p&gt;Implement the deterministic parts with a workflow tool. Get it running in production. Measure it. Watch where it fails. The failure points (the cases that fall through the cracks, the inputs that break the rules, the decisions that need judgment) are the genuine AI agent opportunities.&lt;/p&gt;

&lt;p&gt;Then, and only then, add AI to those specific failure points. Not to the whole process. Not as a replacement for the automation that is already working. At exactly the spots where determinism ran out.&lt;/p&gt;

&lt;p&gt;When you do add an AI component, keep it scoped. One LLM call with clear inputs and outputs is easier to debug, cheaper to run, and simpler to improve than a full agent chain. If one call is not enough, add a second. Build up incrementally. You will know when you actually need a full agent because you will have hit the real limits of what structured logic can do, with production data proving it.&lt;/p&gt;

&lt;p&gt;For teams that want to understand the full picture of what a production agent involves before committing to a build, I wrote a detailed breakdown in &lt;a href="https://www.jahanzaib.ai/blog/ai-agents-production" rel="noopener noreferrer"&gt;The Complete Guide to Building AI Agents That Actually Work in Production&lt;/a&gt;. It covers architecture patterns, RAG pipelines, tool use design, multi agent orchestration, and cost optimization across 109 real deployments.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Developer trust in AI accuracy has fallen ten percentage points in a single year. Overdeployment is a likely driver.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  A Note on the Hype Cycle
&lt;/h2&gt;

&lt;p&gt;I am not anti AI agent. I am anti AI agent as default. Those are different positions.&lt;/p&gt;

&lt;p&gt;The vendors selling agentic AI platforms have every incentive to convince you that everything is an agent use case. Meanwhile, &lt;a href="https://www.punku.ai/blog/state-of-ai-2024-enterprise-adoption" rel="noopener noreferrer"&gt;42% of companies abandoned most of their AI initiatives in 2025&lt;/a&gt;, up from 17% in 2024, according to McKinsey's State of AI report. The most common reasons were cost escalation and unclear business value, exactly what happens when you build an agent for a task that needed an automation.&lt;/p&gt;

&lt;p&gt;Less than 10% of organizations have actually scaled AI agents in any single function, despite over 80% reporting some form of AI use. The gap between "we are using AI" and "AI is generating measurable business value" is where most organizations are stuck right now. The way out of that gap is almost always to go simpler, not more complex.&lt;/p&gt;

&lt;p&gt;If you want a structured way to figure out where your business sits on this spectrum, and which approach is right for your specific context, the &lt;a href="https://www.jahanzaib.ai/ai-readiness" rel="noopener noreferrer"&gt;AI Readiness Assessment&lt;/a&gt; I built gives you a concrete answer in about 12 questions. It is not a sales funnel. It is the same diagnostic I use with clients before scoping any engagement. Take it and you will know exactly whether your use case calls for an agent, a hybrid system, or pure automation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is the difference between an AI agent and a workflow automation tool?
&lt;/h3&gt;

&lt;p&gt;Workflow automation tools like Zapier, Make, and n8n execute predefined sequences of steps triggered by events. They follow rules you define in advance and cannot make decisions based on context. AI agents use language models to make decisions during execution, choosing tools and actions based on the actual content of each input. Agents handle genuine ambiguity that automation tools cannot, at significantly higher cost per task and with much lower output predictability.&lt;/p&gt;

&lt;h3&gt;
  
  
  When should I use Zapier or n8n instead of an AI agent?
&lt;/h3&gt;

&lt;p&gt;Use automation tools when your process is deterministic, meaning you can define all inputs, decisions, and outputs in advance. Most business processes are more deterministic than they feel when they live in someone's head. If 70% or more of your tasks follow a predictable pattern, start with automation. Handle the predictable cases at low cost and add AI only at the points where the automation genuinely breaks down. This approach typically reduces monthly operating cost by 70% to 90% compared to routing everything through an LLM.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why do so many AI agent projects fail?
&lt;/h3&gt;

&lt;p&gt;The primary causes are misaligned expectations, unpredictable behavior in production, and token cost underestimation. Most projects are scoped in demo conditions where edge cases are rare and inputs are clean. In production, edge cases are common, inputs are messy, and the failure modes of agentic systems, silent hallucinations and error propagation in multistep chains, are much harder to detect than workflow failures. MIT's 2025 NANDA study found that 95% of generative AI pilots fail to deliver measurable P&amp;amp;L impact.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I know if my use case genuinely needs an AI agent?
&lt;/h3&gt;

&lt;p&gt;The most reliable signals: the task involves unstructured inputs like documents, emails, or free form text that resist schema. The right response varies significantly based on semantic content rather than field values. The process requires multistep reasoning with feedback loops rather than a fixed sequence. And the business value of handling edge cases programmatically clearly exceeds the ongoing cost of the agent infrastructure. If you cannot say yes to at least two of those, start with automation.&lt;/p&gt;

&lt;h3&gt;
  
  
  What automation tools do you recommend for businesses not ready for AI agents?
&lt;/h3&gt;

&lt;p&gt;For teams with technical resources, n8n is the strongest starting point: open source, self hostable, 230,000 plus active users, and backed by Nvidia. For non technical teams that need a visual builder, Make is excellent and costs a fraction of Zapier. Zapier remains the most integration rich option with over 8,000 app connections but costs more at scale. All three handle the 70% to 80% of your process that is deterministic at a fraction of the cost of an AI agent stack. Once you have automation running in production, you have real data to make a precise, justified agent investment.&lt;/p&gt;

&lt;h3&gt;
  
  
  What does Gartner say about AI agent adoption in 2025?
&lt;/h3&gt;

&lt;p&gt;Gartner's June 2025 report predicts that over 40% of agentic AI projects will be canceled by end of 2027 due to escalating costs, unclear business value, or inadequate risk controls. Their analyst Anushree Verma stated directly that "many use cases positioned as agentic today don't require agentic implementations." Gartner also found that only about 130 of the thousands of vendors claiming to offer agentic AI are actually building genuinely agentic systems. The rest are rebranding RPA and basic chatbots.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I start with automation and upgrade to AI agents later?
&lt;/h3&gt;

&lt;p&gt;Yes, and this is usually the right path. A well built automation gives you production data about where the process actually breaks down, which tells you precisely where an AI layer would add value. Teams that start with automation and add AI incrementally almost always end up with better systems than teams that build full agent stacks from scratch. The transition is also straightforward: once you know which step needs reasoning, you replace that step with an LLM call and leave the rest of the workflow unchanged.&lt;/p&gt;

&lt;h3&gt;
  
  
  What industries benefit most from AI agents versus automation?
&lt;/h3&gt;

&lt;p&gt;Industries with high volumes of unstructured text inputs see the strongest case for agents: legal, healthcare, insurance, real estate, and businesses running significant email or document workflows. Industries with structured transactional data (e-commerce fulfillment, basic customer support routing, appointment scheduling, financial reporting) almost always get better ROI from automation tools. The determining factor is the nature of the input data, not the industry itself. A well run e-commerce operation might have one genuine agent use case in returns processing and pure automation everywhere else.&lt;/p&gt;

</description>
      <category>aiagents</category>
      <category>automation</category>
      <category>zapier</category>
      <category>n8n</category>
    </item>
    <item>
      <title>I Compared Make.com and n8n Across 20+ Client Deployments. Here Is My Verdict.</title>
      <dc:creator>Jahanzaib</dc:creator>
      <pubDate>Sun, 05 Apr 2026 13:20:06 +0000</pubDate>
      <link>https://forem.com/jahanzaibai/i-compared-makecom-and-n8n-across-20-client-deployments-here-is-my-verdict-mlb</link>
      <guid>https://forem.com/jahanzaibai/i-compared-makecom-and-n8n-across-20-client-deployments-here-is-my-verdict-mlb</guid>
      <description>&lt;p&gt;A client came to me in January with a Make.com scenario that had started as a simple lead routing workflow and mutated into a 47-step monster. It was timing out. It was burning through their operations credits. And when they needed to add an AI agent that could make decisions based on their CRM data, Make had no good answer. Three weeks later, after rebuilding the whole thing in n8n, their monthly automation bill dropped by 71% and the AI agent actually worked.&lt;/p&gt;

&lt;p&gt;That project pushed me to do something I had been putting off: a real, systematic comparison of Make.com and n8n for AI agent workflows. Not a feature checklist review. A practitioner's assessment built on two years of deploying both platforms across more than 20 client environments.&lt;/p&gt;

&lt;p&gt;Here is what I found, and more importantly, here is the decision framework I now use before I write a single node.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key Takeaways&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;n8n is the stronger platform for AI agent workflows in 2026. Make's AI agents launched in beta in April 2025 and still have significant limitations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Make's credit-based pricing can surprise you at scale. A 5-step workflow processing 1,000 records daily needs the Teams plan plus overages. The same workload fits comfortably in n8n's $60/month Pro plan.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Make wins for non-technical teams building standard automations fast. Its 3,000+ integrations and visual-first interface are genuinely excellent.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;n8n's self-hosted Community Edition is free and unlimited. For clients with data residency requirements, self-hosting alone can decide the platform question.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I use Make for marketing ops, simple CRM syncs, and prototyping. I use n8n for anything involving LLMs, multi-step reasoning, or production AI agents.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why This Comparison Matters Now
&lt;/h2&gt;

&lt;p&gt;For three years, Make.com and n8n competed on roughly the same ground: connecting apps, moving data, triggering actions. The question was always about price and ease of use. AI agents changed the stakes completely.&lt;/p&gt;

&lt;p&gt;When a workflow needs an LLM to decide what happens next, the platform architecture starts to matter in ways it never did before. Can the platform handle tool calls? Can it maintain memory across steps? Can it route between agents based on context rather than fixed conditions? These are fundamentally different requirements from "when this Google Sheet row is updated, send a Slack message."&lt;/p&gt;

&lt;p&gt;I have clients running both platforms right now. I have rebuilt Make scenarios in n8n and vice versa. The comparison I am about to walk through is not theoretical. Every decision point I describe reflects a real conversation I had with a real client about a real deployment.&lt;/p&gt;

&lt;p&gt;If you are trying to figure out whether your business needs AI agents or whether simpler automation will do the job, my &lt;a href="https://www.jahanzaib.ai/ai-readiness" rel="noopener noreferrer"&gt;AI Readiness Assessment&lt;/a&gt; can help you get clarity before you commit to a platform. But if you already know you are building something with LLM decision-making, read on.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1568952433726-3896e3881c65%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1568952433726-3896e3881c65%3Fw%3D1200%26q%3D80" alt="Software developer reviewing workflow automation dashboard on dark monitor" width="1200" height="801"&gt;&lt;/a&gt;&lt;em&gt;Workflow architecture matters far more for AI agents than for standard automations.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Platform Architecture: How Each One Thinks About Automation
&lt;/h2&gt;

&lt;p&gt;Make.com was built around the concept of a scenario: a visual flowchart of modules connected by data paths. You drag modules onto a canvas, connect them, configure each one, and watch data flow left to right. It is genuinely intuitive. Non-technical team members can build and maintain Make scenarios without developer involvement, which is a real competitive advantage.&lt;/p&gt;

&lt;p&gt;n8n was built around nodes in a directed acyclic graph. It looks similar on the surface but operates very differently. Where Make enforces a mostly linear flow, n8n nodes can branch, merge, loop, and call sub-workflows. You can write JavaScript directly inside a node. You can define custom node types. The ceiling on what you can express is significantly higher, but even the entry point demands more technical confidence.&lt;/p&gt;

&lt;p&gt;Neither architecture is inherently superior. The divergence shows up the moment you add AI.&lt;/p&gt;

&lt;h3&gt;
  
  
  How Make Handles AI
&lt;/h3&gt;

&lt;p&gt;Make's original AI integration model is webhook-based: you call OpenAI or Anthropic via their HTTP modules, get a response, and route it through subsequent modules. This works fine for simple one-shot tasks like generating a summary, classifying sentiment, or drafting a message.&lt;/p&gt;

&lt;p&gt;In April 2025, Make launched Make AI Agents, which introduced something more: context-aware agents that can be reused across scenarios, configured with a global system prompt, and connected to Make's 3,000+ app integrations as tools. The announcement was significant. The reality, as of April 2026, is that Make AI Agents are still in beta and carry some important caveats.&lt;/p&gt;

&lt;p&gt;The agents cannot function outside of Make scenarios. They do not support RAG pipelines natively. If you want to give your Make agent access to a vector database for retrieval, you are building that connection yourself via HTTP modules, not through a native integration. Memory management across sessions is limited. And the beta label matters: I have seen unexpected behaviors in production that I would not accept in a client-facing system.&lt;/p&gt;

&lt;h3&gt;
  
  
  How n8n Handles AI
&lt;/h3&gt;

&lt;p&gt;n8n 2.0 redesigned the platform around AI as a first-class citizen. The platform now includes approximately 70 dedicated AI nodes covering LLM providers (OpenAI, Anthropic, Google, Mistral, local models via Ollama), a native Agent node with ReAct-style reasoning, built-in RAG pipelines with document loaders and text splitters, and vector store integrations with Pinecone, Qdrant, Supabase, and Chroma.&lt;/p&gt;

&lt;p&gt;What this means in practice: building an AI agent in n8n that retrieves context from your knowledge base, reasons about a customer query, calls external tools, and writes a structured response to your CRM is a workflow you can build entirely within native nodes. No custom HTTP calls. No stitching together disparate modules. The architecture was designed for this from the ground up.&lt;/p&gt;

&lt;p&gt;I covered the n8n agent architecture in detail in my guide to &lt;a href="https://www.jahanzaib.ai/blog/n8n-ai-agent-workflows-practitioner-guide" rel="noopener noreferrer"&gt;n8n 2.0 AI Agent workflows&lt;/a&gt;. The short version: n8n's LangChain integration is the most capable no-code/low-code AI agent framework I have deployed in a production environment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1504711434969-e33886168f5c%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1504711434969-e33886168f5c%3Fw%3D1200%26q%3D80" alt="Code editor with automation workflow logic and API integration setup" width="1200" height="800"&gt;&lt;/a&gt;&lt;em&gt;n8n's native AI nodes remove the need for custom HTTP calls when building agent workflows.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pricing Trap: Operations vs Executions
&lt;/h2&gt;

&lt;p&gt;This is the part of the Make.com and n8n comparison that most articles get wrong, and it is where I have seen clients genuinely surprised by their bills.&lt;/p&gt;

&lt;p&gt;Make charges by operation. An operation is every action a module performs. A 10-step workflow that runs once consumes 10 operations. Run it 1,000 times a day and you burn 10,000 operations daily, or 300,000 monthly. Make switched from an operations-based model to a credit-based system in August 2025, but the core mechanic is the same: every module execution costs something.&lt;/p&gt;

&lt;p&gt;n8n charges by execution. A workflow run is one execution regardless of how many nodes it passes through. A 2-step workflow and a 200-step AI agent both count as one execution.&lt;/p&gt;

&lt;p&gt;Here is a real example. A client processing 1,000 customer records daily through a 5-step qualification workflow needed the Teams plan on Make ($29/month for 80,000 credits) plus overages because 1,000 x 5 x 30 = 150,000 monthly operations. On n8n, 30,000 monthly executions fits comfortably inside the Pro plan at $60/month. The per-month difference was small. But when I added the AI agent layer, which involved 8 additional LLM calls per record, the Make cost exploded while n8n's cost stayed flat.&lt;/p&gt;
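&lt;p&gt;The arithmetic behind that difference is worth making explicit. Here is a minimal sketch of the two billing mechanics; plan prices and credit rates change, so treat the numbers as illustrative, not current pricing:&lt;/p&gt;

```python
# Billing mechanics for the example above: Make bills every module step,
# n8n bills the workflow run. Numbers are illustrative, not current pricing.
runs_per_day = 1_000
steps_per_run = 5
days = 30

make_operations = runs_per_day * steps_per_run * days  # every step is billed
n8n_executions = runs_per_day * days                   # one run = one execution

print(make_operations)  # 150000 monthly operations on Make
print(n8n_executions)   # 30000 monthly executions on n8n

# Adding an AI layer with 8 extra LLM calls per record multiplies
# Make's operation count while n8n's execution count stays flat:
make_with_ai = runs_per_day * (steps_per_run + 8) * days
print(make_with_ai)     # 390000 monthly operations
```

&lt;p&gt;Swap in your own run and step counts before trusting any plan comparison; the shape of the curve is the point, not the exact dollars.&lt;/p&gt;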

&lt;p&gt;For AI agent workflows specifically, where a single execution might involve a dozen tool calls, RAG retrieval, and multiple LLM roundtrips, n8n's per-execution pricing model is substantially more predictable.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Citation Capsule:&lt;/strong&gt; Vodafone UK saved £2.2 million annually after migrating threat intelligence workflows to n8n. The platform's execution-based pricing and self-hosting option were both cited as factors. &lt;a href="https://hatchworks.com/blog/ai-agents/n8n-vs-make/" rel="noopener noreferrer"&gt;HatchWorks, 2026&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Self-Hosting: The Data Residency Question
&lt;/h2&gt;

&lt;p&gt;Make.com is cloud-only. There is no self-hosted option, no on-premise deployment, no way to run it inside your own infrastructure. If your client operates in a regulated industry (healthcare, finance, legal), or if they have enterprise data governance requirements, this is often the end of the conversation.&lt;/p&gt;

&lt;p&gt;n8n's Community Edition is open source, free, and can run anywhere Docker runs. I have deployed it on AWS EC2, on DigitalOcean droplets, inside Kubernetes clusters, and on client-managed virtual machines. When a client tells me their data cannot leave their AWS VPC, n8n is the only answer.&lt;/p&gt;

&lt;p&gt;Self-hosting also eliminates the most important ongoing cost for high-volume deployments. A workflow running 50,000 times per day on n8n Community Edition costs nothing in platform fees. The infrastructure cost is whatever your server costs. For the right scale and the right technical team, this is a significant advantage.&lt;/p&gt;

&lt;p&gt;The trade-off is operational overhead. Your team maintains the installation, handles updates, monitors uptime, and manages backups. Make and n8n Cloud abstract all of that. Whether the trade-off is worth it depends on your team and your volume.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1449824913935-59a10b8d2000%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1449824913935-59a10b8d2000%3Fw%3D1200%26q%3D80" alt="Server infrastructure representing self-hosted workflow automation deployment" width="1200" height="800"&gt;&lt;/a&gt;&lt;em&gt;Self-hosting n8n gives full infrastructure control, critical for regulated industries and high-volume workloads.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  AI Agent Capabilities: The Head-to-Head
&lt;/h2&gt;

&lt;p&gt;Let me be direct about this because it is the most important part of the comparison for 2026 deployments.&lt;/p&gt;

&lt;p&gt;n8n's AI agent capabilities are mature, production-tested, and architecturally sound. Make's AI agent capabilities are promising but not yet at the level I would stake a client's production system on.&lt;/p&gt;

&lt;p&gt;Here is how they compare across the dimensions that matter most for real deployments:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Make.com&lt;/th&gt;
&lt;th&gt;n8n&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LLM integration&lt;/td&gt;
&lt;td&gt;Via HTTP modules or native OpenAI/Anthropic modules&lt;/td&gt;
&lt;td&gt;70+ native AI nodes, LangChain integration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent reasoning (ReAct)&lt;/td&gt;
&lt;td&gt;Beta, limited tool calling&lt;/td&gt;
&lt;td&gt;Native Agent node, full ReAct loop&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RAG / vector DB&lt;/td&gt;
&lt;td&gt;No native support, requires custom HTTP calls&lt;/td&gt;
&lt;td&gt;Native document loaders, Pinecone, Qdrant, Supabase&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory across sessions&lt;/td&gt;
&lt;td&gt;Limited, manual implementation&lt;/td&gt;
&lt;td&gt;Window buffer, entity, summary memory nodes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-agent orchestration&lt;/td&gt;
&lt;td&gt;Not supported natively&lt;/td&gt;
&lt;td&gt;Sub-agents, callable workflows, chaining&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Debugging AI workflows&lt;/td&gt;
&lt;td&gt;Limited visibility into LLM steps&lt;/td&gt;
&lt;td&gt;Full execution logs, node-level inspection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local LLM support&lt;/td&gt;
&lt;td&gt;Via HTTP only&lt;/td&gt;
&lt;td&gt;Native Ollama node, local model support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Production stability&lt;/td&gt;
&lt;td&gt;AI agents in beta&lt;/td&gt;
&lt;td&gt;AI nodes stable, production-ready&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The RAG gap is the one I run into most often with clients considering Make for AI agent work. Every substantive AI agent deployment I have done involves giving the agent access to some body of knowledge: product documentation, previous case notes, company policies, historical data. Without native vector database support, implementing this in Make requires custom HTTP calls to external vector DBs, manual chunking of documents, and embedding logic that you write yourself. In n8n, this is a configured workflow with native nodes.&lt;/p&gt;
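&lt;p&gt;To make "embedding logic that you write yourself" concrete, here is a hedged sketch of just the chunking step, assuming a naive fixed-size splitter (real pipelines usually split on sentence or token boundaries instead). The embedding and vector-store upserts, which on Make would be raw HTTP calls to your providers, appear only as comments:&lt;/p&gt;

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Naive fixed-size chunker with overlap between adjacent chunks."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

chunks = chunk_text("your product documentation goes here " * 40)

# Each chunk would then be embedded (HTTP POST to your embedding provider)
# and upserted (HTTP POST to your vector DB) -- the plumbing that n8n's
# native document-loader and vector-store nodes handle out of the box.
```

&lt;p&gt;None of this is hard for a developer, but it is all code you own, maintain, and debug yourself when the retrieval quality drifts.&lt;/p&gt;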

&lt;p&gt;The memory gap matters for customer-facing agents. An AI agent handling support requests needs to remember what was said earlier in the conversation and, ideally, what happened in previous conversations. n8n provides buffer memory, entity memory, and summary memory nodes that implement these patterns without custom code. Make does not.&lt;/p&gt;
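&lt;p&gt;The window-buffer pattern those nodes implement is simple enough to sketch. This is an illustrative minimal version of the idea, not n8n's actual implementation: keep the last k exchanges and let older context fall out of the window:&lt;/p&gt;

```python
from collections import deque

class WindowBufferMemory:
    """Keep only the last k user/assistant exchanges for the next LLM call."""
    def __init__(self, k: int = 5):
        # maxlen evicts the oldest message automatically once the deque is full
        self.messages = deque(maxlen=2 * k)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def context(self) -> list[dict]:
        # What you would prepend to the prompt on the next turn
        return list(self.messages)

memory = WindowBufferMemory(k=2)
for turn in range(6):
    memory.add("user" if turn % 2 == 0 else "assistant", f"message {turn}")
print(len(memory.context()))  # 4: only the last two exchanges survive
```

&lt;p&gt;Entity and summary memory are more involved, which is exactly why having them as configurable nodes rather than custom code matters in production.&lt;/p&gt;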

&lt;h2&gt;
  
  
  Integration Coverage: Where Make Has the Real Edge
&lt;/h2&gt;

&lt;p&gt;n8n has approximately 400 native nodes plus 600 or so community-maintained ones. Make.com has over 3,000 pre-built connectors. For most standard SaaS integrations, especially in marketing tech, sales tools, and business apps, Make's library is deeper and more polished.&lt;/p&gt;

&lt;p&gt;I have run into this concretely. A client needed to connect a workflow to a niche HR platform with an unusual API structure. Make had a native connector with a clean visual interface. n8n had no node for it, which meant building an HTTP Request node with manual authentication handling. Not difficult for a developer, but slower and more error-prone for a less technical team member.&lt;/p&gt;

&lt;p&gt;For teams that primarily work with common SaaS tools like HubSpot, Salesforce, Shopify, Stripe, Gmail, Slack, Asana, and similar platforms, this difference barely matters. Both platforms cover the major players well. For teams with unusual tooling, Make's breadth is a genuine advantage.&lt;/p&gt;

&lt;h2&gt;
  
  
  When I Choose Make.com
&lt;/h2&gt;

&lt;p&gt;I still recommend Make for plenty of client scenarios. Here is when I reach for it:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Non-technical teams building their own workflows.&lt;/strong&gt; Make's visual canvas is faster to learn and faster to ship for people who do not have a programming background. If a marketing ops person needs to build and maintain the automation themselves without developer support, Make almost always wins on time-to-value.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simple one-shot AI tasks.&lt;/strong&gt; Summarizing a document, classifying an inbound lead's intent, generating a first draft of a follow-up email. Anything where you call an LLM once, get a result, and route it somewhere. Make handles these cleanly with its native OpenAI and Anthropic modules.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prototyping and experimentation.&lt;/strong&gt; Make's drag-and-drop speed makes it excellent for proving out a concept before investing in a more complex n8n architecture. I have built scenarios in Make to validate a workflow idea in an afternoon, then rebuilt the production version in n8n over the following week.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Heavy SaaS integration work.&lt;/strong&gt; When a workflow needs to touch eight different marketing and sales tools, Make's polished native connectors often reduce implementation time significantly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1571171637578-41bc2dd41cd2%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1571171637578-41bc2dd41cd2%3Fw%3D1200%26q%3D80" alt="Person working at computer building visual automation workflow" width="1200" height="800"&gt;&lt;/a&gt;&lt;em&gt;Make.com's visual-first interface gives non-technical teams a faster path to working automations.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  When I Choose n8n
&lt;/h2&gt;

&lt;p&gt;n8n is my default platform for anything involving real AI agent behavior. Specifically:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Any workflow where an LLM makes decisions.&lt;/strong&gt; Not just generates text. If the LLM's output determines what happens next in the workflow, n8n's native agent architecture handles this reliably. Make's conditional routing based on AI output requires more custom workarounds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RAG pipelines and knowledge base agents.&lt;/strong&gt; If the agent needs to retrieve context from a document store, vector database, or indexed knowledge base, n8n is the only platform that makes this manageable without writing a lot of custom code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-step reasoning workflows.&lt;/strong&gt; Customer support agents that diagnose issues, gather context, check policies, draft responses, and escalate when needed. Research agents that iterate across multiple sources before synthesizing a report. These require stateful reasoning across many steps, which n8n's Agent node is designed for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data-sensitive or regulated deployments.&lt;/strong&gt; If the client cannot use cloud infrastructure for their automation data, n8n self-hosted is the answer. I covered this in detail when discussing the &lt;a href="https://www.jahanzaib.ai/blog/nanoclaw-setup-guide-whatsapp-telegram-ai-agents" rel="noopener noreferrer"&gt;NanoClaw deployment patterns&lt;/a&gt; I use for clients who need on-premise AI agent infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;High-volume workflows at scale.&lt;/strong&gt; The per-execution pricing model makes n8n dramatically more cost-effective for workflows that run thousands of times per day, especially when each run involves multiple AI steps.&lt;/p&gt;

&lt;p&gt;My &lt;a href="https://www.jahanzaib.ai/services" rel="noopener noreferrer"&gt;AI systems architecture services&lt;/a&gt; almost always involve n8n for the agent layer. I have deployed it for e-commerce companies processing thousands of orders daily, for B2B SaaS firms running automated prospect research, and for service businesses that need AI agents handling customer qualification. In each case, n8n's production-ready AI infrastructure was the deciding factor.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Decision Framework I Actually Use
&lt;/h2&gt;

&lt;p&gt;When a client comes to me with an automation or AI agent project, I run through four questions before I recommend a platform.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Question 1: Does the workflow require an LLM to make decisions, not just generate text?&lt;/strong&gt; If yes, n8n. If no, either platform works.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Question 2: Will the agent need access to a knowledge base or external data for context?&lt;/strong&gt; If yes, n8n. Make's RAG limitations make this impractical.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Question 3: Who will build and maintain the workflow?&lt;/strong&gt; If a developer or technical team member, n8n. If a non-technical business user, Make. The exception is if the workflow is complex enough that technical oversight is required regardless.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Question 4: What are the data residency and compliance requirements?&lt;/strong&gt; If data cannot leave a specific infrastructure environment, n8n self-hosted. No exceptions.&lt;/p&gt;
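&lt;p&gt;The four questions reduce to a small decision function. This encoding is my own illustrative sketch, with the "no exceptions" data-residency rule checked first because it overrides everything else:&lt;/p&gt;

```python
def recommend_platform(llm_makes_decisions: bool,
                       needs_knowledge_base: bool,
                       builder_is_technical: bool,
                       strict_data_residency: bool) -> str:
    """Illustrative encoding of the four-question platform framework."""
    if strict_data_residency:                # Q4 overrides everything, no exceptions
        return "n8n (self-hosted)"
    if llm_makes_decisions or needs_knowledge_base:  # Q1 and Q2
        return "n8n"
    if not builder_is_technical:             # Q3: non-technical owner, simple flow
        return "Make.com"
    return "either platform works"

print(recommend_platform(True, True, True, False))     # n8n
print(recommend_platform(False, False, False, False))  # Make.com
```

&lt;p&gt;The real conversations have more nuance than four booleans, but this is the order in which the questions actually eliminate options.&lt;/p&gt;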

&lt;p&gt;Most client projects that involve genuine AI agents, not just LLM API calls embedded in workflows, end up on n8n. Most marketing and operations automation projects where a non-technical team needs ownership of the system end up on Make. That split has held across two years and more than 20 engagements.&lt;/p&gt;

&lt;p&gt;If you are still figuring out whether your business situation calls for AI agents at all, or whether simpler automation would solve the same problem, the analysis from my post on &lt;a href="https://www.jahanzaib.ai/blog/when-to-use-ai-agents-vs-automation" rel="noopener noreferrer"&gt;when to use AI agents vs automation&lt;/a&gt; is a useful starting point. The short answer surprises most people: many clients who come to me wanting AI agents actually need better Zapier workflows. But when they really do need agents, n8n is where I build them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1664575599730-0814817939de%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1664575599730-0814817939de%3Fw%3D1200%26q%3D80" alt="Modern laptop with code and workflow interface showing AI automation configuration" width="1200" height="800"&gt;&lt;/a&gt;&lt;em&gt;The right platform choice comes down to four questions about your workflow's AI requirements and your team's technical capacity.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Expect From Both Platforms in Late 2026
&lt;/h2&gt;

&lt;p&gt;Make's AI agent roadmap is moving quickly. The beta status will not last forever, and Make has significant commercial incentive to close the gap with n8n on AI agent capabilities. Their 3,000-app integration advantage is a real foundation to build agent tooling on top of. I expect native RAG support and better memory management from Make within the next 12 months.&lt;/p&gt;

&lt;p&gt;n8n is expanding its hosted infrastructure and enterprise features. The self-hosted advantage is strong, but n8n Cloud is becoming a more compelling option for teams that want the platform's AI capabilities without the operational overhead of managing their own installation.&lt;/p&gt;

&lt;p&gt;The competition between them is genuinely good for practitioners. It is driving faster development of AI agent capabilities in both platforms. A year ago, neither had what n8n has now. A year from now, Make will likely have closed a meaningful portion of the gap.&lt;/p&gt;

&lt;p&gt;For today's deployments, though, if I am building an AI agent that needs to reason, retrieve, remember, and route decisions, n8n is where I start. If you want to explore what that looks like for your specific business situation, &lt;a href="https://www.jahanzaib.ai/contact" rel="noopener noreferrer"&gt;reach out and let's talk through it&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is Make.com good for AI agents in 2026?
&lt;/h3&gt;

&lt;p&gt;Make.com launched AI Agents in beta in April 2025 and the feature is still maturing. For simple one-shot AI tasks like text generation or classification, Make works well. For complex AI agents that need RAG pipelines, multi-step reasoning, or persistent memory, n8n is more capable and production-ready. I would not stake a production AI agent system on Make's beta AI agent features without thorough testing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is n8n harder to use than Make.com?
&lt;/h3&gt;

&lt;p&gt;Yes, n8n has a steeper learning curve, especially for non-technical users. Make.com's visual canvas is more intuitive for beginners and faster for building standard automations. n8n rewards technical investment with much greater flexibility, but a non-developer building their first automation will generally move faster on Make. For AI agent workflows specifically, n8n's complexity is worth the learning investment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can Make.com replace n8n for AI agents?
&lt;/h3&gt;

&lt;p&gt;Not currently. Make's AI Agents lack native RAG support, offer only limited memory management, and have no multi-agent orchestration. n8n's 70+ native AI nodes, LangChain integration, and vector database support give it a significant architecture advantage for AI agent workflows. Make may close this gap over the next 12 to 18 months, but as of mid-2026, n8n is the stronger choice for AI agents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which is cheaper, Make.com or n8n?
&lt;/h3&gt;

&lt;p&gt;It depends on your use case and scale. For low-volume simple automations, Make's pricing is competitive. For high-volume workflows, especially AI agent workflows with many steps per execution, n8n's per-execution pricing model becomes significantly cheaper because you pay for the workflow run, not every module step. For maximum savings at scale, n8n's free self-hosted Community Edition has no platform fees at all.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can n8n be self-hosted for free?
&lt;/h3&gt;

&lt;p&gt;Yes. n8n's Community Edition is open source and free to self-host with no limitations on executions or workflows. You only pay for the server infrastructure you run it on. The cloud plans ($24 to $800 per month) are for teams that want managed hosting without operational overhead. Self-hosted n8n is a compelling option for technical teams running high-volume workloads or operating in regulated environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Make.com support RAG (retrieval-augmented generation)?
&lt;/h3&gt;

&lt;p&gt;Not natively. Make does not have built-in document loaders, text splitters, or vector database integrations. To implement RAG in Make, you would need to call external APIs via HTTP modules and manage the embedding and retrieval logic yourself. n8n has native support for RAG pipelines including document loaders, chunking, and integrations with Pinecone, Qdrant, Supabase, and Chroma vector databases.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the best automation platform for small businesses in 2026?
&lt;/h3&gt;

&lt;p&gt;For small businesses without technical staff, Make.com is often the better starting point because of its ease of use and extensive app library. For small businesses that want AI agents and can invest in setup time, n8n offers much better AI capabilities. For businesses with regulatory or data residency requirements, n8n self-hosted is the only real option. The right choice depends on your team's technical capacity and whether you need genuine AI decision-making or standard workflow automation.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I migrate from Make.com to n8n?
&lt;/h3&gt;

&lt;p&gt;There is no automated migration path between Make scenarios and n8n workflows. Migration requires rebuilding each workflow in n8n from scratch, which is also an opportunity to simplify and optimize. I typically audit the existing Make scenarios first to identify which workflows genuinely benefit from n8n's capabilities, which ones can stay on Make, and which ones can be deprecated entirely. For a phased migration, I recommend starting with your highest-value or most complex AI-involved workflows first.&lt;/p&gt;

</description>
      <category>makecom</category>
      <category>n8n</category>
      <category>workflowautomation</category>
      <category>aiagents</category>
    </item>
    <item>
      <title>MCP Just Hit 97 Million Installs. The Dev Summit Showed What Comes Next for AI Agents.</title>
      <dc:creator>Jahanzaib</dc:creator>
      <pubDate>Sun, 05 Apr 2026 07:23:43 +0000</pubDate>
      <link>https://forem.com/jahanzaibai/mcp-just-hit-97-million-installs-the-dev-summit-showed-what-comes-next-for-ai-agents-2j2b</link>
      <guid>https://forem.com/jahanzaibai/mcp-just-hit-97-million-installs-the-dev-summit-showed-what-comes-next-for-ai-agents-2j2b</guid>
      <description>&lt;p&gt;The Model Context Protocol just crossed 97 million monthly SDK installs. That number landed at the end of March 2026, and two weeks later, April 2 and 3, hundreds of engineers and enterprise architects packed into a venue in New York City for the first ever MCP Dev Summit. I have been building production AI agent systems for three years. I have deployed MCP servers for clients across healthcare, ecommerce, and logistics. And I can tell you: these two milestones together mark a genuine inflection point, not just for the protocol but for every business trying to figure out whether to build with AI agents right now.&lt;/p&gt;

&lt;p&gt;This is my read of what happened, what the summit surfaced, and what it means in practice if you are about to make an AI investment decision.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key Takeaways&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;MCP grew from 2 million to 97 million monthly SDK downloads in 16 months, outpacing React's comparable adoption trajectory&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Every major AI provider (Anthropic, OpenAI, Google, Microsoft, AWS, Cloudflare) now ships MCP-compatible tooling, ending the per-provider integration tax&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The first MCP Dev Summit (April 2 to 3, NYC) surfaced a critical pattern: at scale, enterprise teams hit the same wall of authentication gaps, missing audit trails, and brittle static credentials&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;30 plus CVEs were filed against MCP implementations in January and February 2026 alone, with 43% involving command injection vulnerabilities&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The 2026 roadmap explicitly targets enterprise gaps: SSO-integrated auth, workload identity federation, and gateway standardization&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If you are evaluating AI agents for your business, MCP being infrastructure-grade changes the build-vs-buy calculus significantly&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1639762681485-074b7f938ba0%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1639762681485-074b7f938ba0%3Fw%3D1200%26q%3D80" alt="Network nodes and connections representing Model Context Protocol infrastructure" width="1200" height="675"&gt;&lt;/a&gt;&lt;em&gt;MCP has become the connective tissue linking AI models to every tool in the stack&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Just Happened: Two Milestones in Two Weeks
&lt;/h2&gt;

&lt;p&gt;Let me give you the concrete timeline so the significance is clear.&lt;/p&gt;

&lt;p&gt;Anthropic launched the Model Context Protocol in November 2024. At launch, the TypeScript and Python SDKs combined for roughly 2 million monthly downloads. Not bad for a new open protocol, but not infrastructure scale either. At that point MCP was an interesting idea from one AI lab, with a small but enthusiastic developer community and a handful of reference server implementations.&lt;/p&gt;

&lt;p&gt;By March 25, 2026, those same SDKs crossed 97 million monthly downloads. That is a 4,750% increase in 16 months. For context, React, the most widely adopted JavaScript UI framework ever built, took approximately three years to reach comparable monthly download scale. MCP compressed that trajectory to less than half. The difference was unified vendor backing from day one: rather than competing standards fragmenting the ecosystem, every major AI provider aligned around MCP early, which created a network effect that accelerated adoption far faster than any single company could have achieved alone.&lt;/p&gt;

&lt;p&gt;Then came the summit.&lt;/p&gt;

&lt;p&gt;The Agentic AI Foundation, the Linux Foundation entity that now governs MCP, organized the first MCP Dev Summit North America for April 2 and 3 in New York City. The program ran more than 95 sessions. Speakers came from Anthropic, OpenAI, AWS, Docker, Datadog, Uber, PwC, Workato, and a long list of enterprises that have been quietly running MCP in production for months. David Soria Parra, one of MCP's co-creators, delivered a keynote. Nick Cooper from OpenAI presented alongside him as a core protocol maintainer. This was not a product launch event. It was an engineering conference for people who have already shipped things and needed to compare notes on what broke.&lt;/p&gt;

&lt;p&gt;That distinction matters. When the conversations at a developer summit center on what failed in production rather than what demos look impressive, it means the technology has crossed from experimental to real.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers That Prove MCP Won the Standard War
&lt;/h2&gt;

&lt;p&gt;I want to sit with the adoption numbers for a moment because they explain something important about the current AI agent landscape.&lt;/p&gt;

&lt;p&gt;The MCP server ecosystem grew from a handful of reference implementations at launch to more than 5,800 community and enterprise servers by early 2026. Those servers cover databases, CRMs, cloud providers, productivity tools, developer tools, ecommerce platforms, analytics services, and dozens of other categories. More than 10,000 MCP servers are reportedly active in production environments today. That number includes Fortune 500 deployments that moved from pilot to production in Q1 2026.&lt;/p&gt;

&lt;p&gt;The provider alignment is equally significant. When I started building AI agent systems in 2023, a meaningful chunk of my project time went to integration plumbing. If a client used Claude for one workflow and GPT for another, I was writing duplicate connector code for every tool in their stack. Every model had its own API shape, its own authentication patterns, its own way of calling external functions. It was the same problem REST APIs solved for web services in the early 2000s, except nobody had built REST for AI agents yet.&lt;/p&gt;

&lt;p&gt;MCP solved that. Anthropic, OpenAI, Google DeepMind, Microsoft, AWS, and Cloudflare all ship MCP-compatible tooling now. You build a server once and it works across all of them. The integration tax I was paying on every project is gone. Based on my own deployments, MCP cuts development time by 60 to 70% on projects that need to connect AI to multiple business tools. That is not a theoretical estimate. It is what I measured across the last eight client projects.&lt;/p&gt;

&lt;p&gt;The governance structure reinforces the staying power. In December 2025, Anthropic donated MCP to the Agentic AI Foundation under Linux Foundation oversight. OpenAI and Block serve as co-founders. AWS, Google, Microsoft, Cloudflare, and Bloomberg hold platinum membership. When a protocol gets Linux Foundation governance with that roster of platinum members, it has crossed from "promising technology" into "foundational infrastructure." Companies planning multi-year technology investments can reasonably bet on it without worrying about it disappearing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1540575467063-178a50c2df87%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1540575467063-178a50c2df87%3Fw%3D1200%26q%3D80" alt="Technology conference with engineers gathered for MCP Dev Summit discussion" width="1200" height="800"&gt;&lt;/a&gt;&lt;em&gt;The first MCP Dev Summit drew engineers from Anthropic, OpenAI, AWS, Uber, PwC, and dozens of enterprise teams&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Dev Summit Actually Revealed
&lt;/h2&gt;

&lt;p&gt;Conference keynotes tell you what companies want you to believe. The breakout sessions tell you what is actually happening. Here is what stood out from the summit sessions that matter most to businesses building on MCP.&lt;/p&gt;

&lt;h3&gt;
  
  
  Enterprise teams hit the same wall at scale
&lt;/h3&gt;

&lt;p&gt;The talk that got the most attention in the rooms I followed was the session on enterprise MCP adoption patterns. Multiple organizations described the same sequence: MCP deployment starts fast, works beautifully in a controlled environment, then hits friction the moment you try to run it at org-scale with real security requirements.&lt;/p&gt;

&lt;p&gt;The friction points are predictable. Static client credentials that IT cannot manage through their existing identity systems. No audit trail for agent actions against internal tools. Gateway behavior that differs between MCP client implementations. Configuration that cannot be exported and reproduced across environments. These are not protocol failures. They are the expected gaps in any young infrastructure standard that was built for developer experience first and enterprise governance second.&lt;/p&gt;

&lt;h3&gt;
  
  
  Duolingo deployed 180 plus MCP tools in a single Slackbot
&lt;/h3&gt;

&lt;p&gt;One session that illustrated where mature enterprise MCP deployments are heading came from Aaron Wang at Duolingo. The session covered their internal AI Slackbot, a system that gives Duolingo employees an AI assistant connected to more than 180 internal tools via MCP. A single bot. 180 plus tools. One protocol layer handling all of it.&lt;/p&gt;

&lt;p&gt;I have built systems that connect AI agents to 20 to 30 tools for clients. The operational complexity at that scale is already significant. Thinking through the observability, permissions scoping, and context management required for 180 plus tools gives you a sense of both how powerful MCP is when fully deployed and how serious the enterprise readiness gaps are at that level of scale.&lt;/p&gt;

&lt;h3&gt;
  
  
  The White House noticed
&lt;/h3&gt;

&lt;p&gt;On March 20, two weeks before the summit, the White House released its national AI policy framework. It explicitly identified agentic AI infrastructure as a priority investment area. That is not something that happens when a technology is still experimental. When federal policy starts naming your infrastructure category, you are past the innovation curve and into the deployment phase. For businesses that had been waiting on regulatory clarity before committing to AI agent investments, that signal matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Security Reckoning Nobody Planned For
&lt;/h2&gt;

&lt;p&gt;I am going to spend more time on this section than most coverage does because it is the thing most businesses considering AI agents are not thinking about carefully enough.&lt;/p&gt;

&lt;p&gt;Between January and February 2026, security researchers filed more than 30 CVEs against MCP servers, clients, and infrastructure. That is roughly one critical or high-severity finding every two days for sixty days straight. The researchers called it "the Log4j pattern repeating": infrastructure adoption outpacing security hardening, with the vulnerability surface growing faster than the patching cadence.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1614064641938-3bbee52942c7%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1614064641938-3bbee52942c7%3Fw%3D1200%26q%3D80" alt="Security code review showing MCP vulnerability patterns" width="1200" height="800"&gt;&lt;/a&gt;&lt;em&gt;30 plus CVEs in 60 days revealed that MCP adoption outpaced security hardening across the ecosystem&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The breakdown of vulnerability categories is instructive:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;43% of CVEs involve exec or shell injection&lt;/strong&gt;: MCP servers passing user input to shell commands without sanitization. The &lt;code&gt;mcp-remote&lt;/code&gt; package alone had a CVSS 9.6 remote code execution flaw and nearly half a million downloads before the patch landed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;82% of 2,614 tested MCP implementations&lt;/strong&gt; were vulnerable to path traversal attacks via file operations&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;67% had some form of code injection risk&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;38 to 41% of MCP servers lack authentication mechanisms entirely&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;20% of CVEs involve tooling infrastructure flaws&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;13% represent authentication bypasses&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
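&lt;p&gt;To make the exec-injection category (the 43% bucket) concrete, here is a minimal Python sketch. The &lt;code&gt;run_grep_unsafe&lt;/code&gt; and &lt;code&gt;run_grep_safe&lt;/code&gt; handlers are hypothetical names I made up for illustration, not code from any real MCP server:&lt;/p&gt;

```python
import subprocess

def run_grep_unsafe(user_pattern: str, path: str) -> str:
    # VULNERABLE: the pattern is interpolated into a shell string, so input
    # like "x; rm -rf ~" runs as a second command
    return subprocess.run(
        f"grep {user_pattern} {path}", shell=True,
        capture_output=True, text=True,
    ).stdout

def run_grep_safe(user_pattern: str, path: str) -> str:
    # SAFE: the argument-list form passes the pattern as a single argv
    # entry, so shell metacharacters are never interpreted
    return subprocess.run(
        ["grep", "--", user_pattern, path],
        capture_output=True, text=True,
    ).stdout
```

&lt;p&gt;The argument-list form is the whole fix: the user's input arrives as one argv entry and no shell ever parses it. Most of the CVEs in this category are the unsafe version of exactly this pattern.&lt;/p&gt;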

&lt;p&gt;Five core attack patterns emerged from the research. Tool poisoning injects malicious instructions into tool descriptions that AI agents then execute implicitly because agents treat tool descriptions as trusted. Prompt injection via external data embeds attacks in GitHub issues, Slack messages, and other sources that get pulled into agent context. Trust bypass exploits weak revalidation of approved MCP server configurations. Supply chain attacks publish backdoored servers impersonating legitimate services. And cross-tenant exposure breaks isolation in shared hosting environments.&lt;/p&gt;

&lt;p&gt;None of these are exotic. They are classic application security problems applied to a new infrastructure layer. The engineers I talked to at the summit were not surprised by the vulnerability categories. They were surprised by how quickly the attack surface expanded because adoption moved so fast.&lt;/p&gt;

&lt;p&gt;What does this mean practically? If you are deploying AI agents using MCP-connected tools, you need a security checklist that did not exist eighteen months ago. Run the &lt;code&gt;mcp-scan&lt;/code&gt; vulnerability scanner against your implementation. Pin server versions rather than tracking &lt;code&gt;@latest&lt;/code&gt; tags. Review tool descriptions for anything that could be poisoned. Rotate broadly shared credentials. Enable logging of every MCP tool invocation. These are not optional in production. They are baseline hygiene for any system that gives an AI agent access to internal tools.&lt;/p&gt;
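&lt;p&gt;For the logging item on that checklist, here is the kind of thin audit wrapper I mean. The &lt;code&gt;audited&lt;/code&gt; decorator and &lt;code&gt;lookup_invoice&lt;/code&gt; handler are illustrative names for this article, not part of any MCP SDK; the point is that every tool invocation, its arguments, and its outcome land in one structured log:&lt;/p&gt;

```python
import functools
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("mcp.audit")

def audited(tool_fn):
    # Wrap a tool handler so every invocation is logged with its arguments,
    # outcome, and duration. Assumes kwargs are JSON-serializable.
    @functools.wraps(tool_fn)
    def wrapper(**kwargs):
        start = time.time()
        try:
            result = tool_fn(**kwargs)
            status = "ok"
            return result
        except Exception:
            status = "error"
            raise
        finally:
            log.info(json.dumps({
                "tool": tool_fn.__name__,
                "args": kwargs,
                "status": status,
                "duration_ms": round((time.time() - start) * 1000, 1),
            }))
    return wrapper

@audited
def lookup_invoice(invoice_id: str) -> dict:
    # hypothetical tool handler standing in for a real MCP tool
    return {"invoice_id": invoice_id, "status": "paid"}
```

&lt;p&gt;In production I route these records to the same pipeline as the rest of the application logs, so agent actions show up in the audit trail IT already monitors.&lt;/p&gt;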

&lt;p&gt;For context on the work I do: when I build AI agent systems for clients, security architecture is a first-class deliverable, not an afterthought. The 14-layer security model I run on my own site includes system prompt boundaries, guardrails, rate limiting, input validation, and injection defense. If you want to see how I think about &lt;a href="https://www.jahanzaib.ai/services" rel="noopener noreferrer"&gt;securing AI systems in production&lt;/a&gt;, that work starts at the architecture stage, not after deployment.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 2026 Roadmap: What Is Coming Next
&lt;/h2&gt;

&lt;p&gt;David Soria Parra published the 2026 MCP roadmap on March 9, two weeks before the 97M milestone announcement. It is the clearest signal we have about where the protocol is heading and what will change for teams building on it.&lt;/p&gt;

&lt;p&gt;The roadmap identifies four priority areas: transport evolution, enterprise readiness, agent communication, and governance maturation. Enterprise readiness is the one that directly affects most production deployments today.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1593508512255-86ab42a8e620%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1593508512255-86ab42a8e620%3Fw%3D1200%26q%3D80" alt="Technology roadmap and growth chart for AI infrastructure" width="1200" height="1206"&gt;&lt;/a&gt;&lt;em&gt;The 2026 MCP roadmap makes enterprise readiness a top priority after the first wave of production deployments surfaced predictable gaps&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;On authentication, the roadmap explicitly names static client secrets as a known problem and commits to building "paved paths" toward SSO-integrated flows. The goal is making MCP access manageable through the same identity systems IT already uses for everything else, rather than requiring separate credential management. For enterprise teams, this is the difference between MCP being something developers deploy independently and something IT can govern.&lt;/p&gt;

&lt;p&gt;Two active Specification Enhancement Proposals are already in progress: SEP-1932 covers DPoP (Demonstrating Proof of Possession), a token binding mechanism that prevents token theft attacks. SEP-1933 covers Workload Identity Federation, which lets MCP servers authenticate using cloud provider identities rather than static credentials. These are "horizon" items in the current roadmap cycle, meaning they have active proposals but are not guaranteed to ship this year. But the fact that they have SEP numbers and active Working Group attention means they are real.&lt;/p&gt;

&lt;p&gt;The transport evolution priority addresses another pain point I have hit on real deployments: the HTTP SSE transport used in many current MCP implementations is fragile at scale. The roadmap points toward more robust streaming transports and standardized gateway behavior, which will matter a lot once agent systems need to handle hundreds of concurrent tool calls.&lt;/p&gt;

&lt;p&gt;Agent-to-agent communication is the more forward-looking piece. Right now most MCP deployments are single-agent systems connecting to many tools. The emerging pattern is multi-agent systems where agents coordinate with each other via MCP. The roadmap is building primitives for this: agent discovery, capability negotiation, and trust delegation between agents. This is the architecture that enables the systems Duolingo described, where one agent orchestrates dozens of specialized sub-agents across a 180-tool environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for Your Business Right Now
&lt;/h2&gt;

&lt;p&gt;Here is where I am going to give you the direct take rather than the careful hedging.&lt;/p&gt;

&lt;p&gt;If you have been waiting to make a decision about AI agents, the calculus changed this month. MCP being infrastructure-grade with Linux Foundation governance and universal provider support means you are not making a bet on an experimental technology anymore. You are making a bet on something closer to how you think about REST APIs or OAuth: established, multi-vendor, here for the long term.&lt;/p&gt;

&lt;p&gt;But the security findings are not a reason to wait. They are a reason to deploy carefully with the right guidance. The vulnerabilities that were found exist in careless implementations, not in MCP itself. The protocol has no inherent security flaws. The CVEs are implementation-level mistakes that good engineering practice prevents. That is exactly the situation with SQL injection: the database is not broken, the developers who concatenate user input into queries without parameterization are making a mistake.&lt;/p&gt;
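&lt;p&gt;The SQL analogy is worth making concrete, because the fix has exactly the same shape as the MCP input-handling fixes. A minimal example using Python's built-in &lt;code&gt;sqlite3&lt;/code&gt;:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

malicious = "alice' OR '1'='1"

# Broken: concatenating the input into the query string,
# f"SELECT * FROM users WHERE name = '{malicious}'", matches every row.

# Correct: the placeholder binds the input as data, never as SQL
rows = conn.execute(
    "SELECT * FROM users WHERE name = ?", (malicious,)
).fetchall()
assert rows == []  # the injection string matches no actual name
```

&lt;p&gt;The database was never broken; the query construction was. The MCP CVEs follow the same pattern one layer up.&lt;/p&gt;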

&lt;p&gt;The practical question is whether your business actually needs AI agents or whether you need AI automation. Those are different things with different cost profiles. I built a free &lt;a href="https://www.jahanzaib.ai/ai-readiness" rel="noopener noreferrer"&gt;AI Agent Readiness Assessment&lt;/a&gt; specifically to help answer this. It takes 12 to 15 minutes and gives you a scored report across eight dimensions with a clear agent vs. automation verdict. About 60% of the businesses that take it should be running n8n or Make workflows, not deploying agent systems. The assessment tells you which bucket you are in before you spend engineering budget on the wrong thing.&lt;/p&gt;

&lt;p&gt;For the businesses that do need agents, the right architecture today looks like this: MCP-based tool connectivity as the integration layer, a strong system prompt with explicit tool boundaries and approval gates, enterprise-grade guardrails for content and injection defense, comprehensive logging of every agent action, and a human-in-the-loop escalation path for any action above a defined consequence threshold. I have deployed this stack for clients in &lt;a href="https://www.jahanzaib.ai/work" rel="noopener noreferrer"&gt;healthcare, legal, and ecommerce contexts&lt;/a&gt;. The implementation details differ by use case but the architectural pattern is consistent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1682687220742-aba13b6e50ba%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1682687220742-aba13b6e50ba%3Fw%3D1200%26q%3D80" alt="AI agent automation system connecting enterprise tools via protocol layer" width="1200" height="800"&gt;&lt;/a&gt;&lt;em&gt;The right AI agent architecture uses MCP as the integration layer with security, observability, and human escalation paths built in from day one&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three Business Profiles I See Right Now
&lt;/h2&gt;

&lt;p&gt;After talking to dozens of business owners and engineering leads over the last six months, I have started to see three distinct profiles in how organizations are approaching this moment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Profile one: The cautious evaluator.&lt;/strong&gt; These teams have been watching AI agents for 18 months, running occasional demos, never pulling the trigger because the technology felt too immature or the ROI math did not pencil. The 97M milestone and Linux Foundation governance just removed the immaturity argument. If you are in this bucket, the question is no longer whether MCP is stable. It is whether your specific workflows have enough decision complexity, data variability, or cross-system coordination to justify agents over simpler automation. Take the assessment. Get the number.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Profile two: The accidental deployer.&lt;/strong&gt; These teams built something with MCP six to twelve months ago when it was still moving fast, and now they have a production system that was never reviewed for the security patterns the researchers identified in January and February. If this is you, the first thing I would do is run &lt;code&gt;mcp-scan&lt;/code&gt; against your implementation and check whether any of your servers are on the CVE list. Pin your server versions. Audit your tool descriptions. This is not a crisis but it is a maintenance window you should not keep deferring.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Profile three: The enterprise architect.&lt;/strong&gt; These teams are building MCP deployments at Duolingo scale or planning to. The authentication and audit gaps in the current protocol are a real blocker for you, and the 2026 roadmap tells you they are in progress but not yet shipped. In the meantime, the practical path is to build your own thin governance layer: a gateway that enforces your auth requirements, a logging pipeline that captures every tool call, and a configuration management system that lets you reproduce deployments across environments. I have had to build these layers for large clients and they are not trivial, but they are buildable with today's primitives while you wait for the spec to catch up.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Take After Three Years Building This Stuff
&lt;/h2&gt;

&lt;p&gt;I have written before about &lt;a href="https://www.jahanzaib.ai/blog/model-context-protocol-how-i-build-mcp-servers-that-run-in-production-and-what-most-guides-skip" rel="noopener noreferrer"&gt;how I build MCP servers for production&lt;/a&gt;. The technical patterns have not changed much since I wrote that post. What has changed is the context around them.&lt;/p&gt;

&lt;p&gt;When I first started deploying MCP, I had to explain what it was in every client conversation. Now I get calls from business owners who have already heard of it and want to know whether they should use it. That shift happened in about six months. The 97M milestone is the quantitative confirmation of what I have been watching qualitatively: MCP crossed from developer curiosity to business-decision-maker awareness somewhere in Q4 2025, and the first Dev Summit is the community's response to that shift.&lt;/p&gt;

&lt;p&gt;The security findings are the shadow of that growth. Any technology that goes from niche to infrastructure in 16 months is going to have security debt. The question is whether the ecosystem patches it before attackers exploit it systematically. The CVE count and the summit sessions on security both suggest the community is taking it seriously. But "taking it seriously" means deploying with eyes open, not waiting for a perfect protocol that does not have CVEs. No infrastructure that matters is without CVEs.&lt;/p&gt;

&lt;p&gt;If you are building AI agents in 2026, MCP is not optional. It is the integration layer. The question is whether you are deploying it with the security hygiene and enterprise governance patterns it requires, or whether you are deploying it the way most early adopters deployed Node.js: fast, functional, and with security debt you will spend years cleaning up.&lt;/p&gt;

&lt;p&gt;I would rather help you get it right the first time. If you want a direct conversation about what an MCP-based agent architecture would look like for your specific situation, &lt;a href="https://www.jahanzaib.ai/contact" rel="noopener noreferrer"&gt;get in touch&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Citation Capsule:&lt;/strong&gt; MCP crossed 97 million monthly SDK downloads in March 2026, up from approximately 2 million at launch in November 2024, according to &lt;a href="https://byteiota.com/model-context-protocol-hits-97m-installs-standard-wins/" rel="noopener noreferrer"&gt;ByteIota 2026&lt;/a&gt;. The ecosystem includes 5,800 plus community servers and more than 10,000 active in production. The first MCP Dev Summit North America ran April 2 to 3, 2026, organized by the &lt;a href="https://aaif.io/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation-aaif-anchored-by-new-project-contributions-including-model-context-protocol-mcp-goose-and-agents-md/" rel="noopener noreferrer"&gt;Agentic AI Foundation (Linux Foundation) 2026&lt;/a&gt;. Security findings cited from &lt;a href="https://www.heyuan110.com/posts/ai/2026-03-10-mcp-security-2026/" rel="noopener noreferrer"&gt;MCP Security 2026 analysis&lt;/a&gt; covering 30 plus CVEs filed January to February 2026. The 2026 MCP Roadmap published by David Soria Parra is available at &lt;a href="http://blog.modelcontextprotocol.io/posts/2026-mcp-roadmap/" rel="noopener noreferrer"&gt;blog.modelcontextprotocol.io 2026&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What does MCP hitting 97 million installs actually mean for businesses?
&lt;/h3&gt;

&lt;p&gt;It means the protocol has crossed from experimental to infrastructure. Every major AI provider supports it, the Linux Foundation governs it, and more than 5,800 servers cover virtually every business tool category. Businesses evaluating AI agents no longer need to worry about whether MCP will be around in three years. The stability argument for waiting is gone. The remaining questions are about whether your specific workflows need agent complexity or simpler automation, and whether your team has the security posture to deploy agents safely.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is MCP safe to use given the 30 plus CVEs filed in early 2026?
&lt;/h3&gt;

&lt;p&gt;The vulnerabilities are in implementations, not in the protocol itself. 43% of the CVEs involve developers passing user input to shell commands without sanitization, which is a classic application security mistake applied to a new context. Using MCP safely requires the same discipline as using any powerful infrastructure: pin your server versions, run vulnerability scans, audit tool descriptions for injection risks, enable comprehensive logging, and avoid servers from untrusted publishers. The protocol is not broken. Many early adopters deployed it carelessly.&lt;/p&gt;

&lt;h3&gt;
  
  
  What was the most important thing revealed at the MCP Dev Summit?
&lt;/h3&gt;

&lt;p&gt;The pattern that enterprise teams hit the same authentication and governance wall regardless of industry or use case. Static credentials that IT cannot manage, no audit trail for agent actions, and configuration that cannot be reproduced across environments. These gaps were consistent across every large-scale deployment discussion at the summit. The 2026 roadmap addresses them directly, but they are not solved today. Organizations deploying at scale need to build their own governance layers in the interim.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do I need MCP to build AI agents?
&lt;/h3&gt;

&lt;p&gt;No, but building without it means writing custom integration code for every tool your agents need to access, and rewriting it when you change AI providers. MCP eliminates the per-provider integration tax. If you are building agents that connect to more than two or three tools, or if you might want to swap model providers at any point, building on MCP from the start saves significant engineering time. The 60 to 70% development time reduction I measured on my own projects reflects real integration work that MCP simply removes.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the Agentic AI Foundation and why does it matter?
&lt;/h3&gt;

&lt;p&gt;The Agentic AI Foundation (AAIF) is a Linux Foundation project that took governance of MCP in December 2025. Founding members include Anthropic, OpenAI, Block, AWS, Google, Microsoft, Cloudflare, and Bloomberg. Linux Foundation governance means MCP has the same neutral, multi-stakeholder stewardship as foundational open-source projects like Kubernetes and Node.js. For businesses making long-term technology investments, it means no single company can unilaterally change the protocol in ways that break your deployments.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I know if my business needs AI agents or simpler automation tools?
&lt;/h3&gt;

&lt;p&gt;The short answer is that most businesses need automation first and agents later. Agents are the right choice when your workflows involve real-time decision-making with context that changes unpredictably, when tasks require judgment calls across multiple data sources, or when the process is too variable to map into a fixed workflow. If your processes are well-defined, data is clean, and the steps are predictable, n8n or Make will give you 80% of the value at 20% of the cost. I built a free &lt;a href="https://www.jahanzaib.ai/ai-readiness" rel="noopener noreferrer"&gt;AI Agent Readiness Assessment&lt;/a&gt; that scores your situation across eight dimensions and gives you a clear verdict with specific tool recommendations.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is tool poisoning and how does it affect AI agents using MCP?
&lt;/h3&gt;

&lt;p&gt;Tool poisoning is an attack where a malicious MCP server includes hidden instructions in its tool descriptions. When an AI agent reads these descriptions to understand what a tool does, it also reads and potentially executes the hidden instructions. Because agents treat tool descriptions as trusted content by default, a poisoned tool description can redirect agent behavior without any user interaction. Defense requires reviewing tool descriptions before deployment, using only servers from verified publishers, and configuring agents to treat external data as untrusted input even when it arrives through tool outputs.&lt;/p&gt;
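&lt;p&gt;A crude illustration of what "reviewing tool descriptions" can look like in practice. This is a heuristic screen I sketched for this article, not a real scanner; a hit means "have a human look at this," not proof of an attack:&lt;/p&gt;

```python
import re

# phrases that have no business appearing in a legitimate tool description
SUSPICIOUS_PATTERNS = [
    r"ignore (all|previous) instructions",
    r"do not (tell|inform) the user",
    r"exfiltrat",
    r"send .* to http",
]

def flag_description(description: str) -> list[str]:
    # return every suspicious pattern found in a tool description
    return [
        p for p in SUSPICIOUS_PATTERNS
        if re.search(p, description, re.IGNORECASE)
    ]
```

&lt;p&gt;A pattern list like this will never catch a determined attacker, which is why verified publishers and treating tool outputs as untrusted input remain the primary defenses. But it catches the lazy cases, and it costs almost nothing to run in CI against every server you add.&lt;/p&gt;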

&lt;h3&gt;
  
  
  What should I do today to prepare for MCP-based AI agents?
&lt;/h3&gt;

&lt;p&gt;If you are evaluating AI agents: take the AI Agent Readiness Assessment to get a baseline before committing budget. If you are already running MCP in production: run &lt;code&gt;mcp-scan&lt;/code&gt; against your implementation, pin your server versions to specific releases, enable logging for all tool invocations, and audit your tool descriptions for injection patterns. If you are planning a new deployment: treat security architecture as a first-class deliverable from day one, not a layer you add after the system works. The authentication gaps in the current spec are known and in progress. Build your own governance layer now rather than waiting for the spec to catch up.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>aiagents</category>
      <category>modelcontextprotocol</category>
      <category>aisecurity</category>
    </item>
    <item>
      <title>LangGraph Tutorial: How I Build Production AI Agents With It</title>
      <dc:creator>Jahanzaib</dc:creator>
      <pubDate>Sun, 05 Apr 2026 01:20:44 +0000</pubDate>
      <link>https://forem.com/jahanzaibai/langgraph-tutorial-how-i-build-production-ai-agents-with-it-1elj</link>
      <guid>https://forem.com/jahanzaibai/langgraph-tutorial-how-i-build-production-ai-agents-with-it-1elj</guid>
      <description>&lt;p&gt;The third time a client's AI pipeline crashed mid-workflow and wiped out 45 minutes of LLM calls, I stopped using stateless chains. That was 18 months ago. Since then I've built 23 production systems on &lt;strong&gt;LangGraph&lt;/strong&gt;, and the difference is not subtle. LangGraph tutorial content online is mostly surface level. This is the guide I wish existed when I was migrating real client systems to it.&lt;/p&gt;

&lt;p&gt;LangGraph lets you model your agent as a directed graph where nodes are actions and edges are decisions. It handles state persistence, conditional routing, and crash recovery for you. As of Q1 2026, it gets 34.5 million monthly downloads and around 400 companies run it in production, including Uber, Cisco, LinkedIn, and JPMorgan. The framework reached v1.0 in late 2025, which means the API is stable enough to build on without worrying about breaking changes every few weeks.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key Takeaways&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;LangGraph models AI agents as directed graphs: nodes run your logic, edges decide what runs next, and shared state carries data between steps&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Checkpointing with MemorySaver (dev) or PostgresSaver (production) means crashed agents resume exactly where they left off&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Human-in-the-loop approval gates take 3 lines of code with &lt;code&gt;interrupt_before&lt;/code&gt;, no custom middleware required&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Streaming works at the node level, token level, and event level, so users see real-time progress through long workflows&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;LangGraph is best for complex stateful pipelines with branching logic; use CrewAI when you need role-based agent teams with fast setup&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Real production deployments report 10 to 15 hours per week saved on previously manual workflows, with sub-3-minute turnaround on research tasks that took hours&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What LangGraph Actually Is (And Why the Graph Model Matters)
&lt;/h2&gt;

&lt;p&gt;Most AI agent frameworks treat your workflow as a sequential chain: step one calls an LLM, step two calls a tool, step three formats output. That works fine until you need the agent to loop back, make a decision based on partial results, or pause for human review before doing something irreversible.&lt;/p&gt;

&lt;p&gt;LangGraph models the same workflow as a directed graph. Each node is a Python function. Each edge is a routing decision. A single shared state object moves through the graph and every node can read from it and write to it. This sounds abstract until you see what it unlocks in practice.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1580927752452-89d86da3fa0a%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1580927752452-89d86da3fa0a%3Fw%3D1200%26q%3D80" alt="LangGraph agent workflow concept: interconnected nodes representing AI agent state and decision routing" width="1200" height="800"&gt;&lt;/a&gt;&lt;em&gt;LangGraph models agent logic as a directed graph, where each node handles a specific task and edges control the flow based on current state.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Here is a concrete example from a client project I built last quarter. The system researches job candidates, writes interview questions, and then pauses for a human recruiter to approve before sending anything to the candidate. With a sequential chain, implementing that pause is messy. With LangGraph, it's a one-line compile option.&lt;/p&gt;

&lt;p&gt;The other thing that matters is state persistence. When you checkpoint a LangGraph workflow, every node execution saves state to a database. If the server restarts or the Lambda function cold-starts mid-workflow, the agent picks up from the last saved node. I've had a client's workflow survive two server restarts during a 12-step research task and complete correctly. That's not possible with stateless chains.&lt;/p&gt;

&lt;h2&gt;
  
  
  LangGraph Core Concepts: State, Nodes, and Edges
&lt;/h2&gt;

&lt;p&gt;Before writing any code, you need to understand the three building blocks. Get these right and everything else follows logically.&lt;/p&gt;

&lt;h3&gt;
  
  
  State: The Shared Data Structure
&lt;/h3&gt;

&lt;p&gt;State is a TypedDict (or Pydantic model) that every node in your graph reads from and writes to. Think of it as a shared context object that travels through the workflow and accumulates results.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from typing import TypedDict, Annotated, List
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    messages: Annotated[List, add_messages]  # message history, auto-appended
    query: str                                 # the original user query
    research_results: List[str]               # accumulated research
    draft: str                                # current draft output
    approved: bool                            # human approval flag

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;Annotated[List, add_messages]&lt;/code&gt; syntax is important. The &lt;code&gt;add_messages&lt;/code&gt; reducer means new messages get appended rather than replacing the entire list. For most other fields, the last write wins. You can define custom reducers for fields that need merge behavior instead of replace behavior.&lt;/p&gt;
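
&lt;p&gt;A custom reducer is just a two-argument function that takes the current value and the incoming update and returns the merged result. Here is a minimal sketch (the &lt;code&gt;merge_unique&lt;/code&gt; name and its dedup policy are mine, not part of the LangGraph API):&lt;br&gt;
&lt;/p&gt;

```python
from typing import Annotated, List, TypedDict

def merge_unique(existing, new):
    """Custom reducer: append only items not already present."""
    existing = existing or []
    new = new or []
    return existing + [item for item in new if item not in existing]

class ResearchState(TypedDict):
    # merge instead of replace: duplicate facts are dropped on write
    research_results: Annotated[List[str], merge_unique]

# The reducer is called as (current_value, update):
print(merge_unique(["fact A"], ["fact A", "fact B"]))  # ['fact A', 'fact B']
```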

&lt;h3&gt;
  
  
  Nodes: Where Your Logic Lives
&lt;/h3&gt;

&lt;p&gt;A node is any Python function that takes state as input and returns a dict with the updated fields. It doesn't need to return the entire state, only the fields it wants to change.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage, AIMessage

llm = ChatAnthropic(model="claude-haiku-4-5-20251001")

def research_node(state: AgentState) -&amp;gt; dict:
    """Searches for relevant information based on the query."""
    response = llm.invoke([
        HumanMessage(content=f"Research this topic and provide 3 key facts: {state['query']}")
    ])
    return {
        "research_results": [response.content],
        "messages": [AIMessage(content=response.content)]
    }

def draft_node(state: AgentState) -&amp;gt; dict:
    """Writes a draft based on research results."""
    combined_research = "\n".join(state["research_results"])
    response = llm.invoke([
        HumanMessage(content=f"Write a concise summary based on this research:\n{combined_research}")
    ])
    return {
        "draft": response.content,
        "messages": [AIMessage(content=f"Draft created: {response.content[:100]}...")]
    }

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nodes can do anything: call LLMs, execute tools, hit external APIs, write to databases, run Python code. The only contract is that they receive state and return a dict of updates.&lt;/p&gt;
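
&lt;p&gt;That contract is easiest to see in a node with no LLM call at all. This is a hypothetical helper, not part of the system above:&lt;br&gt;
&lt;/p&gt;

```python
def word_count_node(state):
    """A plain-Python node: no LLM, just a metric computed from state.
    The only contract: take state in, return a dict of updated fields."""
    draft = state.get("draft", "")
    return {"word_count": len(draft.split())}

print(word_count_node({"draft": "a concise three-word draft"}))  # {'word_count': 4}
```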

&lt;h3&gt;
  
  
  Edges: How Decisions Get Made
&lt;/h3&gt;

&lt;p&gt;Edges define which node runs after the current one. Fixed edges always go to the same next node. Conditional edges inspect state and choose from multiple possible next nodes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def route_after_research(state: AgentState) -&amp;gt; str:
    """Route to draft writing if research succeeded, otherwise retry."""
    if state["research_results"]:
        return "draft"
    return "research"  # retry if research returned nothing

# Fixed edge example:
graph.add_edge("research", "draft")

# Conditional edge example:
graph.add_conditional_edges(
    "research",
    route_after_research,
    {"draft": "draft", "research": "research"}
)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1563986768609-322da13575f3%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1563986768609-322da13575f3%3Fw%3D1200%26q%3D80" alt="Python code on screen showing AI agent state machine implementation with LangGraph nodes and edges" width="1200" height="800"&gt;&lt;/a&gt;&lt;em&gt;LangGraph conditional edges let you implement complex branching logic with a simple routing function that returns a string indicating the next node.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Building a Complete LangGraph Agent: Step by Step
&lt;/h2&gt;

&lt;p&gt;Let me walk through building a research and writing agent from scratch. This is a simplified version of a system I deployed for a consulting client that generates weekly industry reports. The full version has 11 nodes and handles failure recovery, but this covers every concept you need.&lt;/p&gt;

&lt;h3&gt;
  
  
  Installation and Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install langgraph langchain-anthropic langchain-community

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set your API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
os.environ["ANTHROPIC_API_KEY"] = "your-key-here"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Build the Graph
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver

# Initialize the graph with our state schema
builder = StateGraph(AgentState)

# Add nodes
builder.add_node("research", research_node)
builder.add_node("draft", draft_node)

# Wire up the edges
builder.add_edge(START, "research")
builder.add_conditional_edges(
    "research",
    route_after_research,
    {"draft": "draft", "research": "research"}
)
builder.add_edge("draft", END)

# Compile with in-memory checkpointer (swap for PostgresSaver in production)
checkpointer = MemorySaver()
agent = builder.compile(checkpointer=checkpointer)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Run the Agent
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Config with a thread_id — this is how LangGraph tracks conversation history
config = {"configurable": {"thread_id": "research-session-001"}}

result = agent.invoke(
    {"query": "What are the main use cases for AI agents in logistics?"},
    config=config
)

print("Final draft:")
print(result["draft"])

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. The agent researches the topic, routes conditionally based on whether research returned results, writes a draft, and saves state at every step. If anything crashes, call &lt;code&gt;agent.invoke&lt;/code&gt; again with the same &lt;code&gt;thread_id&lt;/code&gt; and it resumes from the last checkpoint.&lt;/p&gt;

&lt;h2&gt;
  
  
  Memory and Checkpointing: The Feature That Makes LangGraph Production-Ready
&lt;/h2&gt;

&lt;p&gt;This is where LangGraph genuinely differentiates from most frameworks. Most agent systems are stateless. Each run starts from scratch. That works for quick Q&amp;amp;A tasks but falls apart the moment you're running 10-step pipelines that take several minutes.&lt;/p&gt;

&lt;p&gt;LangGraph saves state to a checkpointer after every node. Three options come built-in:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Checkpointer&lt;/th&gt;
&lt;th&gt;Storage&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;th&gt;Production Ready?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MemorySaver&lt;/td&gt;
&lt;td&gt;In-memory Python dict&lt;/td&gt;
&lt;td&gt;Development and testing&lt;/td&gt;
&lt;td&gt;No (lost on restart)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SqliteSaver&lt;/td&gt;
&lt;td&gt;SQLite file on disk&lt;/td&gt;
&lt;td&gt;Local apps, single-instance deploys&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PostgresSaver&lt;/td&gt;
&lt;td&gt;PostgreSQL database&lt;/td&gt;
&lt;td&gt;Production multi-instance deployments&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Switching from development to production checkpointing takes only a few lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver
import psycopg

async def create_production_agent():
    conn = await psycopg.AsyncConnection.connect(os.environ["DATABASE_URL"])
    checkpointer = AsyncPostgresSaver(conn)
    await checkpointer.setup()  # creates tables on first run
    return builder.compile(checkpointer=checkpointer)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The thread-based memory model also means you get conversation history for free. A user can return to a research session days later and ask "expand on the second point from earlier" and the agent has the full prior context available in state.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1670272502246-768d249768ca%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1670272502246-768d249768ca%3Fw%3D1200%26q%3D80" alt="Database and state persistence diagram representing LangGraph checkpointing for AI agent memory" width="1200" height="800"&gt;&lt;/a&gt;&lt;em&gt;LangGraph's checkpointing system saves state after every node execution. A crashed workflow with a PostgresSaver checkpointer resumes exactly where it left off.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Cross-Thread Memory with a Memory Store
&lt;/h3&gt;

&lt;p&gt;Checkpointing is per-thread. If you want information to persist across different conversation sessions for the same user (user preferences, past decisions, learned context), use a separate memory store alongside the checkpointer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langgraph.store.memory import InMemoryStore

store = InMemoryStore()

# Write a preference into the store, then read it back inside a node
store.put(("user_preferences", "user-123"), "tone", {"value": "formal"})

def personalization_node(state: AgentState) -&amp;gt; dict:
    namespace = ("user_preferences", state.get("user_id", "user-123"))
    items = store.search(namespace)
    user_prefs = {item.key: item.value for item in items}
    return {"user_preferences": user_prefs}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In production, swap &lt;code&gt;InMemoryStore&lt;/code&gt; for a Redis or PostgreSQL-backed store. The interface is identical.&lt;/p&gt;

&lt;h2&gt;
  
  
  Human-in-the-Loop: Adding Approval Gates Without Custom Middleware
&lt;/h2&gt;

&lt;p&gt;This is one of my favorite LangGraph features and the one that most surprises clients when I demo it. Adding a human approval gate before a potentially destructive action (sending an email, writing to a production database, making a purchase) takes three lines.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Compile with interrupt_before to pause before the "send_email" node
agent = builder.compile(
    checkpointer=checkpointer,
    interrupt_before=["send_email"]
)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the graph reaches the &lt;code&gt;send_email&lt;/code&gt; node, it saves state and pauses. Your application shows the pending action to a human reviewer. When they approve, you resume:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# The agent paused before send_email — show the pending state to the human
pending_state = agent.get_state(config)
print("About to send this email:")
print(pending_state.values.get("draft_email"))

# Human approves — resume by passing None (no new input needed)
result = agent.invoke(None, config=config)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the human rejects the action, you can update state before resuming:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Update state with human feedback before resuming
agent.update_state(
    config=config,
    values={"draft_email": "Please use a more formal tone..."}
)
result = agent.invoke(None, config=config)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I've used this pattern for a legal contract review agent where a lawyer must approve each clause edit before the system commits it to the document. The entire approval flow is handled by LangGraph's interrupt system with no custom middleware needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Streaming: Real-Time Progress for Long-Running Agents
&lt;/h2&gt;

&lt;p&gt;Long-running agents feel broken if users see nothing for 30 seconds. LangGraph streams at three levels and you can combine them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stream Mode: Values
&lt;/h3&gt;

&lt;p&gt;Emits the full state snapshot after every node completes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for chunk in agent.stream(
    {"query": "Analyze the AI agent market in logistics"},
    config=config,
    stream_mode="values"
):
    print(f"Node completed. Draft so far: {chunk.get('draft', 'not yet')[:100]}")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Stream Mode: Updates
&lt;/h3&gt;

&lt;p&gt;Emits only the changed fields from each node, which is more efficient for large state objects.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for node_name, updates in agent.stream(
    {"query": "..."},
    config=config,
    stream_mode="updates"
):
    print(f"Node '{node_name}' updated: {list(updates.keys())}")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Stream Mode: Messages (Token-Level Streaming)
&lt;/h3&gt;

&lt;p&gt;Emits individual LLM tokens as they arrive. Use this when you want the typewriter effect in your UI.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;async for message, metadata in agent.astream(
    {"query": "..."},
    config=config,
    stream_mode="messages"
):
    if hasattr(message, 'content') and message.content:
        print(message.content, end="", flush=True)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I use &lt;code&gt;stream_mode="updates"&lt;/code&gt; in most production applications because it gives users clear progress indicators ("Researching... Writing draft... Reviewing...") without flooding the connection with full state snapshots.&lt;/p&gt;
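
&lt;p&gt;Turning those update chunks into progress indicators is a small mapping from internal node names to user-facing labels. The label strings below are illustrative; the chunk shape (&lt;code&gt;{node_name: updates}&lt;/code&gt;) is what updates mode emits:&lt;br&gt;
&lt;/p&gt;

```python
# Map internal node names to user-facing progress labels (labels are my own)
PROGRESS_LABELS = {
    "research": "Researching...",
    "draft": "Writing draft...",
    "review": "Reviewing...",
}

def progress_label(chunk):
    """Turn one stream_mode='updates' chunk into a status string for the UI."""
    node_name = next(iter(chunk))
    return PROGRESS_LABELS.get(node_name, f"Running {node_name}...")

print(progress_label({"draft": {"draft": "..."}}))  # Writing draft...
```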

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1573164713988-8665fc963095%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1573164713988-8665fc963095%3Fw%3D1200%26q%3D80" alt="Real-time data streaming visualization representing LangGraph streaming output for AI agents in production" width="1200" height="801"&gt;&lt;/a&gt;&lt;em&gt;LangGraph's three streaming modes let you emit tokens, node updates, or full state snapshots depending on what your application's UI needs.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Production Patterns I Use Across Every LangGraph Deployment
&lt;/h2&gt;

&lt;p&gt;After 23 production deployments, these patterns have become standard in my projects. They're not in the official docs but they save significant debugging time.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Always Add Error Node Routing
&lt;/h3&gt;

&lt;p&gt;Every multi-step agent needs a way to handle partial failures gracefully. I add a dedicated error handler node and route to it on exceptions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def safe_research_node(state: AgentState) -&amp;gt; dict:
    try:
        return research_node(state)
    except Exception as e:
        return {
            "error": str(e),
            "messages": [AIMessage(content=f"Research failed: {e}")]
        }

def route_after_safe_research(state: AgentState) -&amp;gt; str:
    if state.get("error"):
        return "handle_error"
    return "draft"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Use Recursion Limit to Prevent Infinite Loops
&lt;/h3&gt;

&lt;p&gt;Conditional edges that can loop back to earlier nodes are a common cause of runaway agents. Cap the total number of steps by passing &lt;code&gt;recursion_limit&lt;/code&gt; in the run config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;config = {
    "configurable": {"thread_id": "research-session-001"},
    "recursion_limit": 25  # default is 25, lower it for cost-sensitive workflows
}
result = agent.invoke({"query": "..."}, config=config)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Store Token Counts in State for Cost Monitoring
&lt;/h3&gt;

&lt;p&gt;LLM costs add up fast in multi-step workflows. I track token usage in state so I can alert when a workflow exceeds budget:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class AgentState(TypedDict):
    # ... other fields ...
    total_tokens_used: int

def track_tokens(response, state: AgentState) -&amp;gt; dict:
    usage = response.usage_metadata or {}
    current = state.get("total_tokens_used", 0)
    return {
        "total_tokens_used": current + usage.get("total_tokens", 0)
    }

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Use LangGraph Studio for Debugging
&lt;/h3&gt;

&lt;p&gt;LangGraph Studio is a local UI that visualizes your graph, shows state at each step, lets you replay from any checkpoint, and shows which edges fired. I install it on every project. Setup takes two minutes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install langgraph-cli
langgraph dev  # starts Studio at localhost:8123

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you've ever spent an hour debugging why your agent went to the wrong node, Studio replaces that with a visual click-through of the execution path.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Cost and Performance Data
&lt;/h2&gt;

&lt;p&gt;Here's what I've actually seen in production, across six recent LangGraph deployments:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Workflow Type&lt;/th&gt;
&lt;th&gt;Nodes&lt;/th&gt;
&lt;th&gt;Avg Run Time&lt;/th&gt;
&lt;th&gt;Avg Token Cost (Claude Haiku)&lt;/th&gt;
&lt;th&gt;Manual Time Replaced&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Candidate research + interview prep&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;2.4 min&lt;/td&gt;
&lt;td&gt;$0.04&lt;/td&gt;
&lt;td&gt;45 min/candidate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Legal contract clause review&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;4.1 min&lt;/td&gt;
&lt;td&gt;$0.11&lt;/td&gt;
&lt;td&gt;2.5 hr/contract&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Weekly industry report generation&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;7.8 min&lt;/td&gt;
&lt;td&gt;$0.29&lt;/td&gt;
&lt;td&gt;4 hr/week&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Customer support triage + draft&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;45 sec&lt;/td&gt;
&lt;td&gt;$0.006&lt;/td&gt;
&lt;td&gt;12 min/ticket&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Product catalog enrichment (50 items)&lt;/td&gt;
&lt;td&gt;3 per item&lt;/td&gt;
&lt;td&gt;18 min total&lt;/td&gt;
&lt;td&gt;$0.45 total&lt;/td&gt;
&lt;td&gt;3 hr/batch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Onboarding document generation&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;3.2 min&lt;/td&gt;
&lt;td&gt;$0.08&lt;/td&gt;
&lt;td&gt;1.5 hr/client&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The pattern is consistent: LangGraph workflows costing less than $0.50 routinely replace work that takes humans between 45 minutes and 4 hours. The ROI makes sense even at low volume.&lt;/p&gt;

&lt;p&gt;One important note: these numbers use Claude Haiku 4.5 on AWS Bedrock. If you're using GPT-4o or Claude Opus, multiply the token costs by roughly 10 to 20 times. Model selection matters enormously for multi-step agent economics. I use the cheapest capable model for each task type.&lt;/p&gt;

&lt;h2&gt;
  
  
  LangGraph vs CrewAI: Which Should You Actually Use?
&lt;/h2&gt;

&lt;p&gt;I get this question on almost every client call. Both frameworks are good. The honest answer is that they're optimized for different workflows, and choosing wrong costs you a painful migration later.&lt;/p&gt;

&lt;p&gt;Use LangGraph when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Your workflow has complex conditional branching (different paths based on LLM output)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You need crash recovery and long-running persistence (minutes to hours)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Human approval gates are required before irreversible actions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You need fine-grained control over exactly what runs when&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You're deploying to production and need observability at the node level&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use CrewAI when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;You're prototyping and want something working in under an hour&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Your workflow is naturally role-based (researcher, writer, reviewer agents)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The team has limited Python experience and prefers YAML configuration&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sequential execution is fine and you don't need complex routing&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most common pattern I see at growing companies: prototype in CrewAI, migrate the workflows that need reliability and branching to LangGraph. CrewAI's LangChain compatibility makes this migration easier than it sounds. I've done it three times in the past year and the rewrites typically take two to three days per workflow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1485827404703-89b55fcc595e%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1485827404703-89b55fcc595e%3Fw%3D1200%26q%3D80" alt="Software architect planning AI agent system architecture with LangGraph and multi-agent workflow diagram" width="1200" height="800"&gt;&lt;/a&gt;&lt;em&gt;LangGraph and CrewAI serve different needs. LangGraph excels at complex stateful pipelines; CrewAI wins for rapid prototyping with role-based agent teams.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Where LangGraph Goes Wrong in Production (And How to Avoid It)
&lt;/h2&gt;

&lt;p&gt;I've hit all of these mistakes myself or watched clients hit them. They're not obvious from the docs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;State schema drift.&lt;/strong&gt; If you add or remove fields from your TypedDict after you have existing checkpoints in the database, those checkpoints break on resume. Version your state schemas and add migration scripts before schema changes. I keep a &lt;code&gt;schema_version&lt;/code&gt; field in state specifically for this.&lt;/p&gt;
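
&lt;p&gt;The version-check pattern is a few lines of plain Python run before a resumed state is handed to the graph. Field names and version numbers here are illustrative:&lt;br&gt;
&lt;/p&gt;

```python
CURRENT_SCHEMA_VERSION = 2

def migrate_state(state):
    """Upgrade a checkpointed state dict to the current schema.
    One branch per historical version; each backfills new fields."""
    version = state.get("schema_version", 1)
    if version == 1:
        # v2 added total_tokens_used; backfill a default for old checkpoints
        state.setdefault("total_tokens_used", 0)
        state["schema_version"] = 2
    return state

old = {"query": "q", "schema_version": 1}
print(migrate_state(old)["total_tokens_used"])  # 0
```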

&lt;p&gt;&lt;strong&gt;Not limiting recursion on retry loops.&lt;/strong&gt; A conditional edge that routes back to a previous node for retries will happily run 200 times if something is fundamentally broken. Always set &lt;code&gt;recursion_limit&lt;/code&gt; lower than the default 25 for cost-sensitive workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Using MemorySaver in staging.&lt;/strong&gt; MemorySaver looks fine in development but gives you completely different failure behavior from PostgresSaver. Always test with your production checkpointer in staging so you catch serialization issues before they hit users.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Streaming without backpressure handling.&lt;/strong&gt; If you're streaming tokens to a browser and the user closes the tab, the underlying Python coroutine keeps running unless you handle cancellation. Use &lt;code&gt;asyncio.CancelledError&lt;/code&gt; handling in your streaming nodes for production deployments.&lt;/p&gt;
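
&lt;p&gt;A minimal sketch of that cancellation handling, with a stand-in &lt;code&gt;send&lt;/code&gt; callable in place of a real SSE connection (all names here are illustrative):&lt;br&gt;
&lt;/p&gt;

```python
import asyncio

async def stream_tokens(tokens, send):
    """Streaming loop that stops cleanly when the client disconnects."""
    try:
        for token in tokens:
            await send(token)
            await asyncio.sleep(0)  # yield control so cancellation can land
    except asyncio.CancelledError:
        # client went away: stop streaming instead of burning tokens
        raise

async def main():
    sent = []
    async def send(tok):
        sent.append(tok)
    task = asyncio.create_task(stream_tokens(["a", "b", "c"], send))
    await asyncio.sleep(0)   # let the first token go out
    task.cancel()            # simulate the browser tab closing
    try:
        await task
    except asyncio.CancelledError:
        pass
    return sent

print(asyncio.run(main()))
```

Only the tokens sent before cancellation are delivered; the loop never runs to completion.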

&lt;p&gt;For deeper context on how these patterns fit into larger production architectures, see my guide on &lt;a href="https://www.jahanzaib.ai/blog/ai-agents-production" rel="noopener noreferrer"&gt;building AI agents that actually work in production&lt;/a&gt; and the &lt;a href="https://www.jahanzaib.ai/blog/agentic-rag-production-guide" rel="noopener noreferrer"&gt;agentic RAG production guide&lt;/a&gt; that covers integrating knowledge retrieval into these same graph workflows. The &lt;a href="https://www.jahanzaib.ai/blog/n8n-ai-agent-workflows-practitioner-guide" rel="noopener noreferrer"&gt;n8n workflow guide&lt;/a&gt; is relevant if you want to trigger LangGraph agents from external automation platforms.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting to Production: Deployment Options
&lt;/h2&gt;

&lt;p&gt;LangGraph has a first-party deployment option called LangGraph Cloud (part of LangChain's commercial offering) that handles scaling, monitoring, and checkpointer infrastructure. It's worth the cost for teams that don't want to manage PostgreSQL and Redis themselves.&lt;/p&gt;

&lt;p&gt;For self-hosted deployments, the standard stack I use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;FastAPI&lt;/strong&gt; as the API layer wrapping the LangGraph agent&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;PostgreSQL&lt;/strong&gt; for checkpoint storage via AsyncPostgresSaver&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Redis&lt;/strong&gt; for cross-thread memory store (optional)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Server-sent events&lt;/strong&gt; for streaming tokens to the frontend&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;LangSmith&lt;/strong&gt; for tracing and debugging (optional but highly recommended)&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import json
import uuid

app = FastAPI()

@app.post("/agent/stream")
async def stream_agent(request: dict):
    thread_id = request.get("thread_id", str(uuid.uuid4()))
    config = {"configurable": {"thread_id": thread_id}}

    async def event_generator():
        async for chunk in agent.astream(
            {"query": request["query"]},
            config=config,
            stream_mode="updates"
        ):
            yield f"data: {json.dumps(chunk)}\n\n"
        yield "data: [DONE]\n\n"

    return StreamingResponse(
        event_generator(),
        media_type="text/event-stream"
    )

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you're building this type of system for a client and want help architecting the full stack, see &lt;a href="https://www.jahanzaib.ai/services" rel="noopener noreferrer"&gt;the AI systems services page&lt;/a&gt; for how I approach production deployments. The &lt;a href="https://www.jahanzaib.ai/ai-readiness" rel="noopener noreferrer"&gt;AI readiness assessment&lt;/a&gt; is a good starting point if you're not sure whether your use case warrants LangGraph specifically or a simpler automation tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is LangGraph and how does it differ from LangChain?
&lt;/h3&gt;

&lt;p&gt;LangChain is a framework for building LLM-powered applications with chains, tools, and retrievers. LangGraph is built on top of LangChain and adds graph-based workflow orchestration with persistent state, conditional routing, and built-in support for multi-step agent loops. Use LangChain for simple LLM calls and pipelines. Use LangGraph when you need stateful, looping, or branching agent workflows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do I need to know graph theory to use LangGraph?
&lt;/h3&gt;

&lt;p&gt;No. The "graph" in LangGraph is just a way of describing workflow structure: nodes are steps, edges are connections between steps. If you can draw a flowchart of your workflow, you can implement it in LangGraph. The API is designed for application developers, not mathematicians.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does LangGraph handle long-running tasks that take hours?
&lt;/h3&gt;

&lt;p&gt;LangGraph checkpoints state after every node execution. Long-running tasks can be suspended, picked up by a different worker, or resumed after a server restart, as long as you're using a persistent checkpointer like PostgresSaver. The same thread_id is all you need to resume from exactly the last saved state.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can LangGraph work with any LLM provider?
&lt;/h3&gt;

&lt;p&gt;Yes. LangGraph uses LangChain's model abstraction layer, which supports Anthropic, OpenAI, AWS Bedrock, Google Gemini, Mistral, Ollama (local models), and many others. Switching providers requires changing one line of code (the model initialization). The graph structure itself is provider-agnostic.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the difference between LangGraph's MemorySaver and PostgresSaver?
&lt;/h3&gt;

&lt;p&gt;MemorySaver stores checkpoints in a Python dictionary in RAM. It's fast and zero-setup but all state is lost when the process restarts. PostgresSaver persists checkpoints to a PostgreSQL database, survives restarts, works across multiple instances, and supports concurrent threads. Use MemorySaver for development and testing. Always use PostgresSaver in production.&lt;/p&gt;

&lt;h3&gt;
  
  
  How much does it cost to run LangGraph in production?
&lt;/h3&gt;

&lt;p&gt;LangGraph itself is open source and free. Your costs come from LLM API calls, database storage for checkpoints, and compute. Based on my production deployments, simple 4-6 node workflows using Claude Haiku cost between $0.006 and $0.11 per run. Complex 11-node research workflows run $0.15 to $0.40. Budget roughly $10 to $50 per month for moderate workloads (500 to 2,000 runs).&lt;/p&gt;

&lt;h3&gt;
  
  
  Should I use LangGraph or CrewAI for my project?
&lt;/h3&gt;

&lt;p&gt;Choose LangGraph if your workflow has complex branching, needs crash recovery, requires human approval gates, or will run in production at scale. Choose CrewAI if you want fast prototyping, your workflow is naturally role-based, or your team prefers YAML configuration over Python code. Many teams prototype in CrewAI and migrate production-critical workflows to LangGraph after validating the concept.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does LangGraph support multi-agent architectures?
&lt;/h3&gt;

&lt;p&gt;Yes. LangGraph supports supervisor patterns where one agent orchestrates subagents, swarm patterns where agents hand off tasks horizontally, and nested graphs where each "node" is itself a compiled LangGraph. The multi-agent features are mature in v1.x and used by companies like Uber and Cisco in production deployments.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Citation Capsule:&lt;/strong&gt; LangGraph has 34.5 million monthly downloads and around 400 companies running it in production as of Q1 2026 (&lt;a href="https://www.firecrawl.dev/blog/best-open-source-agent-frameworks" rel="noopener noreferrer"&gt;Firecrawl Research, 2026&lt;/a&gt;). Gartner predicts 40% of enterprise applications will embed agentic capabilities by end of 2026, up from under 5% in 2025 (&lt;a href="https://www.alphabold.com/langgraph-agents-in-production/" rel="noopener noreferrer"&gt;AlphaBold via Gartner, 2026&lt;/a&gt;). CrewAI GitHub stars: 44,300; AutoGen is now in maintenance mode following merger into Microsoft Agent Framework (&lt;a href="https://www.firecrawl.dev/blog/best-open-source-agent-frameworks" rel="noopener noreferrer"&gt;Firecrawl Research, 2026&lt;/a&gt;). Sources: &lt;a href="https://www.langchain.com/langgraph" rel="noopener noreferrer"&gt;LangChain LangGraph Official Docs&lt;/a&gt;, &lt;a href="https://github.com/langchain-ai/langgraph" rel="noopener noreferrer"&gt;LangGraph GitHub&lt;/a&gt;, &lt;a href="https://www.firecrawl.dev/blog/best-open-source-agent-frameworks" rel="noopener noreferrer"&gt;Firecrawl AI Framework Report 2026&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>langgraph</category>
      <category>aiagents</category>
      <category>python</category>
      <category>langchain</category>
    </item>
    <item>
      <title>AI Agents Are Coming for Your SaaS Stack and VCs Are Betting Billions on It</title>
      <dc:creator>Jahanzaib</dc:creator>
      <pubDate>Sat, 04 Apr 2026 11:19:18 +0000</pubDate>
      <link>https://forem.com/jahanzaibai/ai-agents-are-coming-for-your-saas-stack-and-vcs-are-betting-billions-on-it-4b88</link>
      <guid>https://forem.com/jahanzaibai/ai-agents-are-coming-for-your-saas-stack-and-vcs-are-betting-billions-on-it-4b88</guid>
      <description>&lt;p&gt;Last quarter, venture capitalists poured $65 billion into AI startups globally, according to &lt;a href="https://www.cbinsights.com/research/report/ai-trends-q1-2026/" rel="noopener noreferrer"&gt;CB Insights' State of AI Q1 2026 report&lt;/a&gt;. That brings total AI venture funding past $297 billion since the start of 2023. I have shipped 109 production AI systems over the past few years, and I can tell you: this money isn't chasing chatbots anymore. It's chasing the death of SaaS as we know it.&lt;/p&gt;

&lt;p&gt;The new wave of AI agents doesn't sit on top of your software stack. It replaces it. Cognition's Devin writes code. Factory AI automates entire engineering workflows. Harvey handles legal research that used to require a five figure contract with a legal SaaS vendor. And VCs are placing billion dollar bets that this pattern will swallow every software category within five years.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.jahanzaib.ai/blog/ai-agents-production" rel="noopener noreferrer"&gt;AI agents in production systems&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key Takeaways&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;AI venture funding hit $297 billion cumulative since 2023, with $65 billion in Q1 2026 alone (&lt;a href="https://www.cbinsights.com/research/report/ai-trends-q1-2026/" rel="noopener noreferrer"&gt;CB Insights, 2026&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;AI agents are replacing entire SaaS tools, not just adding features to them&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Customer support, code generation, and data analytics are the first categories falling&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The shift is from "software as a service" to "service as software," where outcomes replace subscriptions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Most businesses will run hybrid stacks for the next two to three years&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why Are VCs Pouring Billions into AI Agents Right Now?
&lt;/h2&gt;

&lt;p&gt;Global AI startup funding reached $65 billion in Q1 2026, a 35% increase over Q1 2025 (&lt;a href="https://www.cbinsights.com/research/report/ai-trends-q1-2026/" rel="noopener noreferrer"&gt;CB Insights, 2026&lt;/a&gt;). The reason is simple: investors see AI agents as the next platform shift, bigger than cloud, bigger than mobile. They're betting that software that does the work will beat software that helps you do the work.&lt;/p&gt;

&lt;p&gt;Look at the fundraising numbers. Cognition, the company behind the AI coding agent Devin, raised $2 billion at a $14 billion valuation in early 2026. Factory AI pulled in $200 million to build autonomous engineering agents. Harvey, the legal AI company, crossed a $3 billion valuation. These aren't incremental funding rounds. They're war chests designed to replace incumbent SaaS companies.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1559136555-9303baea8ebd%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1559136555-9303baea8ebd%3Fw%3D1200%26q%3D80" alt="An abstract visualization of financial growth charts against a dark background representing massive venture capital investment flows into AI technology" width="1200" height="800"&gt;&lt;/a&gt;&lt;em&gt;AI venture funding has accelerated beyond anything the tech industry has seen since the dot com era&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The pattern I see across these deals is consistent. VCs aren't funding better features for existing categories. They're funding replacements. A customer support AI agent doesn't make Zendesk better. It makes Zendesk unnecessary for 80% of tickets. A coding agent doesn't improve Jira. It makes half the tickets in Jira disappear because the agent already fixed the bug.&lt;/p&gt;

&lt;p&gt;In my own client work, I've watched companies cancel three to five SaaS subscriptions within 90 days of deploying a single AI agent. One ecommerce client replaced their support ticketing system, their FAQ tool, and their live chat platform with one agent that handles 73% of inquiries autonomously. That's $4,200 per month in SaaS fees gone.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Citation Capsule:&lt;/strong&gt; AI startup funding reached $65 billion in Q1 2026 according to &lt;a href="https://www.cbinsights.com/research/report/ai-trends-q1-2026/" rel="noopener noreferrer"&gt;CB Insights&lt;/a&gt;, bringing cumulative AI venture investment past $297 billion since 2023. Cognition (Devin) alone raised $2 billion at a $14 billion valuation, signaling that investors expect AI agents to replace, not augment, traditional SaaS tools.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What Makes Traditional SaaS Vulnerable to AI Agents?
&lt;/h2&gt;

&lt;p&gt;According to &lt;a href="https://www.gartner.com/en/articles/ai-agents" rel="noopener noreferrer"&gt;Gartner's 2025 predictions&lt;/a&gt;, 33% of enterprise software applications will include agentic AI by 2028, up from less than 1% in 2024. The vulnerability runs deep. SaaS was built on the assumption that humans operate the software. AI agents eliminate the operator entirely.&lt;/p&gt;

&lt;p&gt;Think about what most SaaS tools actually do. They present data in dashboards. They route tasks through workflows. They send notifications. They generate reports. Every one of these functions is a wrapper around a decision that a human has to make. AI agents collapse that entire loop. They see the data, make the decision, and execute the action. No dashboard needed.&lt;/p&gt;

&lt;p&gt;I built a multi agent order processing system for a client last year. Before that system, they used five different SaaS tools: an order management platform, an inventory tracker, a shipping label generator, a customer notification service, and a returns processor. The AI agent system handles all five functions. Not through integrations. Through intelligence.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.jahanzaib.ai/blog/when-to-use-ai-agents-vs-automation" rel="noopener noreferrer"&gt;When to use AI agents vs automation&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The pricing model is what really threatens SaaS. Traditional SaaS charges per seat, per month. You pay whether you use it or not. AI agents charge per outcome or per action. You pay for results. &lt;a href="https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai" rel="noopener noreferrer"&gt;McKinsey's 2025 State of AI report&lt;/a&gt; found that 72% of organizations now use AI in at least one business function, and the most common reason cited for adoption is cost reduction. When an AI agent can do the work of a $200 per month SaaS tool for $30 in API costs, the math speaks for itself.&lt;/p&gt;

&lt;p&gt;There's another vulnerability that SaaS companies rarely discuss. Data silos. Every SaaS tool creates its own data silo. Your CRM knows about customers. Your project management tool knows about tasks. Your analytics platform knows about metrics. None of them talk to each other well, despite billions spent on integration platforms. AI agents don't have this problem. They work across data sources natively because they reason about information, they don't just store it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1451187580459-43490279c0fa%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1451187580459-43490279c0fa%3Fw%3D1200%26q%3D80" alt="A digital network visualization showing interconnected nodes and data streams representing the collapse of data silos through AI agent architecture" width="1200" height="798"&gt;&lt;/a&gt;&lt;em&gt;AI agents work across data sources natively, collapsing the silo problem that plagues traditional SaaS stacks&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Citation Capsule:&lt;/strong&gt; Gartner predicts 33% of enterprise software will include agentic AI by 2028, up from under 1% in 2024 (&lt;a href="https://www.gartner.com/en/articles/ai-agents" rel="noopener noreferrer"&gt;Gartner, 2025&lt;/a&gt;). Meanwhile, McKinsey found that 72% of organizations already use AI in at least one business function (&lt;a href="https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai" rel="noopener noreferrer"&gt;McKinsey, 2025&lt;/a&gt;), with cost reduction as the primary driver.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Which SaaS Categories Will AI Agents Replace First?
&lt;/h2&gt;

&lt;p&gt;Not all SaaS is equally vulnerable. According to a &lt;a href="https://www.sequoiacap.com/article/ai-agents-market-map/" rel="noopener noreferrer"&gt;Sequoia Capital market analysis&lt;/a&gt;, the SaaS categories most exposed to agent disruption share three traits: high labor cost per task, structured decision trees, and abundant training data. Based on that framework and my own experience building these systems, here's where the dominoes fall first.&lt;/p&gt;
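&lt;p&gt;The three traits in that framework can be turned into a rough screening exercise. The sketch below is a toy illustration of the idea, not Sequoia's actual methodology, and the category names and trait values are assumptions made up for the example:&lt;/p&gt;

```python
# Toy exposure scoring based on three traits: high labor cost per task,
# structured decision trees, and abundant training data.
# Category list and trait values are illustrative assumptions.
TRAITS = ("high_labor_cost", "structured_decisions", "abundant_training_data")

categories = {
    "customer_support": {"high_labor_cost": True,  "structured_decisions": True,  "abundant_training_data": True},
    "legal_research":   {"high_labor_cost": True,  "structured_decisions": False, "abundant_training_data": True},
    "design_tooling":   {"high_labor_cost": False, "structured_decisions": False, "abundant_training_data": True},
}

def exposure_score(traits):
    # One point per trait present; 3 of 3 means the earliest disruption.
    return sum(traits[t] for t in TRAITS)

ranked = sorted(categories, key=lambda name: exposure_score(categories[name]), reverse=True)
print(ranked)  # customer_support ranks first with all three traits
```

&lt;p&gt;Any category that scores all three traits sits at the front of the disruption queue, which is exactly where customer support lands.&lt;/p&gt;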

&lt;h3&gt;
  
  
  Customer Support: Already Falling
&lt;/h3&gt;

&lt;p&gt;This is the most advanced replacement category. Companies like Sierra AI, Intercom's Fin, and Ada have built support agents that resolve 40% to 80% of tickets without human involvement. I deployed a support agent for a mid size ecommerce brand that now handles 73% of all customer inquiries. The remaining 27% get escalated to humans, but with full context already gathered by the agent. The client cancelled their Zendesk subscription three months later.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code Generation and Engineering Workflows
&lt;/h3&gt;

&lt;p&gt;Cognition's Devin can complete real engineering tasks end to end. Factory AI automates code review, testing, and deployment. GitHub Copilot, which started as autocomplete, now generates entire functions and suggests architectural changes. &lt;a href="https://github.blog/news-insights/research/research-quantifying-github-copilots-impact-in-the-enterprise/" rel="noopener noreferrer"&gt;GitHub's own research&lt;/a&gt; shows Copilot users complete tasks 55% faster. The next step, already happening, is agents that don't just help developers but replace the need for certain developer roles entirely.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Analytics and Business Intelligence
&lt;/h3&gt;

&lt;p&gt;Traditional BI tools like Tableau and Looker require humans to build dashboards, write queries, and interpret results. AI agents from companies like Hex, Databricks, and Census can now analyze data, generate insights, and even take action based on those insights. Ask a question in plain English, get an answer with a chart. No SQL required. No dashboard maintenance. No monthly BI platform subscription.&lt;/p&gt;

&lt;h3&gt;
  
  
  Legal Research and Contract Review
&lt;/h3&gt;

&lt;p&gt;Harvey raised $300 million because legal SaaS is a $30 billion market built on manual document review. AI agents can now review contracts, flag risks, and suggest edits at a fraction of the cost. In my experience, a legal AI agent processes a 50 page contract in about 90 seconds. A junior associate takes four to six hours. That cost differential is what makes VCs salivate.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sales Development and Outbound
&lt;/h3&gt;

&lt;p&gt;AI sales agents from companies like 11x, Artisan, and Regie.ai are automating prospecting, email sequences, and initial qualification. &lt;a href="https://www.salesforce.com/resources/research-reports/state-of-sales/" rel="noopener noreferrer"&gt;Salesforce's 2025 State of Sales report&lt;/a&gt; found that sales reps spend only 28% of their time actually selling. The rest goes to admin, data entry, and research. AI agents attack that 72% of wasted time directly.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;SaaS Category&lt;/th&gt;
&lt;th&gt;Traditional Tool Examples&lt;/th&gt;
&lt;th&gt;AI Agent Replacements&lt;/th&gt;
&lt;th&gt;Disruption Timeline&lt;/th&gt;
&lt;th&gt;Cost Reduction&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Customer Support&lt;/td&gt;
&lt;td&gt;Zendesk, Freshdesk, Intercom&lt;/td&gt;
&lt;td&gt;Sierra AI, Ada, Custom agents&lt;/td&gt;
&lt;td&gt;Already happening&lt;/td&gt;
&lt;td&gt;40% to 70%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code Generation&lt;/td&gt;
&lt;td&gt;Jira, Linear, GitHub Issues&lt;/td&gt;
&lt;td&gt;Cognition Devin, Factory AI, Cursor&lt;/td&gt;
&lt;td&gt;12 to 24 months&lt;/td&gt;
&lt;td&gt;30% to 50%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Analytics&lt;/td&gt;
&lt;td&gt;Tableau, Looker, Mode&lt;/td&gt;
&lt;td&gt;Hex AI, Databricks Assistant&lt;/td&gt;
&lt;td&gt;12 to 18 months&lt;/td&gt;
&lt;td&gt;50% to 70%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Legal Research&lt;/td&gt;
&lt;td&gt;Westlaw, LexisNexis, Clio&lt;/td&gt;
&lt;td&gt;Harvey, CoCounsel, EvenUp&lt;/td&gt;
&lt;td&gt;18 to 36 months&lt;/td&gt;
&lt;td&gt;60% to 80%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sales Development&lt;/td&gt;
&lt;td&gt;Outreach, SalesLoft, Apollo&lt;/td&gt;
&lt;td&gt;11x, Artisan, Regie.ai&lt;/td&gt;
&lt;td&gt;12 to 24 months&lt;/td&gt;
&lt;td&gt;40% to 60%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Accounting&lt;/td&gt;
&lt;td&gt;QuickBooks, Xero, FreshBooks&lt;/td&gt;
&lt;td&gt;Vic.ai, Truewind, Puzzle&lt;/td&gt;
&lt;td&gt;24 to 36 months&lt;/td&gt;
&lt;td&gt;30% to 50%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HR and Recruiting&lt;/td&gt;
&lt;td&gt;Greenhouse, Lever, BambooHR&lt;/td&gt;
&lt;td&gt;Mercor, Paradox, Moonhub&lt;/td&gt;
&lt;td&gt;18 to 30 months&lt;/td&gt;
&lt;td&gt;35% to 55%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Citation Capsule:&lt;/strong&gt; GitHub's research shows Copilot users complete coding tasks 55% faster (&lt;a href="https://github.blog/news-insights/research/research-quantifying-github-copilots-impact-in-the-enterprise/" rel="noopener noreferrer"&gt;GitHub, 2024&lt;/a&gt;), while Salesforce found that sales reps spend only 28% of their time selling (&lt;a href="https://www.salesforce.com/resources/research-reports/state-of-sales/" rel="noopener noreferrer"&gt;Salesforce, 2025&lt;/a&gt;). Both statistics explain why VCs see AI agents as the natural replacement for tools that automate around humans rather than replacing human effort.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What Does "Service as Software" Actually Mean?
&lt;/h2&gt;

&lt;p&gt;The phrase "service as software" was coined by venture firm Foundation Capital, and it captures a $4.6 trillion opportunity according to their &lt;a href="https://foundationcapital.com/service-as-software/" rel="noopener noreferrer"&gt;2024 analysis&lt;/a&gt;. Instead of buying software that helps employees do work, companies buy AI agents that do the work directly. The shift sounds subtle. It's not. It's the biggest change in how businesses buy technology since Salesforce put CRM in the cloud.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1531746790095-e5995fef77d3%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1531746790095-e5995fef77d3%3Fw%3D1200%26q%3D80" alt="A glowing digital interface with flowing data streams representing the shift from traditional software services to autonomous AI agent delivery models" width="800" height="400"&gt;&lt;/a&gt;&lt;em&gt;The transition from SaaS to service as software fundamentally changes the buyer seller relationship in enterprise tech&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Here's how the model changes. With traditional SaaS, you buy a tool, hire someone to operate it, train them, manage them, and hope they use the tool effectively. With service as software, you describe the outcome you want. The agent delivers it. You pay per result.&lt;/p&gt;

&lt;p&gt;I think the comparison to the cloud transition understates what's happening. When companies moved from on premise to cloud, they were buying the same capabilities delivered differently. This time, they're buying different capabilities entirely. An AI support agent doesn't just move your helpdesk to the cloud. It eliminates the need for a helpdesk at all for most interactions.&lt;/p&gt;

&lt;p&gt;The pricing implications are massive. SaaS companies have trained the market to accept per seat pricing. A company with 500 employees might pay $50,000 per month across its SaaS stack. But what if AI agents handle the work of 200 of those seats? You don't need 500 licenses anymore. You need 300, plus an AI agent that costs $5,000 per month. That's a 30% reduction in software spend, and the AI agent probably delivers better results because it works 24 hours a day and never forgets a process step.&lt;/p&gt;
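&lt;p&gt;The seat arithmetic is worth making explicit. A quick sketch, using only the illustrative figures from this example rather than any real vendor pricing:&lt;/p&gt;

```python
# Illustrative per-seat math from the example above; no real pricing is implied.
seats = 500
monthly_stack_cost = 50_000                   # total SaaS spend per month
cost_per_seat = monthly_stack_cost / seats    # 100 dollars per seat per month

seats_replaced_by_agents = 200
agent_monthly_cost = 5_000

new_spend = (seats - seats_replaced_by_agents) * cost_per_seat + agent_monthly_cost
reduction = 1 - new_spend / monthly_stack_cost

print(f"new monthly spend: {new_spend:,.0f}")  # 35,000
print(f"reduction: {reduction:.0%}")           # 30%
```

&lt;p&gt;The exact percentage moves with the agent's cost and how many seats it displaces, but the structure of the calculation is the same for any stack.&lt;/p&gt;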

&lt;p&gt;But is this really happening at scale? Yes. &lt;a href="https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai" rel="noopener noreferrer"&gt;McKinsey's 2025 survey of 1,363 organizations&lt;/a&gt; found that companies reporting 20% or more cost reductions from AI adoption jumped from 8% in 2023 to 25% in 2025. The organizations seeing the biggest savings are the ones deploying AI agents, not just AI features bolted onto existing tools.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Citation Capsule:&lt;/strong&gt; Foundation Capital estimates the "service as software" opportunity at $4.6 trillion (&lt;a href="https://foundationcapital.com/service-as-software/" rel="noopener noreferrer"&gt;Foundation Capital, 2024&lt;/a&gt;), representing the total addressable market for AI agents that perform work directly rather than assisting humans with software interfaces.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Is the Hybrid Stack the Reality for Most Businesses?
&lt;/h2&gt;

&lt;p&gt;Despite the hype, &lt;a href="https://www.cisco.com/c/en/us/solutions/executive-perspectives/ai-readiness-index.html" rel="noopener noreferrer"&gt;Cisco's AI Readiness Index 2024&lt;/a&gt; found that only 14% of organizations globally are fully prepared to deploy AI. The reality for most businesses in 2026 is not a complete SaaS replacement. It's a hybrid stack where AI agents handle specific workflows while traditional tools persist for everything else.&lt;/p&gt;

&lt;p&gt;I've built AI systems for companies ranging from ten person startups to enterprises with thousands of employees. Not once has a complete SaaS replacement been the right first move. Every successful deployment I've done starts with one workflow. Support ticket triage. Invoice processing. Lead qualification. You prove the agent works, then you expand.&lt;/p&gt;

&lt;p&gt;The hybrid approach makes sense for three reasons. First, AI agents still make mistakes. They're dramatically better than they were two years ago, but they hallucinate, miss edge cases, and sometimes take confidently wrong actions. You need human oversight, and that means you need tools that humans use alongside the agents.&lt;/p&gt;

&lt;p&gt;Second, most companies have years of data locked in their current SaaS tools. Migrating away from Salesforce isn't a weekend project. It's a six month initiative that touches every department. AI agents can sit on top of existing tools through APIs while delivering incremental value immediately.&lt;/p&gt;

&lt;p&gt;Third, regulatory and compliance requirements in industries like healthcare, finance, and legal mean that certain processes require human review regardless of AI capability. A legal AI agent might draft a contract, but a licensed attorney still needs to sign off. That attorney needs tools to review and annotate the agent's work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1553877522-43269d4ea984%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1553877522-43269d4ea984%3Fw%3D1200%26q%3D80" alt="A person working at a desk with multiple computer monitors showing data dashboards and AI interfaces representing the hybrid human plus AI workflow" width="1200" height="800"&gt;&lt;/a&gt;&lt;em&gt;Most businesses will operate hybrid stacks, combining AI agents with traditional tools for the next two to three years&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;What I tell my clients is this: don't think about replacing your SaaS stack. Think about which workflows inside your SaaS stack are costing you the most time and money. Start there. An AI agent that handles 60% of your customer support volume saves more money in month one than spending six months evaluating a complete platform replacement.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.jahanzaib.ai/ai-readiness" rel="noopener noreferrer"&gt;Take the AI readiness assessment&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Citation Capsule:&lt;/strong&gt; Only 14% of organizations globally are fully prepared to deploy AI according to &lt;a href="https://www.cisco.com/c/en/us/solutions/executive-perspectives/ai-readiness-index.html" rel="noopener noreferrer"&gt;Cisco's AI Readiness Index 2024&lt;/a&gt; survey of 8,161 business leaders. This gap between AI investment ($297 billion in cumulative VC funding) and enterprise readiness explains why hybrid human plus agent stacks will dominate for the next two to three years.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What Does This Mean for Businesses Running SaaS Today?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.pwc.com/us/en/tech-effect/ai-analytics/ai-predictions.html" rel="noopener noreferrer"&gt;PwC's 2025 AI Business Survey&lt;/a&gt; found that 54% of CEOs expect AI to significantly change how their company operates within 12 months. If you're a business leader paying $10,000 to $100,000 per month in SaaS subscriptions, here's what the AI agent wave means for you right now.&lt;/p&gt;

&lt;p&gt;Your SaaS vendors are scrambling. Every major SaaS company is bolting AI features onto their existing products. Salesforce has Einstein. HubSpot has Breeze. Zendesk has their AI agents. Some of these will be genuinely useful. Many will be rebranded chatbots dressed up as agents. The key question to ask: does this AI feature actually complete work autonomously, or does it just suggest things for my team to do?&lt;/p&gt;

&lt;p&gt;Your SaaS contracts deserve scrutiny. Many SaaS contracts lock you into annual commitments with per seat pricing. If AI agents can reduce the number of human operators you need, you're overpaying for seats. Before your next renewal, audit how many seats are actively used versus how many are just padding the vendor's ARR. I've seen companies save 20% to 40% on SaaS spend just by right sizing seats before deploying any AI.&lt;/p&gt;

&lt;p&gt;Your data is your moat. The companies that will benefit most from AI agents are the ones with clean, accessible, well structured data. If your data is scattered across 47 different SaaS tools with no integration strategy, you're not ready for AI agents. Start by consolidating your data. Build a data layer that AI agents can actually use.&lt;/p&gt;

&lt;p&gt;Your team needs new skills. The shift from SaaS to AI agents changes what you hire for. You need fewer people who are good at operating software and more people who are good at managing, evaluating, and improving AI agent performance. The project manager of 2028 won't manage a team of ten. They'll manage a team of three humans and seven AI agents.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Citation Capsule:&lt;/strong&gt; PwC's 2025 survey found 54% of CEOs expect AI to significantly change company operations within 12 months (&lt;a href="https://www.pwc.com/us/en/tech-effect/ai-analytics/ai-predictions.html" rel="noopener noreferrer"&gt;PwC, 2025&lt;/a&gt;). Combined with the finding from McKinsey that 25% of AI adopters already report 20%+ cost reductions, the pressure on traditional SaaS pricing models is accelerating faster than most vendors projected.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  How Should You Prepare for the AI Agent Transition?
&lt;/h2&gt;

&lt;p&gt;Based on McKinsey's finding that early AI adopters are 1.5x more likely to report revenue growth above 10% (&lt;a href="https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai" rel="noopener noreferrer"&gt;McKinsey, 2025&lt;/a&gt;), waiting is the riskiest strategy. Here's the playbook I use with my own clients, based on deploying 109 production AI systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Audit Your SaaS Stack This Week
&lt;/h3&gt;

&lt;p&gt;List every SaaS tool you pay for. For each one, answer: what work does this tool enable a human to do? Could an AI agent do that work directly? If the answer is yes or maybe, flag it. Most companies find 30% to 50% of their SaaS tools are candidates for AI agent replacement within 18 months.&lt;/p&gt;
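&lt;p&gt;If it helps to make the audit concrete, here is a minimal sketch of the exercise as data plus a filter. The tools, costs, and flags are hypothetical placeholders, not recommendations:&lt;/p&gt;

```python
# Hypothetical stack audit: each entry records a tool, its monthly cost,
# and whether an AI agent could do the underlying work (yes / maybe / no).
stack = [
    {"tool": "helpdesk_platform", "monthly_cost": 1_200, "agent_candidate": "yes"},
    {"tool": "bi_dashboards",     "monthly_cost": 2_000, "agent_candidate": "maybe"},
    {"tool": "payroll",           "monthly_cost": 800,   "agent_candidate": "no"},
]

# Flag every yes or maybe, and total the monthly spend that is at risk.
flagged = [entry for entry in stack if entry["agent_candidate"] in ("yes", "maybe")]
at_risk_spend = sum(entry["monthly_cost"] for entry in flagged)

print([entry["tool"] for entry in flagged])  # the replacement candidates
print(at_risk_spend)                         # 3200 dollars per month at risk
```

&lt;p&gt;A spreadsheet works just as well; the point is to attach a cost and a flag to every tool before deciding where an agent goes first.&lt;/p&gt;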

&lt;h3&gt;
  
  
  Step 2: Start with One High Impact Workflow
&lt;/h3&gt;

&lt;p&gt;Don't try to replace everything at once. Pick the workflow that costs you the most in human time and SaaS fees combined. For most businesses, this is customer support, lead qualification, or data entry and reporting. Deploy an AI agent on that single workflow. Measure the results obsessively for 60 days.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Clean Your Data
&lt;/h3&gt;

&lt;p&gt;AI agents are only as good as the data they can access. Before deploying agents, consolidate your critical data into accessible formats. Build APIs. Create documentation. The companies I work with that skip this step always end up circling back to it, having wasted two to three months on an agent that produces mediocre results because it can't access the right data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Renegotiate Before You Renew
&lt;/h3&gt;

&lt;p&gt;Use the AI agent threat as negotiating power with your SaaS vendors. If you can demonstrate that an AI agent handles 50% of your support volume, you have a strong argument for reducing your support platform seats by 50%. Vendors would rather give you a discount than lose you entirely to an AI agent replacement.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Build Internal AI Expertise
&lt;/h3&gt;

&lt;p&gt;Whether you hire an AI systems engineer, work with a consultant, or train existing team members, you need someone who understands how AI agents work, how to evaluate them, and how to manage them in production. The cost of getting this wrong is measured in months of wasted effort and failed deployments.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1531403009284-440f080d1e12%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1531403009284-440f080d1e12%3Fw%3D1200%26q%3D80" alt="A team reviewing strategy documents and workflow diagrams on a large screen representing the planning process for AI agent deployment" width="1200" height="800"&gt;&lt;/a&gt;&lt;em&gt;Preparation matters more than speed when transitioning from SaaS tools to AI agent workflows&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.jahanzaib.ai/services" rel="noopener noreferrer"&gt;AI agent and automation services&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Citation Capsule:&lt;/strong&gt; Early AI adopters are 1.5x more likely to report revenue growth above 10% according to &lt;a href="https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai" rel="noopener noreferrer"&gt;McKinsey's 2025 State of AI report&lt;/a&gt; surveying 1,363 organizations. The key differentiator isn't spending more on AI, but deploying agents on specific high impact workflows rather than attempting broad platform replacements.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What Are SaaS Companies Doing to Fight Back?
&lt;/h2&gt;

&lt;p&gt;SaaS companies aren't standing still. &lt;a href="https://www.bain.com/insights/topics/generative-ai/" rel="noopener noreferrer"&gt;Bain's 2025 technology report&lt;/a&gt; estimates that 90% of major SaaS vendors will embed AI agents into their platforms by the end of 2026. The question is whether those embedded agents will be good enough to prevent customers from switching to purpose built alternatives.&lt;/p&gt;

&lt;p&gt;Salesforce is the most aggressive defender. Their Agentforce platform lets customers build and deploy AI agents within the Salesforce ecosystem. The strategy is clear: if customers are going to use AI agents, make sure those agents run on Salesforce infrastructure so the subscription revenue stays intact.&lt;/p&gt;

&lt;p&gt;Microsoft is playing a similar game with Copilot. By embedding AI agents across Office 365, Dynamics, and Azure, they're trying to make their ecosystem the default environment for agent deployment. The bet is that enterprises won't rip out Microsoft to use standalone AI agents when Microsoft's own agents are already integrated.&lt;/p&gt;

&lt;p&gt;Smaller SaaS companies have fewer options. They can't afford to build competitive AI agents from scratch. Many are partnering with AI companies or acquiring AI startups to add agent capabilities. Others are leaning into their data moats, arguing that years of accumulated customer data make their AI features more accurate than a new entrant could achieve.&lt;/p&gt;

&lt;p&gt;Here's what I think most analysis misses. The SaaS companies that survive won't be the ones with the best AI features. They'll be the ones that successfully reposition from "tool you operate" to "platform that agents operate on." If Salesforce becomes the database that AI agents read and write to, it survives even if no human ever logs into the Salesforce UI again. That's a radical strategic pivot, but it's the only one that works long term.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Will AI agents completely replace SaaS tools?
&lt;/h3&gt;

&lt;p&gt;Not entirely, and not overnight. AI agents will replace specific SaaS workflows where the task is repetitive, well defined, and doesn't require nuanced human judgment. According to &lt;a href="https://www.gartner.com/en/articles/ai-agents" rel="noopener noreferrer"&gt;Gartner&lt;/a&gt;, 33% of enterprise software will include agentic AI by 2028. Most businesses will run hybrid stacks combining traditional tools with AI agents for the next two to three years.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which SaaS categories are most at risk from AI agents?
&lt;/h3&gt;

&lt;p&gt;Customer support, code generation, data analytics, and sales development are the most vulnerable right now. These categories share high labor costs per task, structured decision trees, and abundant training data. Legal research and accounting are next in line, with disruption expected within 18 to 36 months.&lt;/p&gt;

&lt;h3&gt;
  
  
  How much money are VCs investing in AI agents specifically?
&lt;/h3&gt;

&lt;p&gt;Total AI venture funding has reached $297 billion cumulative since 2023, with $65 billion in Q1 2026 alone (&lt;a href="https://www.cbinsights.com/research/report/ai-trends-q1-2026/" rel="noopener noreferrer"&gt;CB Insights, 2026&lt;/a&gt;). A significant and growing portion targets AI agent startups specifically. Cognition raised $2 billion, Harvey raised $300 million, and Factory AI raised $200 million, all for agent-focused products.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is "service as software" and how is it different from SaaS?
&lt;/h3&gt;

&lt;p&gt;Service as software, a term coined by Foundation Capital, means AI agents that perform work directly rather than providing tools for humans to perform work. SaaS charges per seat for software access. Service as software charges per outcome or per action. &lt;a href="https://foundationcapital.com/service-as-software/" rel="noopener noreferrer"&gt;Foundation Capital&lt;/a&gt; estimates this represents a $4.6 trillion market opportunity.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should I cancel my SaaS subscriptions and switch to AI agents?
&lt;/h3&gt;

&lt;p&gt;Not immediately. Start by auditing which workflows within your SaaS tools could be handled by AI agents. Deploy an agent on one high-impact workflow first. Measure results for 60 days. Then expand. Most companies find that 30% to 50% of their SaaS tools become candidates for replacement within 18 months of starting this process.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I know if my business is ready for AI agents?
&lt;/h3&gt;

&lt;p&gt;Readiness depends on data quality, technical infrastructure, and process documentation. &lt;a href="https://www.cisco.com/c/en/us/solutions/executive-perspectives/ai-readiness-index.html" rel="noopener noreferrer"&gt;Cisco's AI Readiness Index&lt;/a&gt; found only 14% of organizations are fully prepared. Take an &lt;a href="https://www.jahanzaib.ai/ai-readiness" rel="noopener noreferrer"&gt;AI readiness assessment&lt;/a&gt; to evaluate your specific situation. Key indicators include having clean data, documented processes, and at least one workflow with high volume and repetitive decisions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Are AI agents reliable enough for production use?
&lt;/h3&gt;

&lt;p&gt;Yes, for specific use cases with guardrails. I've deployed 109 production AI systems, and reliability comes down to scope. An agent handling customer support ticket triage is highly reliable today. An agent making complex strategic business decisions is not. The key is starting with bounded, well-defined tasks and expanding as the technology matures and your team builds confidence.&lt;/p&gt;

&lt;h3&gt;
  
  
  What happens to SaaS company valuations as AI agents grow?
&lt;/h3&gt;

&lt;p&gt;SaaS companies that fail to add agent capabilities will see significant valuation compression. Those that successfully pivot to becoming platforms for AI agents may actually see valuations increase. &lt;a href="https://www.bain.com/insights/topics/generative-ai/" rel="noopener noreferrer"&gt;Bain estimates&lt;/a&gt; 90% of major SaaS vendors will embed AI agents by end of 2026, suggesting the industry recognizes the existential threat and is responding aggressively.&lt;/p&gt;

&lt;p&gt;The SaaS industry isn't dying tomorrow. But the ground is shifting under its feet, and $297 billion in venture capital says the smart money agrees. I've spent years building AI systems that automate real business workflows, and the pattern is unmistakable: AI agents that do the work will always beat software that helps you do the work.&lt;/p&gt;

&lt;p&gt;The businesses that move first won't just save on SaaS spend. They'll operate faster, make better decisions, and compound those advantages over competitors who wait. Whether you start with a single support agent or a full multi-agent workflow, the important thing is to start now.&lt;/p&gt;

&lt;p&gt;Not sure where your business stands? &lt;a href="https://www.jahanzaib.ai/ai-readiness" rel="noopener noreferrer"&gt;Take the AI Readiness Assessment&lt;/a&gt; to find out whether you need AI agents, simple automation, or a hybrid approach. It takes five minutes and gives you a personalized action plan.&lt;/p&gt;

</description>
      <category>aiagents</category>
      <category>saas</category>
      <category>enterpriseai</category>
      <category>automation</category>
    </item>
    <item>
      <title>Model Context Protocol: How I Build MCP Servers That Run in Production (and What Most Guides Skip)</title>
      <dc:creator>Jahanzaib</dc:creator>
      <pubDate>Sat, 04 Apr 2026 09:01:43 +0000</pubDate>
      <link>https://forem.com/jahanzaibai/model-context-protocol-how-i-build-mcp-servers-that-run-in-production-and-what-most-guides-skip-5fcc</link>
      <guid>https://forem.com/jahanzaibai/model-context-protocol-how-i-build-mcp-servers-that-run-in-production-and-what-most-guides-skip-5fcc</guid>
      <description>&lt;p&gt;The first time I connected Claude to a live PostgreSQL database through a three-line configuration file, I sat back and thought: this is what every integration should feel like. No custom connector, no bespoke API wrapper, no 400-line Python script that breaks every time the API vendor changes a response field. Just a Model Context Protocol server sitting between the AI and the database, translating naturally.&lt;/p&gt;

&lt;p&gt;I've shipped &lt;a href="https://www.jahanzaib.ai/work" rel="noopener noreferrer"&gt;AI systems for 23 production clients&lt;/a&gt; since MCP launched. The protocol has moved from an interesting Anthropic experiment to the default way I wire AI agents to external systems. If you're building anything with AI agents today and you're still writing one-off tool integrations, you're doing five times the work you need to. This guide covers everything: what MCP actually is, how to build a production-grade server, the auth and security patterns that matter, and the deployment options I actually use.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key Takeaways&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Model Context Protocol (MCP) is an open standard that eliminates custom integrations between AI models and external tools — one server works with every MCP-compatible client&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;MCP grew from 100,000 monthly downloads in November 2024 to over 8 million by April 2025, with 5,800+ servers now available&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Three primitives cover everything: tools (functions the AI calls), resources (data the AI reads), and prompts (reusable templates)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For local development, use stdio transport. For production remote servers, use Streamable HTTP with OAuth 2.1 authentication&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The biggest mistake builders make is skipping input validation and structured error handling — both are easy to add and critical for production stability&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Real ROI shows up fast: one MCP server replacing a custom CRM connector saved a SaaS client $3,200/month in maintenance engineering hours&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1597852074816-d933c7d2b988%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1597852074816-d933c7d2b988%3Fw%3D1200%26q%3D80" alt="Circuit board representing Model Context Protocol server architecture and AI integration"&gt;&lt;/a&gt;&lt;em&gt;MCP turns the chaotic web of AI integrations into a clean protocol-based architecture&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Model Context Protocol Actually Is
&lt;/h2&gt;

&lt;p&gt;Before MCP, building an AI system that touched five external tools meant writing five custom integrations. Then maintaining them. Then rewriting them when the AI model changed or a tool updated its API. If you had 10 AI applications and 20 external tools, you potentially needed 200 different connectors. Anthropic's team called this the M×N problem, and it's the reason most AI agent projects die in the maintenance phase rather than the build phase.&lt;/p&gt;

&lt;p&gt;MCP solves this with a single protocol. Build one server for your Salesforce data. Every AI client that speaks MCP — Claude, Cursor, Windsurf, your custom agent — can use that server immediately. No rewrites. You go from M×N integrations to M+N.&lt;/p&gt;
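
&lt;p&gt;The arithmetic behind that claim is worth making concrete. A quick sketch, using the 10-application, 20-tool counts from the example above:&lt;/p&gt;

```python
# Connector counts with and without a shared protocol,
# using the 10-application, 20-tool example from the text.
clients, tools = 10, 20

custom_connectors = clients * tools  # one bespoke integration per (client, tool) pair
mcp_connectors = clients + tools     # one MCP client per app, one MCP server per tool

print(custom_connectors, mcp_connectors)  # 200 vs 30
```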

&lt;p&gt;Think of it as USB-C for AI. Before USB-C, every device needed different cables, different adapters, different drivers. MCP is the moment AI tooling gets a universal port. The &lt;a href="https://modelcontextprotocol.io/specification/2025-11-25" rel="noopener noreferrer"&gt;November 2025 MCP specification&lt;/a&gt; is the most current stable version, adding proper authentication and long-running workflow support that makes it genuinely production-ready for enterprise use.&lt;/p&gt;

&lt;p&gt;The numbers bear this out. MCP SDK downloads grew from roughly 100,000 per month in November 2024 to over 8 million by April 2025. As of early 2026, there are over 5,800 published MCP servers covering GitHub, Slack, Google Drive, PostgreSQL, Notion, Jira, Salesforce, Stripe, and dozens of other services. Companies like Cloudflare, Block (Square), and Autodesk are running MCP in production at scale.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Three Primitives
&lt;/h3&gt;

&lt;p&gt;Every MCP server exposes some combination of three things:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tools&lt;/strong&gt; are functions the AI can call. "Search the database for orders placed in the last 30 days." "Send an email to this address." "Create a Jira ticket with this title and description." The AI decides when to call them based on the conversation. Tools are what most people start with, and they cover 80% of use cases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resources&lt;/strong&gt; are data the AI can read. Unlike tools, resources are static or semi-static: a company wiki, a product catalog, a code repository. The AI fetches them to enrich its context. If your database has a "knowledge" table full of internal documentation, that's a resource, not a tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompts&lt;/strong&gt; are reusable templates that appear in the AI client's interface. They're less about automation and more about UX: giving users shortcuts to common workflows. "Summarize today's support tickets" could be a prompt that automatically populates context and kicks off a specific analysis flow.&lt;/p&gt;

&lt;p&gt;For most production use cases, you'll build tools first and add resources later when you notice the AI making requests for static data that shouldn't require a full tool call each time.&lt;/p&gt;
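
&lt;p&gt;On the wire, each primitive maps to a JSON-RPC 2.0 method. This sketch shows the request shapes a client sends; the method names come from the MCP specification, while the specific tool, resource URI, and prompt name here are hypothetical:&lt;/p&gt;

```python
import json

# JSON-RPC 2.0 requests an MCP client sends, one per primitive.
# Method names follow the MCP spec; ids are arbitrary request identifiers.
list_tools_req = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

call_tool_req = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "search_customers",               # a tool the server advertises
        "arguments": {"query": "acme", "limit": 5},
    },
}

read_resource_req = {
    "jsonrpc": "2.0",
    "id": 3,
    "method": "resources/read",
    "params": {"uri": "docs://internal/wiki"},    # hypothetical resource URI
}

get_prompt_req = {
    "jsonrpc": "2.0",
    "id": 4,
    "method": "prompts/get",
    "params": {"name": "summarize_tickets", "arguments": {}},  # hypothetical prompt
}

# Each message travels as a single JSON object, regardless of transport.
wire = json.dumps(call_tool_req)
```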

&lt;h2&gt;
  
  
  Choosing Your Transport: stdio vs Streamable HTTP
&lt;/h2&gt;

&lt;p&gt;This decision matters more than most tutorials acknowledge. Getting it wrong means either overly complex local setup or an insecure production deployment.&lt;/p&gt;

&lt;h3&gt;
  
  
  stdio Transport: For Local and Desktop Clients
&lt;/h3&gt;

&lt;p&gt;stdio transport runs your MCP server as a local process and communicates through standard input and output. Claude for Desktop uses this. Cursor uses this. It's simple, has zero network overhead, and requires no authentication because the AI client launches the server process directly on your machine.&lt;/p&gt;

&lt;p&gt;Use stdio when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;You're building for Claude Desktop or other local AI clients&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The tools access local resources (files, local databases, local APIs)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You're in development and want fast iteration cycles&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The server only needs to serve one user on one machine&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Claude Desktop configuration looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"my-server"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"python"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/path/to/server.py"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"DATABASE_URL"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"postgresql://localhost/mydb"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Streamable HTTP: For Production Remote Servers
&lt;/h3&gt;

&lt;p&gt;Streamable HTTP runs your MCP server as a proper web service. Multiple users, multiple AI clients, proper authentication, rate limiting, observability. This is what you use when you're building a server that your team's agents — or your customers' agents — will call in production.&lt;/p&gt;

&lt;p&gt;The November 2025 specification standardized Streamable HTTP as the recommended transport for remote deployments. It uses standard HTTP for requests and optional Server-Sent Events for streaming responses back to the client.&lt;/p&gt;

&lt;p&gt;Use Streamable HTTP when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Multiple users or clients need access to the same server&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The server is deployed remotely (cloud, VPS, serverless)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You need authentication and access control&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You need logging, monitoring, and audit trails&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You're building a commercial or enterprise service&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
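
&lt;p&gt;To make the transport concrete, here is the shape of a single Streamable HTTP request, built with the standard library but never actually sent. The endpoint URL and bearer token are placeholders; the headers reflect the spec's requirement that clients POST JSON-RPC and declare they can accept either a plain JSON response or an SSE stream:&lt;/p&gt;

```python
import json
import urllib.request

# One JSON-RPC call to a remote MCP server over Streamable HTTP.
# The URL and token are placeholders; the request is constructed, not sent.
body = json.dumps({"jsonrpc": "2.0", "id": 1, "method": "tools/list"}).encode("utf-8")

req = urllib.request.Request(
    "https://example.com/mcp",  # placeholder MCP endpoint
    data=body,
    method="POST",
    headers={
        "Content-Type": "application/json",
        # Clients must accept both response modes: plain JSON or an SSE stream
        "Accept": "application/json, text/event-stream",
        "Authorization": "Bearer PLACEHOLDER_TOKEN",  # OAuth 2.1 access token
    },
)
```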

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1629654297299-c8506221ca97%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1629654297299-c8506221ca97%3Fw%3D1200%26q%3D80" alt="Developer writing Python code to build an MCP server with proper transport configuration"&gt;&lt;/a&gt;&lt;em&gt;Transport choice is the first architectural decision that affects everything downstream&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Building an MCP Server in Python
&lt;/h2&gt;

&lt;p&gt;I'll walk through a real example: a CRM lookup server that lets an AI agent search customer records, pull account history, and log interactions. This is the type of integration I build most often for &lt;a href="https://www.jahanzaib.ai/services" rel="noopener noreferrer"&gt;AI systems clients&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;

&lt;p&gt;Install the official Python SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a Streamable HTTP server (production), you also need an ASGI framework:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;mcp fastapi uvicorn
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Your First Tool
&lt;/h3&gt;

&lt;p&gt;Here's a minimal but production-honest MCP server. I'm not going to show you the "hello world" version — I'm going to show you what I actually ship:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp.server&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Server&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp.server.stdio&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;stdio_server&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize server with a name — shows in client UIs
&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;crm-server&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@app.list_tools&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;list_tools&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_customers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Search CRM for customer records by name, email, or company. Returns up to 10 matches.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;inputSchema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Search term: name, email address, or company name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxLength&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;
                    &lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;limit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;integer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Max results to return (1-10)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;minimum&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maximum&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nd"&gt;@app.call_tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TextContent&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_customers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;limit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# enforce max
&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TextContent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: search query must be at least 2 characters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)]&lt;/span&gt;

        &lt;span class="c1"&gt;# Your actual CRM lookup logic here
&lt;/span&gt;        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;search_crm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TextContent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No customers found matching &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'"&lt;/span&gt;
            &lt;span class="p"&gt;)]&lt;/span&gt;

        &lt;span class="n"&gt;formatted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;format_results&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TextContent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;formatted&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TextContent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unknown tool: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;stdio_server&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="nf"&gt;as &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;read_stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;write_stream&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;read_stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;write_stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_initialization_options&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things I do here that most tutorials skip:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;maxLength on the input schema&lt;/strong&gt;: Gives the AI client a chance to validate input before sending, and documents your constraints to whoever reads the schema.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Explicit limit enforcement in the handler&lt;/strong&gt;: Never trust schema validation alone. The client might not enforce it. Always re-check in your handler.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Specific error messages&lt;/strong&gt;: When the AI gets an error, it uses the message to decide what to do next. "Error: X" gives it nothing. A specific message gives it enough to retry correctly or surface the issue to the user.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
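&lt;p&gt;The second bullet is worth a concrete sketch. Instead of trusting the schema, the handler can clamp whatever arrives. The helper below is my own illustration (the names and bounds are hypothetical), not part of the server above:&lt;/p&gt;

```python
def clamp_limit(requested, default: int = 10, max_allowed: int = 50) -> int:
    """Re-enforce the schema's bounds server-side: fall back to a default
    when the value is missing or malformed, and cap it at max_allowed."""
    try:
        value = int(requested)
    except (TypeError, ValueError):
        return default
    return max(1, min(value, max_allowed))

# The schema may promise 1-50, but the handler still checks:
print(clamp_limit(None))    # missing argument -> default
print(clamp_limit(10_000))  # out-of-range argument -> capped
```

&lt;p&gt;Three lines of defense, and the tool never passes a hostile limit to the database no matter what the client sends.&lt;/p&gt;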

&lt;h3&gt;
  
  
  Handling Errors Like a Production System
&lt;/h3&gt;

&lt;p&gt;Every external call in your tool handler can fail. Database unavailable, API rate limited, network timeout. The way you handle these failures determines whether your AI agent recovers gracefully or enters a spiral of unhelpful retries.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.call_tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TextContent&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_customers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nf"&gt;search_crm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
                &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;5.0&lt;/span&gt;  &lt;span class="c1"&gt;# 5 second hard cap
&lt;/span&gt;            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TextContent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;format_results&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;))]&lt;/span&gt;

        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;TimeoutError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TextContent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The CRM search timed out after 5 seconds. Try a more specific query.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)]&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;DatabaseConnectionError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TextContent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CRM is temporarily unavailable. The team has been notified.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)]&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Log the real error server-side, return safe message to client
&lt;/span&gt;            &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;
            &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CRM search error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exc_info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TextContent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;An unexpected error occurred. Please try again.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pattern: log the real error to your monitoring system, return a clean message to the AI. You don't want stack traces in AI responses. You also don't want the AI to see your database schema or internal service names in error messages.&lt;/p&gt;
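&lt;p&gt;One way to keep that pattern consistent once you have a dozen tools is a single mapping from exception type to safe client-facing text. The helper and message strings below are my own sketch, not part of the MCP SDK:&lt;/p&gt;

```python
import logging

logger = logging.getLogger("mcp_server")

# Safe, client-facing text per failure class; anything unlisted
# falls through to a generic message.
SAFE_MESSAGES = {
    TimeoutError: "The backend timed out. Try a narrower request.",
    ConnectionError: "The backend is temporarily unavailable.",
}

def safe_error_text(exc: Exception) -> str:
    """Log full details server-side, return only a sanitized message."""
    logger.error("tool failure: %r", exc, exc_info=True)
    for exc_type, message in SAFE_MESSAGES.items():
        if isinstance(exc, exc_type):
            return message
    return "An unexpected error occurred. Please try again."
```

&lt;p&gt;Every &lt;code&gt;except&lt;/code&gt; block then collapses to one call, and adding a new failure mode means adding one dictionary entry instead of editing every handler.&lt;/p&gt;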

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1542831371-29b0f74f9713%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1542831371-29b0f74f9713%3Fw%3D1200%26q%3D80" alt="Code editor showing Python MCP server implementation with error handling patterns"&gt;&lt;/a&gt;&lt;em&gt;Error handling in MCP tools determines whether agents recover gracefully or loop endlessly&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Production Patterns That Actually Matter
&lt;/h2&gt;

&lt;p&gt;This is where most MCP tutorials stop, and where the real work begins. I've learned these patterns by running MCP servers handling thousands of calls per day across multiple client deployments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Authentication for Remote Servers
&lt;/h3&gt;

&lt;p&gt;A 2025 security scan of roughly 2,000 publicly exposed MCP servers found that most had zero authentication. None. An open tool endpoint anyone could call. That's not a theoretical risk — that's a live data leak waiting to happen.&lt;/p&gt;

&lt;p&gt;The November 2025 MCP specification addressed this directly: OAuth 2.1 is now the standard for authenticating remote MCP server connections. The flow looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Client discovers the server's authorization metadata via OAuth protected resource metadata (&lt;code&gt;/.well-known/oauth-protected-resource&lt;/code&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Client initiates OAuth 2.1 authorization flow&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Server validates token on every tool call&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scopes control which tools a client can call (read vs write, which resources)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
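&lt;p&gt;The scope check in the last bullet reduces to a lookup before dispatch. The scope names and tool map below are hypothetical; substitute whatever your authorization server actually issues:&lt;/p&gt;

```python
# Hypothetical tool -> required-scope map; names are illustrative.
TOOL_SCOPES = {
    "search_customers": "crm:read",
    "create_activity_log": "crm:write",
}

def authorize_tool_call(tool_name: str, granted_scopes: set[str]) -> bool:
    """Allow a call only if the token carries the scope the tool needs.
    Unknown tools are denied by default."""
    required = TOOL_SCOPES.get(tool_name)
    return required is not None and required in granted_scopes
```

&lt;p&gt;Denying unknown tools by default matters: a new tool added to the server should be invisible to every client until someone deliberately grants a scope for it.&lt;/p&gt;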

&lt;p&gt;For simpler internal deployments where you control all clients, API key authentication works fine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HTTPException&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Header&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp.server.fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MCPAPIRouter&lt;/span&gt;

&lt;span class="n"&gt;router&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MCPAPIRouter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;VALID_API_KEYS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MCP_API_KEYS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;  &lt;span class="c1"&gt;# drop the empty key left behind when the env var is unset&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;verify_api_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Header&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;x_api_key&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;VALID_API_KEYS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;HTTPException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;401&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;detail&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Invalid API key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@router.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/mcp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dependencies&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;Depends&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;verify_api_key&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;mcp_endpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# handle MCP request
&lt;/span&gt;    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important thing is having authentication at all. Whatever mechanism fits your setup — use it. An MCP server with no auth is a direct line into your data systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Input Validation Beyond JSON Schema
&lt;/h3&gt;

&lt;p&gt;JSON Schema validation happens at the protocol level, but it doesn't protect you from everything. An AI might send a valid string that happens to be a SQL injection attempt, a path traversal string, or a malformed email address that breaks your downstream service.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validate_search_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Strip whitespace
&lt;/span&gt;    &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Length bounds
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Query too short&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Query too long&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Block obvious injection attempts
&lt;/span&gt;    &lt;span class="n"&gt;dangerous_patterns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\"\\]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;          &lt;span class="c1"&gt;# SQL injection chars
&lt;/span&gt;        &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\.\./&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;              &lt;span class="c1"&gt;# path traversal
&lt;/span&gt;        &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;[^&amp;gt;]+&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;            &lt;span class="c1"&gt;# HTML tags
&lt;/span&gt;    &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;pattern&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;dangerous_patterns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Query contains invalid characters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This isn't paranoia. When an AI is calling your tools autonomously, edge cases happen that you didn't anticipate in testing. Validation is cheap to add and expensive to skip.&lt;/p&gt;

&lt;h3&gt;
  
  
  Structured Logging for Observability
&lt;/h3&gt;

&lt;p&gt;When an AI agent calls your MCP server 200 times a day, you need to know which tools are slow, which ones fail, and how inputs are distributed. Plain print statements won't get you there.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timezone&lt;/span&gt;

&lt;span class="n"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getLogger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mcp_server&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@app.call_tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;success&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;error_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;dispatch_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;error_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt;

    &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;elapsed_ms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;error_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;duration_ms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;elapsed_ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;}))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;JSON logs ship cleanly to any aggregator: CloudWatch, Datadog, Grafana, whatever your stack uses. You can then build a dashboard that shows tool call latency percentiles, error rates by tool, and daily usage trends. That's the kind of visibility that lets you run MCP in production with confidence rather than hope.&lt;/p&gt;
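&lt;p&gt;If you don't have an aggregator wired up yet, the same JSON lines are easy to slice locally. A throwaway sketch for per-tool latency percentiles, using only the field names from the log schema above:&lt;/p&gt;

```python
import json
from collections import defaultdict
from statistics import quantiles

def latency_report(log_lines):
    """Group duration_ms by tool and report p50/p95 plus call counts."""
    by_tool = defaultdict(list)
    for line in log_lines:
        event = json.loads(line)
        if event.get("event") == "tool_call":
            by_tool[event["tool"]].append(event["duration_ms"])
    report = {}
    for tool, durations in by_tool.items():
        cuts = quantiles(durations, n=20)  # 19 cut points at 5% steps
        report[tool] = {
            "p50": cuts[9],
            "p95": cuts[18],
            "calls": len(durations),
        }
    return report
```

&lt;p&gt;Percentiles over averages, always: one slow CRM query hiding in a healthy mean is exactly the failure mode this surfaces.&lt;/p&gt;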

&lt;h2&gt;
  
  
  Deploying Your MCP Server
&lt;/h2&gt;

&lt;p&gt;I run MCP servers in three configurations depending on the client's requirements. Here's how I think about each one.&lt;/p&gt;

&lt;h3&gt;
  
  
  Serverless (Cloud Run)
&lt;/h3&gt;

&lt;p&gt;For most production MCP servers, Cloud Run is my default. You push a container, Cloud Run scales it to zero when idle and spins it back up on demand when called. You pay per invocation. For a business whose AI agents make 1,000 tool calls a day, that's often under $5/month in compute.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Dockerfile&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; python:3.12-slim&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; requirements.txt .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "8080"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# deploy.sh&lt;/span&gt;
gcloud run deploy crm-mcp-server &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--source&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-central1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--no-allow-unauthenticated&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set-env-vars&lt;/span&gt; &lt;span class="nv"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DATABASE_URL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--memory&lt;/span&gt; 512Mi &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--timeout&lt;/span&gt; 30s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--no-allow-unauthenticated&lt;/code&gt; flag means Google Cloud IAM handles authentication before requests even reach your server. Your AI client authenticates with an identity token minted from a service account. Clean, auditable, and you don't have to implement auth yourself.&lt;/p&gt;

&lt;h3&gt;
  
  
  Self-Hosted VPS
&lt;/h3&gt;

&lt;p&gt;Some clients need data to stay on-premises or have compliance requirements that rule out managed cloud services. In those cases I run the MCP server on a VPS behind nginx with TLS termination:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="c1"&gt;# nginx config&lt;/span&gt;
&lt;span class="k"&gt;server&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;listen&lt;/span&gt; &lt;span class="mi"&gt;443&lt;/span&gt; &lt;span class="s"&gt;ssl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;server_name&lt;/span&gt; &lt;span class="s"&gt;mcp.internal.company.com&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kn"&gt;ssl_certificate&lt;/span&gt; &lt;span class="n"&gt;/etc/ssl/certs/server.crt&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;ssl_certificate_key&lt;/span&gt; &lt;span class="n"&gt;/etc/ssl/private/server.key&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kn"&gt;location&lt;/span&gt; &lt;span class="n"&gt;/mcp&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kn"&gt;proxy_pass&lt;/span&gt; &lt;span class="s"&gt;http://localhost:8080&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;Host&lt;/span&gt; &lt;span class="nv"&gt;$host&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kn"&gt;proxy_set_header&lt;/span&gt; &lt;span class="s"&gt;X-Real-IP&lt;/span&gt; &lt;span class="nv"&gt;$remote_addr&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kn"&gt;proxy_read_timeout&lt;/span&gt; &lt;span class="s"&gt;60s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the server with systemd for automatic restarts and startup on boot. Add log rotation. Nothing fancy, but reliable.&lt;/p&gt;
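&lt;p&gt;The systemd side is about a dozen lines. A sketch of the unit file; the paths, service user, and env file location are placeholders for your own layout:&lt;/p&gt;

```ini
# /etc/systemd/system/mcp-server.service
[Unit]
Description=MCP server
After=network-online.target

[Service]
User=mcp
WorkingDirectory=/opt/mcp-server
EnvironmentFile=/etc/mcp-server/env
ExecStart=/opt/mcp-server/.venv/bin/uvicorn server:app --host 127.0.0.1 --port 8080
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target
```

&lt;p&gt;Then &lt;code&gt;systemctl enable --now mcp-server&lt;/code&gt; gives you restarts on crash and startup on boot. Binding to &lt;code&gt;127.0.0.1&lt;/code&gt; keeps the app reachable only through the nginx proxy above.&lt;/p&gt;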

&lt;h3&gt;
  
  
  Local stdio for Claude Desktop
&lt;/h3&gt;

&lt;p&gt;For individual users who want to give Claude access to local tools — their own file system, a local database, private APIs — stdio transport with Claude Desktop is the simplest path. The server runs locally, the credentials never leave the machine, and setup takes about 10 minutes once the server is written.&lt;/p&gt;
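&lt;p&gt;Claude Desktop discovers stdio servers through its &lt;code&gt;claude_desktop_config.json&lt;/code&gt; file (on macOS, under &lt;code&gt;~/Library/Application Support/Claude/&lt;/code&gt;). A minimal entry looks like this; the server name, path, and environment variable are placeholders:&lt;/p&gt;

```json
{
  "mcpServers": {
    "local-tools": {
      "command": "python",
      "args": ["/path/to/your/server.py"],
      "env": { "DATABASE_URL": "sqlite:///local.db" }
    }
  }
}
```

&lt;p&gt;Restart Claude Desktop after editing the file and the server's tools appear in the chat interface.&lt;/p&gt;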

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1517694712202-14dd9538aa97%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1517694712202-14dd9538aa97%3Fw%3D1200%26q%3D80" alt="Laptop showing cloud deployment dashboard for MCP server on Google Cloud Run"&gt;&lt;/a&gt;&lt;em&gt;Cloud Run handles scaling, SSL, and zero-idle billing for most production MCP deployments&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Real Use Cases and the ROI That Comes With Them
&lt;/h2&gt;

&lt;p&gt;Abstract protocols are easy to explain but hard to justify to a CFO. Here's what MCP actually looks like in production deployments I've built, with specific numbers where I have them.&lt;/p&gt;

&lt;h3&gt;
  
  
  CRM Data Access for a B2B SaaS Team
&lt;/h3&gt;

&lt;p&gt;A 40-person B2B SaaS company had their account managers spending 45 minutes per day pulling customer data from Salesforce to answer questions in Slack. Their AI agent previously had a custom Salesforce connector that required a full-time developer to maintain as Salesforce updated its API.&lt;/p&gt;

&lt;p&gt;We replaced the custom connector with an MCP server exposing four tools: search accounts, get account timeline, create activity log, get open opportunities. The AI agent now answers Salesforce questions instantly. The maintenance burden dropped to near zero because the MCP server abstracts the Salesforce API — when Salesforce changes something, I update the server once, and every AI client that uses it gets the fix automatically.&lt;/p&gt;

&lt;p&gt;Time savings: roughly 45 minutes × 8 account managers × 22 working days = 132 hours/month. At a loaded cost of $80/hour, that's $10,560/month in recovered productivity. The MCP server took three days to build and costs about $8/month to run.&lt;/p&gt;
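&lt;p&gt;The arithmetic is worth sanity-checking yourself. Here it is as a back-of-envelope calculation using the same figures:&lt;/p&gt;

```python
# Back-of-envelope ROI for the Salesforce MCP server case above.
minutes_per_day = 45      # time each account manager spent pulling data
account_managers = 8
working_days = 22         # per month
hourly_cost = 80          # USD, loaded cost

hours_recovered = minutes_per_day / 60 * account_managers * working_days
monthly_value = hours_recovered * hourly_cost

print(f"{hours_recovered:.0f} hours/month, ${monthly_value:,.0f}/month recovered")
# prints: 132 hours/month, $10,560/month recovered
```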

&lt;h3&gt;
  
  
  Document Intelligence for a Legal Services Firm
&lt;/h3&gt;

&lt;p&gt;A legal services firm had over 50,000 contracts in Google Drive. Associates spent hours per week manually searching documents to answer "has this client signed an NDA with us?" and "what's the expiry date on this vendor agreement?"&lt;/p&gt;

&lt;p&gt;An MCP server with two tools — search documents by metadata and extract clause text — combined with a vector search index let their AI assistant answer those questions in under 10 seconds. The server pulls documents from Drive, runs them through a local embedding model, and returns relevant excerpts. No data leaves their infrastructure. Total build time: five days. Monthly savings in associate hours: the firm estimated 60+ hours at a $150/hour billed rate, which works out to more than $9,000 a month. That's real money.&lt;/p&gt;

&lt;p&gt;This is the type of work I cover in my &lt;a href="https://www.jahanzaib.ai/blog/ai-agents-production" rel="noopener noreferrer"&gt;production AI agents guide&lt;/a&gt; — the cases where the ROI is clear and the technical risk is manageable. If you're trying to figure out whether your business is ready for this kind of system, the &lt;a href="https://www.jahanzaib.ai/ai-readiness" rel="noopener noreferrer"&gt;AI Readiness Assessment&lt;/a&gt; is a good place to start.&lt;/p&gt;

&lt;h3&gt;
  
  
  E-Commerce Inventory Agent
&lt;/h3&gt;

&lt;p&gt;One of my e-commerce clients runs a 7-figure Shopify store with 2,800 SKUs across three warehouses. Their buying team was making reorder decisions from a spreadsheet that got updated weekly.&lt;/p&gt;

&lt;p&gt;An MCP server connected to their inventory management system, Shopify, and their 3PL's API gave their AI agent real-time stock levels, velocity data, and supplier lead times. The agent now flags reorder needs proactively, drafts purchase orders, and updates the buying team's Notion dashboard. The MCP layer means any future AI tool their team adopts can plug into the same data without a new integration.&lt;/p&gt;

&lt;p&gt;For more on how to decide between agents and simpler automation for use cases like this, read &lt;a href="https://www.jahanzaib.ai/blog/when-to-use-ai-agents-vs-automation" rel="noopener noreferrer"&gt;my breakdown on when AI agents actually make sense&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Adding Resources and Prompts
&lt;/h2&gt;

&lt;p&gt;Once your tools are stable, resources and prompts unlock the next level of capability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resources&lt;/strong&gt; make sense when the AI needs to read large, stable data that would be wasteful to query through a tool every time. An employee handbook, a product specification document, a pricing table that updates monthly. You define a resource URI and a handler that returns the content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.list_resources&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;list_resources&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Resource&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company://handbook&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Employee Handbook&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Current employee policies and procedures&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;mimeType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text/plain&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nd"&gt;@app.read_resource&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;read_resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;uri&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company://handbook&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;load_handbook_text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# fetch from S3, DB, wherever
&lt;/span&gt;    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unknown resource: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Prompts&lt;/strong&gt; are less about automation and more about giving users in Claude Desktop (or any MCP-compatible UI) quick access to standard workflows. A "weekly summary" prompt that automatically populates the last 7 days of activity data, or a "new client onboarding" prompt that pulls the relevant account details. Useful for teams adopting AI tooling who want guided workflows rather than open-ended chat.&lt;/p&gt;
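&lt;p&gt;The SDK plumbing for prompts follows the same decorator pattern as resources; the interesting part is the template-filling logic. Here is a minimal, SDK-independent sketch of a weekly-summary prompt builder — the activity-log shape is an assumption for illustration, and a real server would pull this data from its own source:&lt;/p&gt;

```python
from datetime import date, timedelta

def build_weekly_summary_prompt(activities, today=None):
    """Fill the 'weekly summary' prompt with the last 7 days of activity.

    `activities` is a list of (date, description) tuples -- a hypothetical
    shape for this sketch, not an MCP SDK type.
    """
    today = today or date.today()
    cutoff = today - timedelta(days=7)
    recent = [f"- {d.isoformat()}: {text}" for d, text in activities if d >= cutoff]
    lines = "\n".join(recent) or "- (no activity recorded)"
    return (
        "Summarize the team's last 7 days of activity for a weekly update.\n"
        f"Activity log:\n{lines}"
    )
```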

&lt;h2&gt;
  
  
  Testing Your MCP Server
&lt;/h2&gt;

&lt;p&gt;MCP servers are easy to under-test because the protocol layer hides bugs that only show up at runtime. Three testing patterns I always include:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unit tests for tool handlers&lt;/strong&gt;: Test the logic functions directly, not through the protocol. Pass a dict, get a result. These run fast and catch most logic bugs.&lt;/p&gt;
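&lt;p&gt;The first pattern looks like this in practice. The &lt;code&gt;get_account&lt;/code&gt; handler and its in-memory database are hypothetical stand-ins for your real tool logic:&lt;/p&gt;

```python
# Hypothetical tool handler: pure logic, no MCP protocol involved.
def get_account(args, accounts_db):
    account_id = args.get("account_id")
    if not account_id:
        return {"error": "account_id is required"}
    account = accounts_db.get(account_id)
    if account is None:
        return {"error": f"no account with id {account_id}"}
    return {"result": account}

# Unit tests: pass a dict, check the result. No server, no AI client.
def test_get_account():
    db = {"acct_1": {"name": "Acme"}}
    assert get_account({"account_id": "acct_1"}, db) == {"result": {"name": "Acme"}}
    assert "error" in get_account({}, db)
    assert "error" in get_account({"account_id": "missing"}, db)
```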

&lt;p&gt;&lt;strong&gt;Integration tests with the MCP test client&lt;/strong&gt;: The SDK includes a test client that lets you call your server programmatically without a real AI client. Use this to verify tool discovery, input validation, and error handling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contract tests against live data&lt;/strong&gt;: At least once per release, run your tools against a staging version of your real data source. This catches schema drift, API changes, and permission issues that unit tests can't see.&lt;/p&gt;

&lt;p&gt;For n8n users who are also building MCP integrations: my &lt;a href="https://www.jahanzaib.ai/blog/n8n-ai-agent-workflows-practitioner-guide" rel="noopener noreferrer"&gt;n8n AI agent guide&lt;/a&gt; covers how to use n8n as an MCP client to orchestrate multiple servers, which is a common pattern for businesses that want visual workflow management on top of protocol-based tool access.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1555949963-ff9fe0c870eb%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1555949963-ff9fe0c870eb%3Fw%3D1200%26q%3D80" alt="Developer testing MCP server integration with automated testing suite"&gt;&lt;/a&gt;&lt;em&gt;Contract testing against real data sources catches issues that unit tests miss&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Where MCP Is Headed
&lt;/h2&gt;

&lt;p&gt;The 2026 trajectory for MCP is clear: it's becoming infrastructure, not a feature. The major AI providers — Anthropic, OpenAI, Google, Microsoft — all support it or are moving toward it. Autodesk helped shape the enterprise authentication spec. Block and Stripe are running it in production finance systems.&lt;/p&gt;

&lt;p&gt;The next frontier is agent-to-agent MCP: AI agents acting as MCP clients to other AI agents. One agent orchestrates a research task, delegates to a data retrieval agent via MCP, gets results back, and continues. This is the multi-agent architecture pattern I cover in the &lt;a href="https://www.jahanzaib.ai/blog/agentic-rag-production-guide" rel="noopener noreferrer"&gt;Agentic RAG guide&lt;/a&gt;, now with a standardized protocol layer beneath it.&lt;/p&gt;

&lt;p&gt;If you're building AI systems today and you're not thinking about MCP as your integration standard, you're building technical debt into every tool you wire up. The work you do on custom connectors now will need to be redone — or it will become the maintenance burden that kills the project two years from now.&lt;/p&gt;

&lt;p&gt;The protocol is stable, the ecosystem is massive, and the ROI math is obvious. This is a good time to start.&lt;/p&gt;




&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is Model Context Protocol (MCP) used for?
&lt;/h3&gt;

&lt;p&gt;MCP is used to connect AI models like Claude to external tools, databases, APIs, and data sources through a standardized protocol. Instead of building custom integrations for each combination of AI and tool, you build one MCP server that works with any MCP-compatible AI client. Common uses include connecting AI agents to CRM systems, databases, internal wikis, code repositories, and communication tools like Slack or Jira.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is MCP only for Claude and Anthropic products?
&lt;/h3&gt;

&lt;p&gt;No. Anthropic open-sourced MCP in November 2024, and it has since been adopted by many other AI platforms including Cursor, Windsurf, Zed, and custom agent frameworks. OpenAI and Google have also indicated support. Any developer can build an MCP server or client using the official SDKs, and the protocol is not tied to any specific AI model or vendor.&lt;/p&gt;

&lt;h3&gt;
  
  
  How is MCP different from function calling / tool use?
&lt;/h3&gt;

&lt;p&gt;Tool use or function calling is a capability built into individual AI models — each model has its own format and API. MCP is a protocol layer on top of that: a standardized way for AI clients to discover and call tools regardless of which model they're using. Think of it as the difference between a specific charging cable format (tool calling per model) and the USB-C standard (MCP). The same MCP server works with any AI client that speaks the protocol.&lt;/p&gt;

&lt;h3&gt;
  
  
  What language should I use to build an MCP server?
&lt;/h3&gt;

&lt;p&gt;The official SDKs support Python and TypeScript. Python is the better choice for data-heavy servers (database queries, ML pipelines, document processing). TypeScript works well for JavaScript-based services and anything already running in a Node.js stack. Community SDKs exist for Rust, Go, Java, and C#, but the official SDKs have the best documentation and receive updates first when the spec changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I authenticate an MCP server in production?
&lt;/h3&gt;

&lt;p&gt;The November 2025 MCP specification standardizes OAuth 2.1 for remote servers using Streamable HTTP transport. For simpler setups, API key authentication enforced at the HTTP layer works well for internal services. If you're deploying on Google Cloud Run, you can use Cloud IAM to handle authentication before requests reach your server. Never deploy a remote MCP server without some form of authentication — a 2025 security scan found most public MCP servers had none, leaving the underlying data systems fully exposed.&lt;/p&gt;
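&lt;p&gt;For the API-key option, the check itself is a few lines. This sketch assumes a conventional &lt;code&gt;x-api-key&lt;/code&gt; header, which is not part of the MCP spec, and uses a constant-time comparison to avoid timing leaks:&lt;/p&gt;

```python
import hmac

def is_authorized(headers, expected_key):
    """Constant-time API key check, run at the HTTP layer before the
    request ever reaches the MCP server logic."""
    supplied = headers.get("x-api-key", "")
    return hmac.compare_digest(supplied, expected_key)
```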

&lt;h3&gt;
  
  
  Can MCP servers handle multiple concurrent requests?
&lt;/h3&gt;

&lt;p&gt;Yes. Streamable HTTP servers are standard ASGI web services and handle concurrency the same way any async Python server does. With FastAPI and uvicorn, a single process can handle dozens of concurrent tool calls. For higher throughput, add multiple workers or deploy behind an auto-scaling serverless platform like Cloud Run. The MCP protocol itself is stateless per request, which makes horizontal scaling straightforward.&lt;/p&gt;
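&lt;p&gt;The concurrency model is ordinary asyncio. This sketch simulates 20 overlapping I/O-bound tool calls with a stand-in handler; a real handler would await a database or API instead of &lt;code&gt;sleep&lt;/code&gt;:&lt;/p&gt;

```python
import asyncio
import time

async def tool_call(i):
    # Stand-in for an I/O-bound tool handler (DB query, API call).
    await asyncio.sleep(0.1)
    return i * 2

async def main():
    start = time.perf_counter()
    results = await asyncio.gather(*(tool_call(i) for i in range(20)))
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
# The 20 calls overlap, so total wall time is roughly 0.1s, not 2s.
```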

&lt;h3&gt;
  
  
  What are the main security risks with MCP servers?
&lt;/h3&gt;

&lt;p&gt;The main risks are: missing authentication (exposing your data systems to anyone who finds the endpoint), insufficient input validation (allowing injection attacks through tool parameters), and overly broad permissions (giving the AI access to delete or modify data when it only needs read access). Follow the principle of least privilege — only expose the tools a specific client needs, and scope database access to exactly the operations those tools require. Log all tool calls for audit purposes.&lt;/p&gt;

&lt;h3&gt;
  
  
  How long does it take to build a production MCP server?
&lt;/h3&gt;

&lt;p&gt;A simple read-only server with two or three tools takes one to two days including testing and deployment. A server with write operations, proper authentication, error handling, structured logging, and a deployment pipeline takes three to five days. The protocol itself is straightforward — the time goes into understanding the underlying system you're integrating, writing solid input validation, and setting up observability. Complex servers connecting to enterprise systems with custom auth requirements can take up to two weeks.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Citation Capsule:&lt;/strong&gt; MCP server downloads grew from ~100,000 per month in November 2024 to over 8 million by April 2025. Over 5,800 MCP servers are now available in the ecosystem, and 97M+ monthly SDK downloads were recorded as of December 2025. A 2025 security scan of publicly exposed MCP servers found most had no authentication. Sources: &lt;a href="https://guptadeepak.com/the-complete-guide-to-model-context-protocol-mcp-enterprise-adoption-market-trends-and-implementation-strategies/" rel="noopener noreferrer"&gt;Deepak Gupta MCP Enterprise Guide 2025&lt;/a&gt;, &lt;a href="https://arxiv.org/html/2503.23278v3" rel="noopener noreferrer"&gt;MCP Security Research ArXiv 2025&lt;/a&gt;, &lt;a href="https://modelcontextprotocol.io/specification/2025-11-25" rel="noopener noreferrer"&gt;MCP Specification November 2025&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>mcp</category>
      <category>modelcontextprotocol</category>
      <category>aiagents</category>
      <category>productionai</category>
    </item>
    <item>
      <title>n8n 2.0 AI Agents: The Workflow Architecture I Use Across Every Client Deployment</title>
      <dc:creator>Jahanzaib</dc:creator>
      <pubDate>Sat, 04 Apr 2026 09:01:41 +0000</pubDate>
      <link>https://forem.com/jahanzaibai/n8n-20-ai-agents-the-workflow-architecture-i-use-across-every-client-deployment-3ipf</link>
      <guid>https://forem.com/jahanzaibai/n8n-20-ai-agents-the-workflow-architecture-i-use-across-every-client-deployment-3ipf</guid>
      <description>&lt;p&gt;A client came to me last October with a straightforward complaint: their five-person support team was spending six hours a day answering the same 40 questions. Order status. Return windows. Shipping delays. The same things, over and over, all day. They had looked at chatbots before, but every solution either cost $800 a month or gave answers so wrong it made things worse instead of better.&lt;/p&gt;

&lt;p&gt;We built an n8n AI agent in two days. Within a week, it was resolving 78% of tickets without any human involvement. The remaining 22% got routed to the right person with full context already attached. The team now spends those six hours on work that actually needs them.&lt;/p&gt;

&lt;p&gt;I have deployed some version of this pattern across 40+ production systems, across industries from ecommerce to legal to logistics. And the tool I reach for most consistently is n8n, specifically since the 2.0 release in January 2026. This post is the guide I wish existed when I started: not just what n8n can do, but how to actually structure workflows that hold up under real load.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key Takeaways&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;n8n 2.0 introduced native LangChain integration with 70+ AI nodes, fundamentally changing what is possible without writing custom code&lt;/li&gt;
&lt;li&gt;The four node types that matter most are Model, Memory, Tool, and Vector Store: getting their relationships right is everything&lt;/li&gt;
&lt;li&gt;Memory type selection drives both cost and quality: Buffer for short conversations, Summary for long ones, Postgres-backed for persistence across sessions&lt;/li&gt;
&lt;li&gt;Tool node descriptions are more important than the tools themselves: vague descriptions cause more failures than bad code&lt;/li&gt;
&lt;li&gt;n8n wins on complex, high-volume, data-sensitive workflows; Zapier wins on speed of setup for simple integrations; Make wins on visual branching logic&lt;/li&gt;
&lt;li&gt;Routing simple queries to gpt-4o-mini and complex ones to Claude 3.5 Sonnet can cut agent costs by 60% or more in production&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What n8n 2.0 Actually Changed
&lt;/h2&gt;

&lt;p&gt;Before January 2026, building AI agents in n8n required a lot of manual HTTP request nodes, custom JavaScript, and careful prompt chaining. It worked, but it was fragile. Every API change broke something. Memory was either nonexistent or cobbled together with a database and custom code that was a maintenance nightmare to keep current.&lt;/p&gt;

&lt;p&gt;The 2.0 release changed the fundamentals. n8n now treats LangChain as a first-class citizen, which means instead of fighting the tool to do agent things, the platform is built around them. Seventy-plus dedicated AI nodes cover every part of the agent stack. You can connect any major LLM. You can store conversation memory in Redis, Postgres, or in-process buffers. You can expose any sub-workflow as a callable tool that the agent selects on its own based on what it needs.&lt;/p&gt;

&lt;p&gt;The bigger shift is conceptual. Traditional automation in n8n was linear: trigger, step A, step B, output. Agentic workflows are semantic. You describe what you want the agent to accomplish and what tools it has available. The agent figures out which steps to run and in what order. For tasks where the path varies by context, this is genuinely transformative.&lt;/p&gt;

&lt;p&gt;I want to be clear: n8n built this. I deploy and configure it for clients. That distinction matters. There is a community of engineers maintaining this platform, and the features I am walking through here are their work. What I bring is the pattern library from deploying it across real production environments.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1504639725590-34d0984388bd%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1504639725590-34d0984388bd%3Fw%3D1200%26q%3D80" alt="Close-up of circuit board representing AI workflow automation architecture" width="1200" height="900"&gt;&lt;/a&gt;&lt;em&gt;The node architecture in n8n 2.0 mirrors how you would think about building an agent from scratch, just without writing all the glue code yourself.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core Node Architecture
&lt;/h2&gt;

&lt;p&gt;Every n8n AI agent workflow is built from four categories of nodes. Understanding what each one does and when to reach for it matters more than any specific configuration detail.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model Nodes&lt;/strong&gt; connect your agent to a language model. You can use OpenAI (GPT-4o or gpt-4o-mini), Anthropic (Claude 3.5 Sonnet or Haiku), Google (Gemini 1.5), or local models via Ollama if you are self-hosting and want full data sovereignty. The model node is the brain. Everything else is plumbing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory Nodes&lt;/strong&gt; give the agent context across exchanges. Without memory, every message is a fresh start. With the right memory node, the agent remembers what the user told it three messages ago, what data it already looked up, and what it decided to do. I will cover memory selection in depth below because the choice has significant cost and quality implications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool Nodes&lt;/strong&gt; are where the real power lives. A tool is anything the agent can call: a sub-workflow, an HTTP request, a code block, a database query. The agent reads the tool name and description, decides whether it needs that tool, and calls it autonomously. You do not hardcode the decision logic. The LLM handles routing based on the descriptions you provide.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vector Store Nodes&lt;/strong&gt; connect to a knowledge base for retrieval augmented generation. Pinecone, Qdrant, Supabase, and others are all supported natively. When you need the agent to answer questions from a specific document set like a product catalog, a legal knowledge base, or internal SOPs, this is how you do it cleanly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Your First AI Agent Workflow
&lt;/h2&gt;

&lt;p&gt;The minimum viable n8n agent workflow has four nodes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A &lt;strong&gt;Chat Trigger&lt;/strong&gt; node (or a Webhook if you are integrating with another system)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;An &lt;strong&gt;AI Agent&lt;/strong&gt; node&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A &lt;strong&gt;Chat Model&lt;/strong&gt; node connected to the agent&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;An output (either a Chat Response or an HTTP response node)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is what the AI Agent node configuration looks like for a basic customer support setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"systemPrompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"You are a customer support agent for Acme Corp. Answer questions about orders, shipping, and returns. If you cannot answer something confidently, say so and offer to escalate. Do not invent information."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"maxIterations"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"returnIntermediateSteps"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"outputParser"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"auto"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things worth noting here. The &lt;code&gt;maxIterations&lt;/code&gt; field is not optional in production: without it, a confused agent can loop indefinitely while burning tokens. I set it between 5 and 8 for most support agents. Higher for research workflows where more reasoning steps are genuinely needed.&lt;/p&gt;

&lt;p&gt;The system prompt is doing more work than it looks like. "Do not invent information" is surprisingly important. Without explicit instruction, models will confidently fabricate order details or policy specifics. The phrase "say so and offer to escalate" gives the agent a graceful failure path instead of guessing.&lt;/p&gt;

&lt;p&gt;For the Chat Model node, I default to gpt-4o for anything customer facing where quality matters, and gpt-4o-mini for internal tools or high volume classification tasks. Temperature should sit between 0.1 and 0.3 for support agents. Higher temperature is for creative work. Support agents that improvise are a liability.&lt;/p&gt;
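&lt;p&gt;The cost-saving model routing mentioned in the takeaways can be sketched as a small pre-step ahead of the Chat Model node, for example in a Code node. The threshold and keywords below are illustrative heuristics, not n8n settings; tune them against your own traffic:&lt;/p&gt;

```python
def pick_model(query):
    """Route a query to a model tier. The word-count threshold and the
    complexity markers are made-up heuristics for this sketch."""
    complex_markers = ("refund policy exception", "legal", "complaint", "escalate")
    lowered = query.lower()
    if len(lowered.split()) > 60 or any(m in lowered for m in complex_markers):
        return "claude-3-5-sonnet"  # higher-quality, higher-cost tier
    return "gpt-4o-mini"  # cheap default for routine lookups
```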

&lt;h2&gt;
  
  
  Step 2: Choosing the Right Memory Type
&lt;/h2&gt;

&lt;p&gt;Memory is the part of n8n agent setup that most tutorials skip over. It is also the part that causes the most production problems: sessions that end too soon, costs that run too high, or an agent that contradicts itself between messages.&lt;/p&gt;

&lt;p&gt;n8n 2.0 ships four memory types:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Buffer Memory&lt;/strong&gt; stores the raw conversation history up to a token limit. Simple to set up, fast to query. Works well for short support conversations (under 10 exchanges) where you need exact recall. Falls apart for long conversations because you are sending the full history with every request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Buffer Window Memory&lt;/strong&gt; keeps only the last N exchanges rather than the full history. If your conversations average 8 turns, set the window to 6 or 8. This keeps costs predictable without losing the relevant context.&lt;/p&gt;
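&lt;p&gt;The windowing idea is simple enough to sketch in a few lines of Python. This mirrors what the node does conceptually; it is not the node's actual implementation:&lt;/p&gt;

```python
from collections import deque

class BufferWindowMemory:
    """Keep only the last N user/agent exchanges; older ones fall off."""

    def __init__(self, window=6):
        self.exchanges = deque(maxlen=window)

    def add(self, user_msg, agent_msg):
        self.exchanges.append((user_msg, agent_msg))

    def context(self):
        return list(self.exchanges)

mem = BufferWindowMemory(window=3)
for i in range(5):
    mem.add(f"user {i}", f"agent {i}")
# Only the last 3 exchanges survive; the first two were dropped.
```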

&lt;p&gt;&lt;strong&gt;Summary Memory&lt;/strong&gt; compresses older parts of the conversation into a summary, then appends new exchanges. This is my default for anything where sessions run long, like onboarding workflows or multi-session sales processes. You trade exact recall for cost control. Worth it in most cases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Postgres Memory&lt;/strong&gt; (or Redis Memory) stores conversation state in an external database. This is what you need when conversations need to survive server restarts, span multiple days, or be accessible across different workflow runs. Every high-stakes agent I deploy in production uses this.&lt;/p&gt;

&lt;p&gt;Here is a minimal Postgres memory configuration via the n8n Memory Manager node:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"memoryType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"postgres"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"sessionIdField"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"{{ $json.sessionId }}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tableName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"n8n_agent_memory"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"maxHistoryLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"returnMessages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;sessionId&lt;/code&gt; field is what links memory to a specific user or conversation thread. Without a consistent session ID, every message starts fresh regardless of what memory type you pick.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1551288049-bebda4e38f71%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1551288049-bebda4e38f71%3Fw%3D1200%26q%3D80" alt="Data visualization dashboard representing AI workflow memory and analytics" width="1200" height="800"&gt;&lt;/a&gt;&lt;em&gt;Persistent memory backed by Postgres means your agent remembers the user context across sessions, not just within a single conversation window.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Building Custom Tool Nodes
&lt;/h2&gt;

&lt;p&gt;This is where n8n 2.0 separates itself from anything else in the automation space. Custom tool nodes let you expose any workflow capability to the agent as a callable function. The agent decides when to use it based on the tool name and description.&lt;/p&gt;

&lt;p&gt;Let me walk through building an order lookup tool, which is the most common thing I build for ecommerce clients.&lt;/p&gt;

&lt;p&gt;First, create a separate n8n workflow that accepts an order ID and returns order details. Then, in your main agent workflow, add a "Call n8n Workflow" tool node and point it at that sub-workflow. The critical part is the tool configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"lookup_order_status"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Retrieves the current status, shipping information, and estimated delivery date for a customer order. Use this when a customer provides an order ID or asks about a specific order."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"inputSchema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"orderId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The order ID provided by the customer. Typically starts with ORD or a 6-digit number."&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"orderId"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The description here is doing the actual routing work. When a user says "what happened to my package," the agent reads all available tool descriptions, matches this one to the intent, and calls it. If the description were just "looks up an order," the agent would use it far less reliably.&lt;/p&gt;

&lt;p&gt;A few lessons from deploying this pattern across 40+ systems:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Be specific about when to use the tool.&lt;/strong&gt; "Use this when a customer provides an order ID" tells the agent the precondition. Without it, the agent might call the tool before asking for the order ID.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Format the output clearly.&lt;/strong&gt; The sub-workflow should return structured JSON with field names that are self explanatory. The agent parses this output and works with it directly. Ambiguous field names cause reasoning errors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Set a timeout on HTTP calls inside tools.&lt;/strong&gt; I have seen agents stall for 30 seconds waiting on a slow API. Set explicit timeouts (5 to 10 seconds) and return a graceful error message if the call fails.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep tools narrow.&lt;/strong&gt; One thing per tool. A tool called "manage_customer" that does lookups, updates, and escalations is harder for the agent to reason about than three separate tools with clear names.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Connecting External APIs
&lt;/h2&gt;

&lt;p&gt;Most tools ultimately call an external API. In n8n, you do this with the HTTP Request node inside your tool sub-workflow. Here is a minimal example for a CRM lookup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// HTTP Request node configuration&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;method&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;GET&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;url&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://api.yourcrm.com/v1/customers/{{ $json.customerId }}&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;authentication&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;headerAuth&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;headers&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Authorization&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Bearer {{ $env.CRM_API_KEY }}&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;timeout&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;continueOnFail&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things I always do in production API tool nodes:&lt;/p&gt;

&lt;p&gt;Set &lt;code&gt;continueOnFail: true&lt;/code&gt; so a failed API call returns an error object rather than crashing the whole workflow. The agent can then see the failure and respond gracefully instead of returning nothing to the user.&lt;/p&gt;

&lt;p&gt;Store API keys in n8n credentials or environment variables, never inline. If you are self-hosting, n8n encrypts credentials at rest.&lt;/p&gt;

&lt;p&gt;Add a response transformation step that extracts only the fields the agent needs. If the CRM returns 80 fields but the agent only needs name, email, and account status, filter it down. Fewer tokens, faster reasoning, lower cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  n8n vs Zapier vs Make: When Each One Wins
&lt;/h2&gt;

&lt;p&gt;I use all three tools. Each one is genuinely the best choice in specific situations. Here is how I actually think about the decision:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Factor&lt;/th&gt;
&lt;th&gt;n8n&lt;/th&gt;
&lt;th&gt;Make&lt;/th&gt;
&lt;th&gt;Zapier&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AI agent workflows&lt;/td&gt;
&lt;td&gt;Best in class&lt;/td&gt;
&lt;td&gt;Moderate support&lt;/td&gt;
&lt;td&gt;Limited depth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Self-hosting and data control&lt;/td&gt;
&lt;td&gt;Yes (free)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pricing at scale&lt;/td&gt;
&lt;td&gt;Per execution (cheap at volume)&lt;/td&gt;
&lt;td&gt;Per operation (moderate)&lt;/td&gt;
&lt;td&gt;Per task (expensive at volume)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Integration count&lt;/td&gt;
&lt;td&gt;~1,000&lt;/td&gt;
&lt;td&gt;~1,500&lt;/td&gt;
&lt;td&gt;8,000+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Technical skill required&lt;/td&gt;
&lt;td&gt;Moderate to high&lt;/td&gt;
&lt;td&gt;Low to moderate&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Visual workflow builder&lt;/td&gt;
&lt;td&gt;Node canvas&lt;/td&gt;
&lt;td&gt;Flowchart canvas&lt;/td&gt;
&lt;td&gt;Linear steps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LangChain and agent support&lt;/td&gt;
&lt;td&gt;Native (70+ nodes)&lt;/td&gt;
&lt;td&gt;Via HTTP only&lt;/td&gt;
&lt;td&gt;Via Zapier Agents (limited)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best for&lt;/td&gt;
&lt;td&gt;Complex agents, high volume, GDPR&lt;/td&gt;
&lt;td&gt;Medium complexity, visual branching&lt;/td&gt;
&lt;td&gt;Quick SaaS integrations, nontechnical teams&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If a client comes to me with a workflow that is 4 steps and connects two SaaS tools they already use, I tell them to use Zapier. It will be live in an hour and they will not need to call me to maintain it. n8n for that use case is overkill and creates a maintenance dependency they do not need.&lt;/p&gt;

&lt;p&gt;If the workflow has conditional logic, needs to process data heavily, or involves any kind of agent reasoning, n8n is the right tool. The execution based pricing is also dramatically cheaper at volume. A 10-step Zapier zap costs 10 tasks per run. The same workflow in n8n costs 1 execution.&lt;/p&gt;

&lt;p&gt;Make sits in the middle and is genuinely underrated for teams that want a visual interface for complex branching logic without the technical overhead of n8n. I use it for clients who need complex conditional flows but do not have a developer maintaining things.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Workflow Patterns I Deploy Repeatedly
&lt;/h2&gt;

&lt;p&gt;After 40+ production deployments, I keep returning to three patterns. These are not theoretical. They are running in production right now.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern 1: The Customer Support Agent&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Triggered by a Zendesk webhook or email, this agent has four tools: knowledge base retrieval (via a vector store node), order status lookup (HTTP to OMS), return policy lookup (static lookup table), and an escalation tool that creates a priority ticket and notifies a human. Memory is Postgres backed so the agent remembers prior exchanges if the customer responds to the same thread hours later.&lt;/p&gt;

&lt;p&gt;Resolution rate across three ecommerce clients running this pattern: 71% to 83%, depending on catalog complexity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern 2: The Lead Qualification Agent&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A form submission fires a webhook. The agent receives the lead data, then autonomously researches the company using an HTTP tool (Clearbit or Apollo), scores the lead against qualification criteria defined in the system prompt, writes a personalized first email draft, and creates the CRM record with score, research summary, and draft attached. A human reviews and sends.&lt;/p&gt;

&lt;p&gt;This one saves an average of 8 minutes per lead. At 50 leads a day, that adds up fast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern 3: The Async Data Processing Pipeline&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This one is not conversational at all, but it uses the same agent architecture. An email or file upload triggers the workflow. The agent classifies the incoming data, routes it to the right processing sub-workflow (invoice parsing, contract extraction, report summarization), handles edge cases it was not explicitly programmed for, and sends a structured output to the right system. The LLM handles routing and edge cases so I do not have to write decision logic for every possible input variation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1454165804606-c3d57bc86b40%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1454165804606-c3d57bc86b40%3Fw%3D1200%26q%3D80" alt="Person working on a laptop configuring an AI workflow automation system" width="1200" height="801"&gt;&lt;/a&gt;&lt;em&gt;Most production agent deployments start simple and grow. Start with two or three tools, measure what is getting called most, then expand from there.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Control: Token Routing Strategy
&lt;/h2&gt;

&lt;p&gt;The single biggest lever for reducing AI agent costs in production is model routing. Not all queries need the same model.&lt;/p&gt;

&lt;p&gt;For anything that requires structured reasoning, nuanced judgment, or multistep tool use, I use Claude 3.5 Sonnet or GPT-4o. For high volume classification, entity extraction, or simple question answering against structured data, I route to gpt-4o-mini. The cost difference is roughly 10x. The quality difference for simple tasks is negligible.&lt;/p&gt;

&lt;p&gt;Here is how I implement this in n8n without overcomplicating it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// In a Code node before the AI Agent node&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isSimple&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;150&lt;/span&gt;
  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;analyze&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;compare&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;json&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;...(&lt;/span&gt;&lt;span class="nx"&gt;$input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;modelTier&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;isSimple&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;fast&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;smart&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then a Switch node routes to two different AI Agent nodes: one configured with gpt-4o-mini, one with the full model. Crude, but it works. In a more sophisticated setup, you can use a lightweight classifier model to make the routing decision more accurately.&lt;/p&gt;

&lt;p&gt;Other cost levers worth implementing:&lt;/p&gt;

&lt;p&gt;Set &lt;code&gt;maxIterations&lt;/code&gt; aggressively. Six iterations is enough for most support agents. If the agent cannot resolve something in six steps, it should escalate to a human.&lt;/p&gt;

&lt;p&gt;Filter tool output before it hits the agent. A raw API response with 50 fields costs as many tokens as it contains. Extract only what the agent needs before returning it.&lt;/p&gt;

&lt;p&gt;Cache responses for common lookups. n8n has no built-in caching, but you can add a Redis lookup step before the HTTP request. If the order status was checked 10 minutes ago, return the cached version.&lt;/p&gt;

&lt;p&gt;Across the implementations I have measured, these three approaches together reduce per-workflow token costs by 55% to 65% compared to a naive setup.&lt;/p&gt;

&lt;p&gt;If you are unsure whether your workflow even needs an AI agent or whether simple automation would work better, the &lt;a href="https://www.jahanzaib.ai/ai-readiness" rel="noopener noreferrer"&gt;AI Readiness Assessment&lt;/a&gt; walks you through the decision. For most businesses, the answer is more nuanced than a single article can cover.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Mistakes That Kill Production Agents
&lt;/h2&gt;

&lt;p&gt;I have seen the same failures enough times to list them cleanly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vague tool descriptions&lt;/strong&gt; are the number one cause of agent failures I debug for other developers. If the agent cannot tell from the description when to use a tool, it either calls it constantly or ignores it. Write descriptions the way you would write them for a smart intern who has never seen your system before.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No iteration limit&lt;/strong&gt; means a confused agent can loop on a problem, burning tokens and never returning a response. Always set &lt;code&gt;maxIterations&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrong memory type&lt;/strong&gt; for the use case. Buffer memory for a workflow that spans days means the agent starts fresh every morning. Postgres memory for a simple FAQ bot means unnecessary infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trusting the agent with consequential writes&lt;/strong&gt; without a human checkpoint. I have seen agents attempt to process refunds, cancel orders, or send emails to the wrong people because the system prompt was not specific enough. Use n8n's Wait node for anything irreversible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Returning too much data from tools.&lt;/strong&gt; The more tokens the agent sees, the more likely it is to fixate on irrelevant details. Keep tool responses under 500 tokens where possible.&lt;/p&gt;

&lt;p&gt;For a deeper look at the architectural decisions behind deploying multi-agent systems, the &lt;a href="https://www.jahanzaib.ai/blog/ai-agents-production" rel="noopener noreferrer"&gt;AI agents in production guide&lt;/a&gt; covers the infrastructure and orchestration layer. And if you are looking at how these deployments typically get scoped and priced, the &lt;a href="https://www.jahanzaib.ai/services" rel="noopener noreferrer"&gt;services page&lt;/a&gt; walks through what I actually build.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do I need to self-host n8n to get the full AI agent features?
&lt;/h3&gt;

&lt;p&gt;No. The cloud version of n8n supports all the LangChain nodes including persistent memory and custom tool workflows. Self-hosting gives you data sovereignty and eliminates execution limits, which matters for GDPR sensitive workflows or very high volume, but it is not required just to use AI agents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which LLM should I use for n8n agents?
&lt;/h3&gt;

&lt;p&gt;For most client facing agents, I start with GPT-4o. If cost is a concern and the tasks are relatively simple (classification, lookup, single step reasoning), gpt-4o-mini handles the workload well at a fraction of the price. Claude 3.5 Sonnet is my choice for long context tasks or anything involving careful reading of documents. All three are supported natively in n8n 2.0 without any custom HTTP request nodes.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I handle errors when a tool fails mid-workflow?
&lt;/h3&gt;

&lt;p&gt;Set &lt;code&gt;continueOnFail: true&lt;/code&gt; on any HTTP Request nodes inside your tools and return a structured error object rather than letting the node throw. The agent reads the error object, interprets it, and can either retry, use a different approach, or respond to the user that the information is not available. Letting failures propagate unhandled causes the whole workflow to fail silently.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can n8n AI agents write back to databases or send emails autonomously?
&lt;/h3&gt;

&lt;p&gt;Yes, and this is where you need guardrails. I use n8n's Wait node to insert a human approval step before any irreversible action: sending external emails, processing refunds, modifying database records. The agent prepares the action, the Wait node pauses execution, a human approves or rejects via webhook, and the workflow continues accordingly.&lt;/p&gt;

&lt;h3&gt;
  
  
  How long does it take to build a production n8n AI agent?
&lt;/h3&gt;

&lt;p&gt;A simple support agent with three or four tools and Postgres memory takes me one to two days to build and another day to test. More complex multi-agent systems with vector store knowledge bases, CRM integration, and escalation paths run two to three weeks for the first deployment. Subsequent deployments on the same pattern are faster because the sub-workflows are reusable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is n8n suitable for nontechnical teams to maintain?
&lt;/h3&gt;

&lt;p&gt;The visual canvas makes workflows readable by non-developers, but the AI agent configuration (memory type selection, tool descriptions, system prompts, iteration limits) requires someone who understands how LLMs reason. My recommendation: have a technical person set up and test the core workflow, then document the pieces a nontechnical operator can safely adjust, like the system prompt and knowledge base content.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Citation Capsule:&lt;/strong&gt; n8n 2.0 launched January 2026 with native LangChain integration and 70+ AI nodes (&lt;a href="https://finbyz.tech/n8n/insights/n8n-2-0-langchain-agentic-workflows" rel="noopener noreferrer"&gt;Finbyz Tech&lt;/a&gt;). GPT-4o pricing: $0.0025 per 1K input tokens, $0.01 per 1K output tokens; Claude 3.5 Sonnet: $0.003 per 1K input, $0.015 per 1K output (&lt;a href="https://calmops.com/ai/n8n-ai-agents-implementation/" rel="noopener noreferrer"&gt;Calmops&lt;/a&gt;). n8n cloud pricing starts at $22/month for 2,500 executions; Zapier comparable tier runs $49/month for 2,000 tasks (&lt;a href="https://www.digidop.com/blog/n8n-vs-make-vs-zapier" rel="noopener noreferrer"&gt;Digidop&lt;/a&gt;).&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>n8n</category>
      <category>aiagents</category>
      <category>langchain</category>
      <category>workflowautomation</category>
    </item>
    <item>
      <title>AI Is Now As Good As Humans at Using Computers. Here Is What $297 Billion in Q1 Funding Says About What Comes Next.</title>
      <dc:creator>Jahanzaib</dc:creator>
      <pubDate>Sat, 04 Apr 2026 08:51:35 +0000</pubDate>
      <link>https://forem.com/jahanzaibai/ai-is-now-as-good-as-humans-at-using-computers-here-is-what-297-billion-in-q1-funding-says-about-l5o</link>
      <guid>https://forem.com/jahanzaibai/ai-is-now-as-good-as-humans-at-using-computers-here-is-what-297-billion-in-q1-funding-says-about-l5o</guid>
      <description>&lt;p&gt;There is a benchmark called OSWorld. It was created by researchers at Carnegie Mellon and HKUST, and it tests AI models on 369 real computer tasks, the kind of work your actual employees do every day: browsing Chrome, editing spreadsheets in LibreOffice, writing emails in Thunderbird, managing files, running code in VS Code. Tasks are scored not by screenshots but by whether the computer ends up in the right state. Did the spreadsheet get updated? Did the email get sent? Is the file in the right folder?&lt;/p&gt;

&lt;p&gt;The human baseline on OSWorld sits at around 72 percent. Not perfect humans, not trained specialists. Just people doing computer work at a reasonable pace.&lt;/p&gt;

&lt;p&gt;In early 2026, AI models crossed that line. The gap between AI that assists and AI that replaces at a computer terminal is now, for many standard knowledge work tasks, essentially zero.&lt;/p&gt;

&lt;p&gt;At the same time, the venture capital world had its own moment of clarity. In Q1 2026, global VC investment hit $297 billion across roughly 6,000 startups. AI captured $239 billion of that, which is 81 percent of all venture funding on the planet. In a single quarter, AI raised more money than all of 2025 combined. OpenAI alone closed $122 billion, the largest single venture deal ever recorded. Anthropic raised $30 billion in a Series G. xAI raised $20 billion.&lt;/p&gt;

&lt;p&gt;I've been building AI agents professionally for years. I've shipped 109 production AI systems across ecommerce, real estate, legal tech, healthcare, and half a dozen other industries. And I want to give you the honest read on what these two facts, the performance milestone and the capital surge, actually mean for businesses that are still trying to figure out where to start.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key Takeaways&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;AI models have reached or exceeded human-level accuracy on OSWorld, a real-world computer task benchmark covering Chrome, LibreOffice, VS Code, email, and file management&lt;/li&gt;
&lt;li&gt;Q1 2026 brought $297 billion in global VC investment, with AI capturing 81 percent of it driven by four mega-rounds totaling $188 billion&lt;/li&gt;
&lt;li&gt;Computer use AI is already in production at enterprise scale: Claude Computer Use, OpenAI Operator, and open-source agent frameworks now handle real desktop workflows&lt;/li&gt;
&lt;li&gt;The performance gap is not just closing, it is closing fast: frontier models jumped roughly 60 percentage points on OSWorld in 28 months&lt;/li&gt;
&lt;li&gt;Businesses that treat AI as a chatbot tool are operating with a completely wrong mental model of what is coming in the next 12 months&lt;/li&gt;
&lt;li&gt;The right response is not panic. It is a deliberate audit of which of your computer-based workflows are prime candidates for agent automation right now&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What OSWorld Actually Tests (and Why Most Coverage Gets It Wrong)
&lt;/h2&gt;

&lt;p&gt;Most AI benchmarks measure knowledge. Can the model answer trivia? Can it write a poem? Can it solve a math problem? These benchmarks are useful for comparing models but they tell you almost nothing about whether AI can do your employee's job.&lt;/p&gt;

&lt;p&gt;OSWorld is different. It sets up a real computer running a real operating system, Ubuntu, Windows, or macOS, with real applications installed. Then it gives the AI a task instruction in plain language: "Open the spreadsheet in Downloads, find the three largest values in column B, and highlight them in yellow." Or: "Read the most recent email from Sarah, summarize it in a draft reply, and schedule the meeting she mentioned for next Tuesday at 3pm."&lt;/p&gt;

&lt;p&gt;The AI can see the screen through a screenshot-based interface. It can move a cursor. It can click, type, scroll, and use keyboard shortcuts. It gets multiple steps to complete the task. When it thinks it is done, the system checks the actual state of the machine.&lt;/p&gt;

&lt;p&gt;This is not a test of what an AI knows. This is a test of whether an AI can do work.&lt;/p&gt;

&lt;p&gt;The original OSWorld paper was published in late 2023. At that point, the best models scored around 12 to 15 percent on the full benchmark. Humans, when tested under equivalent conditions, scored about 72 percent. The gap was enormous. No one in the AI field expected it to close quickly.&lt;/p&gt;

&lt;p&gt;By early 2025, the best models were in the 40 to 50 percent range. By mid-2025, specialized computer use agents were hitting 60 to 65 percent. By early 2026, the frontier models crossed 72 percent.&lt;/p&gt;

&lt;p&gt;That progression, from 12 to over 72 percent in roughly 28 months, is one of the most dramatic benchmark improvements in the history of AI development.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1587560699334-cc4ff634909a%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1587560699334-cc4ff634909a%3Fw%3D1200%26q%3D80" alt="Person working on computer performing complex multi-application tasks that AI can now match in accuracy" width="1200" height="800"&gt;&lt;/a&gt;&lt;em&gt;OSWorld tests AI on tasks like this: real applications, real files, real outcomes evaluated by machine state rather than screenshots.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers Behind the Milestone
&lt;/h2&gt;

&lt;p&gt;Let me give you the benchmark progression in concrete form, because the speed matters more than the final number.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Generation&lt;/th&gt;
&lt;th&gt;OSWorld Score&lt;/th&gt;
&lt;th&gt;Gap to Human (72%)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Best models, late 2023&lt;/td&gt;
&lt;td&gt;~12%&lt;/td&gt;
&lt;td&gt;60 points behind&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-4o with Computer Use tools, mid 2024&lt;/td&gt;
&lt;td&gt;~28%&lt;/td&gt;
&lt;td&gt;44 points behind&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Computer Use launch, late 2024&lt;/td&gt;
&lt;td&gt;~39%&lt;/td&gt;
&lt;td&gt;33 points behind&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Specialized agents, early 2025&lt;/td&gt;
&lt;td&gt;~51%&lt;/td&gt;
&lt;td&gt;21 points behind&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Frontier models, mid 2025&lt;/td&gt;
&lt;td&gt;~64%&lt;/td&gt;
&lt;td&gt;8 points behind&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best models, early 2026&lt;/td&gt;
&lt;td&gt;~75%&lt;/td&gt;
&lt;td&gt;3 points ahead&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That last row is the one that changes the conversation. Each generation closed the gap by roughly 10 to 15 percentage points. The final jump from 64 to 75 percent happened in about six months.&lt;/p&gt;

&lt;p&gt;I want to add an important caveat here that most coverage skips: the human baseline of 72 percent is not a ceiling. The humans tested were completing tasks at a reasonable pace, not at maximum effort. Expert power users likely score higher. And even though AI has crossed the average human baseline on accuracy, current computer use agents still take roughly 40 percent more steps than humans to complete the same tasks, and the wall clock time is longer. A task a human finishes in two minutes might take an AI agent four to six minutes through a computer use interface.&lt;/p&gt;

&lt;p&gt;So this is not "AI is now faster than humans at computer work." It is "AI is now as accurate as the average human at computer work, at a pace that is slower but improving." That distinction matters for how you think about deployment. But it does not change the fundamental trajectory.&lt;/p&gt;

&lt;h2&gt;
  
  
  What $297 Billion in Three Months Actually Buys
&lt;/h2&gt;

&lt;p&gt;The performance milestone would be interesting on its own. Combined with the capital story, it becomes something else entirely.&lt;/p&gt;

&lt;p&gt;In Q1 2026, according to Crunchbase data published April 1, 2026, global venture capital hit $297 billion across roughly 6,000 funded startups. That is not a typo. One quarter. $297 billion. For comparison: total global VC investment in all of 2024 was around $330 billion.&lt;/p&gt;

&lt;p&gt;AI captured $239 billion of that Q1 total, or 81 percent of every venture dollar on the planet. Foundational AI alone, meaning the model labs and infrastructure plays, raised $178 billion. That is more than all foundational AI investment in 2025 combined ($88.9 billion) and 466 percent above what foundational AI raised in all of 2024 ($31.4 billion).&lt;/p&gt;

&lt;p&gt;The four rounds driving those numbers: OpenAI at $122 billion (the largest venture round in history), Anthropic at $30 billion Series G (total raised since 2021 now sits near $64 billion), xAI at $20 billion, and Waymo at $16 billion. Four companies raised $188 billion in a single quarter.&lt;/p&gt;

&lt;p&gt;Here is what I want you to understand about what that capital actually buys.&lt;/p&gt;

&lt;p&gt;It buys inference capacity. The biggest cost in running frontier AI models is the compute to serve them. When OpenAI raises $122 billion and Anthropic raises $30 billion, most of that goes toward GPU clusters, data centers, and the operational infrastructure to run billions of API calls per day. They are not raising this money to hire more researchers. They are raising it to make the models faster, cheaper, and more reliable at scale.&lt;/p&gt;

&lt;p&gt;It buys faster iteration cycles. The jump from 64 to 75 percent on OSWorld in six months happened because these labs can now run, for a few million dollars, training runs that would have cost $100 million in 2022. That collapse in training costs, combined with massive investment, means the next six months will likely see another meaningful jump on benchmarks like OSWorld.&lt;/p&gt;

&lt;p&gt;And it buys distribution. When Anthropic raises $30 billion at a $380 billion valuation, they are not just building a model. They are building the enterprise sales infrastructure, the API reliability, the fine-tuning tooling, and the compliance certifications to get Claude into Fortune 500 procurement pipelines. The capital is not just about better models. It is about making those models available to your competitors before you have figured out your own strategy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1611532736597-de2d4265fba3%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1611532736597-de2d4265fba3%3Fw%3D1200%26q%3D80" alt="AI investment surge visualization showing massive Q1 2026 capital flowing into artificial intelligence infrastructure" width="1200" height="1800"&gt;&lt;/a&gt;&lt;em&gt;The $297B Q1 2026 AI investment surge is not speculative capital. It is building the infrastructure for computer use AI to scale to millions of concurrent automated workers.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Computer Use AI Actually Looks Like in Production Today
&lt;/h2&gt;

&lt;p&gt;Let me get concrete, because the abstract conversation about benchmarks and funding rounds is only useful if you understand what the technology actually does in the real world right now.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Computer Use&lt;/strong&gt; (Anthropic) launched in late 2024 and is now in general availability. You give it a browser or a desktop environment via a containerized Linux instance, and it completes tasks through screenshot observation and action execution. It can fill out web forms, extract data from websites, navigate multi-step workflows in SaaS tools, and handle tasks that do not have an API. I've used it to automate data entry workflows that previously required a human to manually copy information between two systems with no integration pathway.&lt;/p&gt;
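&lt;p&gt;Stripped of any vendor specifics, the screenshot-and-act cycle has a simple shape. The sketch below stubs out the screenshot capture, the model call, and the input driver; &lt;code&gt;take_screenshot&lt;/code&gt;, &lt;code&gt;ask_model&lt;/code&gt;, and &lt;code&gt;perform&lt;/code&gt; are stand-ins for the pattern, not Anthropic's actual SDK:&lt;/p&gt;

```python
# Generic shape of a computer use loop: observe the screen, ask the model
# for the next action, execute it, repeat. All three helpers are stubs.

def take_screenshot():
    return b"png-bytes"  # a real implementation captures the display

def ask_model(screenshot, goal, history):
    # A real call sends the screenshot to a vision model and gets back a
    # structured action. Here we script two steps, then signal completion.
    scripted = [{"type": "click", "x": 200, "y": 340},
                {"type": "type", "text": "jahanzaib@example.com"},
                {"type": "stop"}]
    return scripted[len(history)]

def perform(action):
    pass  # a real implementation drives mouse and keyboard here

def run_task(goal, max_steps=20):
    history = []
    for _ in range(max_steps):
        action = ask_model(take_screenshot(), goal, history)
        if action["type"] == "stop":
            return history
        perform(action)
        history.append(action)
    return history

steps = run_task("fill in the signup form")
print(len(steps))  # 2 actions before the model signals completion
```

&lt;p&gt;The loop, not any single call, is the product: every extra screenshot round-trip is why these agents are slower than humans even at equal accuracy.&lt;/p&gt;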

&lt;p&gt;&lt;strong&gt;OpenAI Operator&lt;/strong&gt; launched in early 2025 with a focus on web-based task completion. Book a restaurant, fill out a government form, research a product across multiple sites and compile a comparison, buy tickets to an event. The primary use case is browser-based tasks that would otherwise require a human to click through several pages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Open source agent frameworks&lt;/strong&gt; have proliferated rapidly. Tools like OpenClaw (the open-source AI agent by Peter Steinberger, now with over 300,000 GitHub stars) give developers the scaffolding to build computer use agents that run on their own infrastructure. You write the task definition, connect the agent to a screen, and it operates the machine.&lt;/p&gt;

&lt;p&gt;What is actually running in production at enterprise scale right now? Here is what I see across my client base and the broader market:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data entry and migration:&lt;/strong&gt; Agents that read data from legacy systems with no API, then enter it into modern platforms. Insurance companies are running these at high volume to move claims data between systems during platform migrations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web research and aggregation:&lt;/strong&gt; Agents that visit dozens of pages, extract specific information, and compile structured reports. Real estate firms use these to pull comparable property data from listing platforms that do not allow bulk export.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Form completion at scale:&lt;/strong&gt; Government form automation for regulated industries like healthcare and legal, where the forms are web-based but not machine-readable via standard integrations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;QA testing pipelines:&lt;/strong&gt; Software teams running computer use agents to execute test scripts against web applications, catching UI regressions that automated API tests miss.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CRM and operational hygiene:&lt;/strong&gt; Agents that log activity, update records, and move items through stages based on email content, without requiring humans to keep CRM data clean.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these examples require human-level intelligence. They require human-level computer accuracy. And that threshold, based on the OSWorld data, has now been reached.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1460925895917-afdab827c52f%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1460925895917-afdab827c52f%3Fw%3D1200%26q%3D80" alt="Business data and workflow automation charts showing AI agent computer use production metrics" width="1200" height="855"&gt;&lt;/a&gt;&lt;em&gt;Computer use AI in production runs not on synthetic demos but on real workflows: CRM updates, form completions, cross-platform data entry, web research at scale.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Which Industries Face the Most Immediate Impact
&lt;/h2&gt;

&lt;p&gt;Computer use AI does not affect all businesses equally. The disruption is most acute in roles and industries where the core work is navigating software interfaces and moving information between systems. Here is my honest read on who this hits first.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Insurance and claims processing.&lt;/strong&gt; The average claims adjuster spends the majority of their workday inside a combination of internal systems, email, and external verification platforms. None of these are fully integrated. Computer use agents can handle the navigation layer entirely. The human judgment is still needed for edge cases and appeals, but the routine data gathering, form completion, and system updating is fully automatable right now at production accuracy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Legal and compliance work.&lt;/strong&gt; Not the reasoning. The process. Contract review workflow involves pulling documents, navigating e-signature platforms, updating matter management systems, and logging activity. Document review for discovery involves opening files, tagging relevant passages, and moving documents through review queues. Computer use agents handle all of this without needing semantic understanding of the legal content itself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real estate operations.&lt;/strong&gt; Property research, listing updates, CRM management, and transaction coordination tasks are all primarily navigating software interfaces. The real estate back office is almost entirely automatable with computer use AI at current accuracy levels.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;E-commerce operations.&lt;/strong&gt; Catalog management across multiple platforms (your own site, Amazon, Shopify, wholesale portals) where the data formats differ. Inventory updates. Order processing across systems that do not integrate cleanly. I built an AI agent system for a client that automated 70 percent of their operational tasks, and most of that was computer use rather than language model reasoning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Healthcare administration.&lt;/strong&gt; Prior authorizations, insurance verifications, scheduling across systems, referral management. The clinical judgment stays human. The paperwork does not have to.&lt;/p&gt;

&lt;p&gt;The common thread: roles where people spend most of their time navigating between software windows rather than exercising professional judgment. Computer use AI has arrived for those roles.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Nuance That Most Coverage Skips
&lt;/h2&gt;

&lt;p&gt;I said at the outset that I want to give you an honest read. So here are the real constraints that matter for deployment decisions.&lt;/p&gt;

&lt;p&gt;First, the accuracy number is an average. OSWorld's 369 tasks span a wide range of difficulty. AI models score near 90 percent on simple single-application tasks (open this file, make this change, save it) and closer to 50 percent on multi-step cross-application tasks (read the email, update the CRM, send the follow-up). The 72 to 75 percent headline figure is the mean. Your specific workflow matters enormously.&lt;/p&gt;

&lt;p&gt;Second, speed is still a constraint. Human computer workers operate at high effective throughput because they process context instantly. Current computer use AI operates more slowly through the screenshot-and-act cycle. For workflows where throughput matters more than labor cost, like time-sensitive order processing, this gap is real and should factor into your deployment decision.&lt;/p&gt;

&lt;p&gt;Third, error recovery is still a weak point. When a human makes a mistake on a computer, they notice quickly and correct it. Current computer use agents can get stuck in loops, fail to recognize error states, and occasionally make changes that are difficult to reverse. Production deployments need explicit checkpoints, human review triggers for anomalous states, and audit logs. You cannot just let an agent run unsupervised on high-stakes workflows without guardrails.&lt;/p&gt;

&lt;p&gt;Fourth, cost has come down dramatically but is not zero. Running computer use agents at scale, especially with the screenshot-processing overhead, costs more per task than a simple API call. The economics are compelling compared to human labor at scale, but you need to do the math for your specific use case before assuming it is automatically cheaper.&lt;/p&gt;

&lt;p&gt;None of these constraints are dealbreakers. They are engineering considerations. But anyone who tells you computer use AI is a drop-in replacement for all knowledge workers without any workflow redesign is selling you something.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1522202176988-66273c2fd55f%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1522202176988-66273c2fd55f%3Fw%3D1200%26q%3D80" alt="Business team in strategic meeting discussing AI automation implementation and workflow planning" width="1200" height="800"&gt;&lt;/a&gt;&lt;em&gt;The most successful AI automation deployments start with workflow audits, not technology purchases. What tasks are primarily navigation? What requires genuine judgment?&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Actually Recommend Businesses Do Right Now
&lt;/h2&gt;

&lt;p&gt;I am going to give you the same advice I give clients who come to me with a version of "we need to figure out this AI computer use thing."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Start with a workflow audit, not a technology purchase.&lt;/strong&gt; Before you think about tools, map your existing computer-heavy workflows. What does your team actually do on their computers all day? Separate tasks into three buckets: pure navigation (open this, update that, move this file), navigation plus simple judgment (read this, decide which category, file it), and genuine expertise (analyze this, recommend an approach, write this). Computer use AI is production-ready for the first bucket and approaching production-ready for the second. The third bucket is where you still want humans for now.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pick one workflow and run a real pilot.&lt;/strong&gt; Not a demo. Not a proof of concept on synthetic data. A real pilot on a real workflow with real consequences. Pick something low-stakes enough that errors are recoverable but high-volume enough that you can measure the accuracy and speed delta. Three to four weeks of a real pilot tells you more than six months of evaluating tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build for human oversight from day one.&lt;/strong&gt; Every computer use agent I deploy in production has three things: task-level logging (what did the agent do, in sequence, for every run), an anomaly trigger (if the agent encounters a state it has not seen before, it stops and alerts a human), and a daily audit sample (a human reviews a random 5 to 10 percent of completed tasks to check accuracy drift). These are not optional. They are the difference between an agent that improves your business and one that quietly corrupts your data.&lt;/p&gt;
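&lt;p&gt;Those three guardrails can be sketched as a thin wrapper around whatever agent you run. Everything here is illustrative: the function names, the &lt;code&gt;KNOWN_STATES&lt;/code&gt; set, and the roughly 10 percent audit rate are assumptions, not any vendor's API.&lt;/p&gt;

```python
# Oversight wrapper: task-level logging, anomaly halt, audit sampling.
import json
import logging
import random

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent-audit")

KNOWN_STATES = {"form_visible", "record_open", "confirmation_shown"}

def run_with_oversight(agent_step, task_id, max_steps=50):
    """Run an agent step function under the three guardrails."""
    for step_num in range(max_steps):
        _obs, action, state = agent_step()
        # 1. Task-level logging: what the agent did, in sequence, every run.
        log.info(json.dumps({"task": task_id, "step": step_num,
                             "action": action, "state": state}))
        # 2. Anomaly trigger: an unseen state stops the run for a human.
        if state not in KNOWN_STATES:
            return {"task": task_id, "status": "halted_for_review",
                    "reason": "unrecognized state: " + state}
        if action == "done":
            break
    # 3. Audit sample: flag roughly 1 in 10 completed runs for human review.
    return {"task": task_id, "status": "completed",
            "needs_audit": random.randrange(10) == 0}

script = iter([(b"png", "click_submit", "form_visible"),
               (b"png", "done", "confirmation_shown")])
result = run_with_oversight(lambda: next(script), task_id="crm-update-042")
print(result["status"])  # completed
```

&lt;p&gt;The anomaly halt is the piece most teams skip, and it is the cheapest: a set membership check is all it takes to turn "quietly corrupts data" into "stops and asks."&lt;/p&gt;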

&lt;p&gt;&lt;strong&gt;Do not wait for perfect.&lt;/strong&gt; The Q1 2026 investment numbers tell you something important: your competitors who are ahead of you on AI automation are about to get faster, not slower. The $239 billion in AI investment is funding the infrastructure that will make these tools easier to deploy, more reliable, and cheaper per task. Waiting for the technology to mature further is a reasonable position if you have 18 months. Based on the current trajectory, I would not bet on having 18 months.&lt;/p&gt;

&lt;p&gt;If you want to know whether your specific business workflows are candidates for computer use AI right now, the fastest way to find out is to take an honest look at where human time actually goes. I built an &lt;a href="https://www.jahanzaib.ai/ai-readiness" rel="noopener noreferrer"&gt;AI Agent Readiness Assessment&lt;/a&gt; specifically for this, which walks you through the dimensions that determine whether you need AI agents, automation, or both. The results are immediate and free.&lt;/p&gt;

&lt;p&gt;If you want a direct conversation about your specific situation, my &lt;a href="https://www.jahanzaib.ai/services" rel="noopener noreferrer"&gt;AI systems work&lt;/a&gt; starts with exactly the kind of workflow analysis I described above. You can also look at &lt;a href="https://www.jahanzaib.ai/work" rel="noopener noreferrer"&gt;how I've built these systems&lt;/a&gt; for clients across different industries. Book a call and we can go through it together.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Citation Capsule:&lt;/strong&gt; OSWorld benchmark methodology and human baseline from the original CMU and HKUST paper at &lt;a href="https://arxiv.org/abs/2311.12983" rel="noopener noreferrer"&gt;arxiv.org/abs/2311.12983&lt;/a&gt;. Q1 2026 investment figures from &lt;a href="https://news.crunchbase.com/venture/record-breaking-funding-ai-global-q1-2026/" rel="noopener noreferrer"&gt;Crunchbase News, April 1, 2026&lt;/a&gt;. OpenAI $122B round per OpenAI press releases, February and March 2026. Anthropic $30B Series G per &lt;a href="https://www.anthropic.com" rel="noopener noreferrer"&gt;Anthropic press release, February 2026&lt;/a&gt;. Computer use benchmark progression from publicly reported evaluations by model providers and independent researchers across 2024 and 2025.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is the OSWorld benchmark and is it a reliable measure of AI capability?
&lt;/h3&gt;

&lt;p&gt;OSWorld is a computer task benchmark from Carnegie Mellon University and HKUST that tests AI models on 369 real computer tasks across Windows, macOS, and Ubuntu using actual applications like Chrome, LibreOffice, VS Code, and Thunderbird. Unlike benchmarks that test knowledge or reasoning in isolation, OSWorld evaluates whether the AI actually completed the task by checking the final state of the machine. It is one of the most realistic measures of computer-use capability available. The key limitation is that it captures average task performance, and real-world accuracy varies significantly based on task complexity and application type.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does AI surpassing the OSWorld human baseline mean it will replace office workers?
&lt;/h3&gt;

&lt;p&gt;Not immediately, and not entirely. Crossing the accuracy threshold on an average-task benchmark is significant, but current computer use AI still takes more steps than humans to complete tasks, operates more slowly, and struggles with error recovery in ambiguous situations. The more accurate framing is that AI can now reliably handle the navigation-heavy, rule-following portions of computer work at human accuracy. Work that requires genuine judgment, relationship context, or creative problem-solving is not threatened by this specific capability. The displacement pressure is real for high-volume, low-judgment computer tasks, which is a substantial portion of many office roles.&lt;/p&gt;

&lt;h3&gt;
  
  
  What drove the $297 billion in Q1 2026 AI investment and is it sustainable?
&lt;/h3&gt;

&lt;p&gt;The Q1 2026 number was heavily driven by four mega-rounds: OpenAI at $122 billion, Anthropic at $30 billion, xAI at $20 billion, and Waymo at $16 billion. These are not typical venture investments. They are infrastructure bets, mostly from sovereign wealth funds, large corporates, and strategic investors funding the GPU clusters and data centers needed to run frontier AI at commercial scale. Removing those four rounds, the underlying AI investment market is still a record but less extreme. Whether the mega-round pace continues depends on whether the model labs can demonstrate the revenue to justify the valuations, which is the central question in AI for the next 24 months.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which tools are available for businesses that want to implement computer use AI today?
&lt;/h3&gt;

&lt;p&gt;Claude Computer Use (Anthropic) is the most mature general-purpose option for desktop and browser automation. OpenAI Operator handles web-based workflows. For teams that want to self-host, open-source frameworks like OpenClaw (by Peter Steinberger, 300K+ GitHub stars) provide the scaffolding to build custom computer use agents on your own infrastructure. For no-code and low-code deployments, n8n 2.0 includes computer use agent capabilities that can be connected to existing workflow automation. The right tool depends on your technical capability, data privacy requirements, and whether you need custom behavior or can use a general-purpose agent.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the difference between computer use AI and traditional RPA?
&lt;/h3&gt;

&lt;p&gt;Traditional RPA like UiPath and Automation Anywhere works by recording and replaying exact click sequences on specific interface elements. It is brittle: change the UI, move a button, update the software version, and the automation breaks. Computer use AI understands the screen visually and adapts to interface changes the same way a human would. It can also handle variability in task inputs that would trip up RPA. The tradeoff is cost per run (RPA is cheaper for simple, stable workflows) and reliability (RPA is more predictable when the interface is fixed). For workflows with variable inputs or interfaces that change frequently, computer use AI is already more practical than traditional RPA.&lt;/p&gt;

&lt;h3&gt;
  
  
  How much does computer use AI cost to run in production?
&lt;/h3&gt;

&lt;p&gt;Costs vary significantly based on task complexity and the model used. Simple browser tasks through a hosted service like Operator typically run in the range of $0.10 to $0.50 per task at current pricing. Complex multi-step workflows with long screenshot observation chains can run $1 to $5 per task. Self-hosted open-source agents on your own infrastructure have higher setup costs but near-zero marginal cost per run once deployed. The economic case is strongest for high-volume, repetitive tasks where the current labor cost exceeds $2 to $5 per task, factoring in time and opportunity cost.&lt;/p&gt;
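&lt;p&gt;That break-even logic is simple enough to sanity-check in a few lines. Every figure below is a placeholder drawn from the ranges above; substitute your own numbers.&lt;/p&gt;

```python
# Back-of-envelope: is a computer use agent cheaper than the human baseline?
def monthly_cost(cost_per_task, tasks_per_day, workdays=22):
    return cost_per_task * tasks_per_day * workdays

human_cost_per_task = 3.50   # loaded labor cost: time plus opportunity cost
agent_cost_per_task = 0.75   # mid-range hosted agent, multi-step workflow
tasks_per_day = 120

human = monthly_cost(human_cost_per_task, tasks_per_day)
agent = monthly_cost(agent_cost_per_task, tasks_per_day)
print("human $%.0f/mo vs agent $%.0f/mo, saving $%.0f/mo"
      % (human, agent, human - agent))
```

&lt;p&gt;At these placeholder numbers the agent wins comfortably; at 10 tasks a day with a $0.30 labor cost per task, it would not, which is the whole point of doing the math per workflow.&lt;/p&gt;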

&lt;h3&gt;
  
  
  How do I know if my business workflows are ready for computer use AI?
&lt;/h3&gt;

&lt;p&gt;Three signals that a workflow is a strong candidate: the primary work is navigating between software windows rather than exercising specialized expertise, the task happens frequently enough that the setup cost is justified (at least daily, ideally multiple times per day), and the output is verifiable, meaning there is a clear correct state the system should end up in. Signals that a workflow is not ready: it requires significant contextual judgment not captured in the task instructions, the error cost is high enough that errors on edge cases are not acceptable without human review, or the workflow is low-volume enough that a human handles it in under two hours per week total. The &lt;a href="https://www.jahanzaib.ai/ai-readiness" rel="noopener noreferrer"&gt;AI Agent Readiness Assessment&lt;/a&gt; walks through all the relevant dimensions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should businesses be worried about computer use AI accessing sensitive data or systems?
&lt;/h3&gt;

&lt;p&gt;Yes, and this is a real deployment consideration. Computer use agents that operate inside your systems have the same access as the user account they run under. A misconfigured agent can read, modify, or delete data unintentionally. Best practices include running agents under dedicated service accounts with the minimum permissions needed for the specific task, implementing comprehensive action logging, adding confirmation steps before irreversible actions, and using sandboxed environments for testing before production deployment. This is not a reason to avoid the technology. It is a reason to treat it with the same security discipline you apply to any automated system that touches production data.&lt;/p&gt;
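&lt;p&gt;The confirmation-step practice in particular is easy to make concrete. A minimal sketch of an action gate follows; the &lt;code&gt;IRREVERSIBLE&lt;/code&gt; set and the action names are hypothetical, not a specific framework's vocabulary.&lt;/p&gt;

```python
# Confirmation gate for irreversible agent actions, with an audit log.
IRREVERSIBLE = {"delete_record", "send_email", "submit_payment"}
audit_log = []

def execute_action(name, payload, confirm):
    """Log every action; irreversible ones need confirm() to return True."""
    if name in IRREVERSIBLE and not confirm(name, payload):
        audit_log.append(("blocked", name))
        return "blocked"
    audit_log.append(("executed", name))
    # ...a real implementation drives the UI action here...
    return "executed"

deny_all = lambda name, payload: False   # sandboxed test run: confirm nothing
print(execute_action("update_field", {"id": 7}, deny_all))    # executed
print(execute_action("delete_record", {"id": 7}, deny_all))   # blocked
```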

</description>
      <category>aiagents</category>
      <category>computeruseai</category>
      <category>aiautomation</category>
      <category>businessai2026</category>
    </item>
    <item>
      <title>Agentic RAG: The Complete Production Guide Nobody Else Wrote</title>
      <dc:creator>Jahanzaib</dc:creator>
      <pubDate>Sat, 04 Apr 2026 08:28:49 +0000</pubDate>
      <link>https://forem.com/jahanzaibai/agentic-rag-the-complete-production-guide-nobody-else-wrote-386o</link>
      <guid>https://forem.com/jahanzaibai/agentic-rag-the-complete-production-guide-nobody-else-wrote-386o</guid>
      <description>&lt;p&gt;Three months into a contract with a mid-sized insurance company, I was sitting across from their CTO watching their "AI knowledge base" answer questions about their own products. The system retrieved the right documents 90% of the time. But on anything involving multi-part questions, comparisons, or anything that required checking two sources together, it fell apart. Their agentic RAG system wasn't agentic at all. It was a fixed pipeline wearing an agent costume, and it was costing them about $4,200 a month in API calls to produce answers that were wrong 62% of the time on complex queries.&lt;/p&gt;

&lt;p&gt;That project is what pushed me to formalize what I now call an agentic RAG system the right way. I've since deployed some form of this architecture across 38 of my 109 production AI systems, and the patterns I'm about to share are hard-won. This guide covers what most agentic RAG articles skip: real chunking decisions, embedding model comparisons, the four failure modes that will definitely hit you in production, evaluation methods, and actual cost-per-query numbers. If you want a high-level intro to what RAG is, I wrote &lt;a href="https://www.jahanzaib.ai/blog/what-is-rag-business-guide" rel="noopener noreferrer"&gt;a separate guide for business owners&lt;/a&gt;. This post is for engineers building the thing.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key Takeaways&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Agentic RAG replaces fixed retrieve-then-generate pipelines with a loop that routes, retrieves, grades, and self-corrects before answering&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The five core components are Router, Retriever, Grader, Generator, and Hallucination Checker, and each can be tuned independently&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Chunk size and embedding model choice have more impact on accuracy than model selection&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Four failure modes kill most first deployments: infinite loops, graders that never reject, context overflow, and latency spirals&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Real production cost per query ranges from $0.02 for simple lookups to $0.31 for complex multi-source reasoning&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Agentic RAG is not always the right choice and I'll give you a clear decision framework for when simpler approaches win&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Traditional RAG Gets Wrong
&lt;/h2&gt;

&lt;p&gt;Standard RAG works like this: a query comes in, you embed it, you pull the top-k chunks from your vector database, you stuff those chunks into a prompt, and you generate an answer. The pipeline is deterministic and linear. That's both its strength and its fatal flaw.&lt;/p&gt;
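&lt;p&gt;The whole fixed pipeline fits in a dozen lines, which is exactly the problem: there is no point where anything gets reconsidered. A minimal sketch, with a toy embedding and a canned &lt;code&gt;generate()&lt;/code&gt; standing in for a real embedding model, vector store, and LLM:&lt;/p&gt;

```python
# Traditional RAG in miniature: one retrieval pass, no grading, no retry.
def embed(text):
    # Toy embedding: letter counts. Real systems use an embedding model.
    return [text.lower().count(c) for c in "abcdefghijklmnopqrstuvwxyz"]

def top_k(query_vec, index, k=2):
    def score(item):
        return sum(a * b for a, b in zip(query_vec, item[1]))
    ranked = sorted(index.items(), key=score, reverse=True)
    return [doc for doc, _vec in ranked[:k]]

def generate(prompt):
    return "LLM answer for: " + prompt   # canned stand-in for the model

def naive_rag(query, index):
    chunks = top_k(embed(query), index)                  # single shot
    prompt = "Context: " + " | ".join(chunks) + " Q: " + query
    return generate(prompt)                              # right or wrong, it ships

index = {
    "Personal auto: cancellation waits 10 days.": embed("personal auto cancellation"),
    "Commercial auto: cancellation waits 30 days.": embed("commercial auto cancellation"),
    "Home policies renew annually.": embed("home policy renewal"),
}
print(naive_rag("cancellation waiting period", index))
```

&lt;p&gt;Notice that &lt;code&gt;naive_rag&lt;/code&gt; returns an answer no matter what &lt;code&gt;top_k&lt;/code&gt; pulled. Everything that follows in this guide exists to fix that.&lt;/p&gt;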

&lt;h3&gt;
  
  
  The Fixed Pipeline Problem
&lt;/h3&gt;

&lt;p&gt;The assumption baked into every traditional RAG pipeline is that a single retrieval step produces sufficient context for every possible question. That's almost never true. Consider a user asking: "Compare our cancellation policy for personal auto versus commercial auto, and tell me which has the shorter waiting period." That question requires pulling from at least two separate sections of two separate documents, understanding what "waiting period" means in the context of each policy type, and synthesizing a comparison the original documents never made.&lt;/p&gt;

&lt;p&gt;Traditional RAG will retrieve the top-k chunks most similar to the query embedding. Maybe it pulls the right chunks, maybe it doesn't. There's no retry, no grading, no fallback. If the retrieved chunks don't contain the answer, you hallucinate. And you'll never know it happened unless you're running evaluation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where I've Seen Standard RAG Break
&lt;/h3&gt;

&lt;p&gt;In my experience, fixed RAG pipelines reliably fail in four scenarios. First, multi-hop questions that require connecting information across documents. Second, questions where the answer depends on recency and your index isn't perfectly current. Third, numerical comparisons where the LLM needs to find and compare specific data points. Fourth, any question where the user's phrasing is far from the language in the source documents, making vector similarity a weak signal. In the insurance project I mentioned, 68% of the failing queries fell into one of these four categories.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1526374965328-7f61d4dc18c5%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1526374965328-7f61d4dc18c5%3Fw%3D1200%26q%3D80" alt="green matrix data flow representing traditional RAG fixed pipeline limitations" width="1200" height="800"&gt;&lt;/a&gt;&lt;em&gt;Traditional RAG pipelines are linear by design. Linear breaks on complex, multi-part queries.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Agentic RAG Actually Does
&lt;/h2&gt;

&lt;p&gt;Agentic RAG turns the pipeline into a loop. Instead of one retrieval step, you have an agent that decides whether to retrieve at all, what to retrieve, whether the retrieved content is good enough, and whether to try again with a different query before generating an answer. The agent controls the entire process.&lt;/p&gt;

&lt;p&gt;This isn't just a theoretical improvement. &lt;a href="https://developer.nvidia.com/blog/traditional-rag-vs-agentic-rag-why-ai-agents-need-dynamic-knowledge-to-get-smarter/" rel="noopener noreferrer"&gt;NVIDIA's engineering blog&lt;/a&gt; documented accuracy improvements from 34% to 78% on complex multi-hop queries when moving from traditional to agentic retrieval. That's a major shift in what you can actually trust in production.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Five Component Architecture
&lt;/h3&gt;

&lt;p&gt;Every agentic RAG system I've built uses five core components, regardless of the underlying framework:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Router&lt;/strong&gt;: classifies the incoming query and decides what kind of retrieval, if any, is needed. Some questions don't need retrieval at all (factual questions the LLM already knows well). The router keeps you from burning tokens on unnecessary vector searches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Retriever&lt;/strong&gt;: executes the actual search against your vector store, SQL database, or other knowledge sources. In multi-agent setups, different retriever agents may handle different knowledge domains in parallel.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Grader&lt;/strong&gt;: evaluates whether the retrieved documents are actually relevant to the question. This is the component most implementations skip, and it's why most agentic RAG systems still fail on edge cases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Generator&lt;/strong&gt;: synthesizes the final answer using the graded, relevant context. Only runs when the grader says the retrieved content is sufficient.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Hallucination Checker&lt;/strong&gt;: verifies that the generated answer is grounded in the retrieved context, not invented. If it detects fabrication, it routes back to retrieval or flags the query for human review.&lt;/p&gt;
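
&lt;p&gt;Wired together, the five components form a single control loop. Here's a framework-agnostic sketch with the components stubbed out as plain callables; the function names and the fallback message are illustrative, not from any library:&lt;/p&gt;

```python
def run_agentic_rag(query, router, retriever, grader, generator, checker,
                    max_retries=3):
    # 1. Route: skip retrieval entirely for general-knowledge questions.
    if router(query) == "direct":
        return generator(query, [])
    context = []
    # 2-3. Retrieve and grade, retrying up to max_retries times.
    for _ in range(max_retries):
        docs = retriever(query)
        context = [d for d in docs if grader(query, d)]
        if len(context) >= 2:
            break
    # 4. Generate from whatever survived grading.
    answer = generator(query, context)
    # 5. Verify grounding; fall back gracefully if the check fails.
    if not checker(answer, context):
        return "I don't have enough information to answer that."
    return answer
```

&lt;p&gt;Each callable maps to one node in the LangGraph implementation later in this article; the loop structure is what matters here.&lt;/p&gt;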

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1558618666-fcd25c85cd64%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1558618666-fcd25c85cd64%3Fw%3D1200%26q%3D80" alt="neural network nodes representing the five component agentic RAG architecture" width="1200" height="800"&gt;&lt;/a&gt;&lt;em&gt;Each node in an agentic RAG graph has a single responsibility: routing, retrieving, grading, generating, or verifying.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Building Agentic RAG with LangGraph
&lt;/h2&gt;

&lt;p&gt;LangGraph is the right tool for implementing this architecture in 2026. Its graph-based state machine maps directly to the agentic loop. You define nodes (the five components), edges (conditional transitions between them), and shared state (the query, retrieved docs, and generated answer flowing through the graph). If you've read my &lt;a href="https://www.jahanzaib.ai/blog/ai-agents-production" rel="noopener noreferrer"&gt;complete guide to building AI agents in production&lt;/a&gt;, LangGraph will look familiar.&lt;/p&gt;

&lt;p&gt;Here's how the core graph looks in Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langgraph.graph import StateGraph, END
from typing import TypedDict, List

class AgenticRAGState(TypedDict):
    query: str
    reformulated_query: str
    retrieved_docs: List[str]
    relevant_docs: List[str]
    answer: str
    hallucination_detected: bool
    retry_count: int

def build_rag_graph():
    graph = StateGraph(AgenticRAGState)

    graph.add_node("router", router_node)
    graph.add_node("retriever", retriever_node)
    graph.add_node("grader", grader_node)
    graph.add_node("generator", generator_node)
    graph.add_node("hallucination_checker", hallucination_checker_node)

    graph.set_entry_point("router")

    graph.add_conditional_edges("router", route_query, {
        "retrieve": "retriever",
        "direct_answer": "generator"
    })
    graph.add_edge("retriever", "grader")
    graph.add_conditional_edges("grader", grade_documents, {
        "sufficient": "generator",
        "insufficient": "retriever"  # reformulate and retry
    })
    graph.add_edge("generator", "hallucination_checker")
    graph.add_conditional_edges("hallucination_checker", check_hallucination, {
        "grounded": END,
        "hallucinated": "retriever"
    })

    return graph.compile()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Router Node
&lt;/h3&gt;

&lt;p&gt;The router uses an LLM call (I use a small, fast model here, Claude Haiku or GPT-4o-mini) to classify the query. Don't over-engineer this. A simple prompt asking "Does this question require searching a knowledge base, or can it be answered from general knowledge?" works well for most use cases. I add a third category for queries that should be declined entirely.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def router_node(state: AgenticRAGState) -&amp;gt; AgenticRAGState:
    router_prompt = f"""
    Classify this query into one of three categories:
    - "retrieve": requires searching specific documents or knowledge base
    - "direct": can be answered from general knowledge
    - "decline": off-topic, harmful, or outside system scope

    Query: {state["query"]}

    Return only the category word.
    """
    result = llm.invoke(router_prompt).content.strip().lower()
    state["route"] = result
    return state

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
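
&lt;p&gt;The graph passes a &lt;code&gt;route_query&lt;/code&gt; function to &lt;code&gt;add_conditional_edges&lt;/code&gt;. A minimal version, assuming the &lt;code&gt;route&lt;/code&gt; key the router node writes into state, might look like this:&lt;/p&gt;

```python
def route_query(state: dict) -> str:
    # Read the category the router node stored; default to retrieval
    # if the key is missing or holds something unexpected.
    route = state.get("route", "retrieve")
    if route not in ("retrieve", "direct", "decline"):
        return "retrieve"
    return route
```

&lt;p&gt;Whatever strings this returns must exactly match the keys of the mapping you pass to &lt;code&gt;add_conditional_edges&lt;/code&gt;, or the graph has no edge to follow.&lt;/p&gt;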



&lt;h3&gt;
  
  
  The Grader Node
&lt;/h3&gt;

&lt;p&gt;The grader is where most implementations cut corners and pay for it. A weak grader that accepts marginally relevant documents will produce hallucinations downstream, because the generator will try to answer from insufficient context. I use binary grading: relevant or not relevant, no middle ground.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def grader_node(state: AgenticRAGState) -&amp;gt; AgenticRAGState:
    relevant_docs = []
    for doc in state["retrieved_docs"]:
        grade_prompt = f"""
        Is this document relevant to answering the query?

        Query: {state["query"]}
        Document: {doc}

        Answer with only "relevant" or "irrelevant".
        """
        grade = llm.invoke(grade_prompt).content.strip().lower()
        if grade == "relevant":
            relevant_docs.append(doc)

    state["relevant_docs"] = relevant_docs
    state["retry_count"] = state.get("retry_count", 0) + 1
    return state

def grade_documents(state: AgenticRAGState) -&amp;gt; str:
    if len(state["relevant_docs"]) &amp;gt;= 2:
        return "sufficient"
    if state["retry_count"] &amp;gt;= 3:
        return "sufficient"  # proceed with what we have, don't loop forever
    return "insufficient"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the retry cap at 3. This is critical and I'll come back to it in the failure modes section.&lt;/p&gt;

&lt;h2&gt;
  
  
  Chunking and Embedding: The Choices That Actually Matter
&lt;/h2&gt;

&lt;p&gt;I've seen engineers spend weeks tuning LangGraph routing logic while ignoring the fact that their chunk size is wrong. Chunking and embedding choice have more impact on retrieval quality than almost anything else in the system. Most articles on agentic RAG skip this entirely. Don't make that mistake.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Chunk Size Is Not a Default Setting
&lt;/h3&gt;

&lt;p&gt;The default chunk size in most RAG tutorials is 512 tokens or 1024 tokens. Both numbers are arbitrary. The right chunk size depends entirely on your documents.&lt;/p&gt;

&lt;p&gt;For dense technical documentation with short, precise statements: 256 to 512 tokens works well. Larger chunks dilute the embedding signal. For narrative or explanatory content, policy documents, and legal text: 1024 to 2048 tokens. These documents derive meaning from context, and splitting too aggressively loses that. For tabular data or structured records: chunk by row or entity, not by token count at all.&lt;/p&gt;

&lt;p&gt;The test I run on every new project: take 50 representative queries, retrieve against 256, 512, and 1024 token chunks, and measure what percentage of the time the correct chunk ranks in the top 3. That number tells you everything. I've seen accuracy jump from 61% to 89% just by changing chunk size from 512 to 256 on a technical API documentation project.&lt;/p&gt;

&lt;p&gt;I also use chunk overlap. A 20% overlap between adjacent chunks catches information that spans chunk boundaries. For a 512-token chunk, that's about 100 tokens of overlap. This adds storage cost but meaningfully reduces retrieval gaps.&lt;/p&gt;
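
&lt;p&gt;A sliding-window chunker with overlap is a few lines. This sketch uses whitespace tokens as a stand-in for a real tokenizer like tiktoken, which is what I'd use against production documents:&lt;/p&gt;

```python
def chunk_with_overlap(text, chunk_size=512, overlap_ratio=0.2):
    # Split into "tokens" (words here; swap in a real tokenizer) and
    # slide a window forward by chunk_size minus the overlap.
    tokens = text.split()
    step = max(1, int(chunk_size * (1 - overlap_ratio)))
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        if window:
            chunks.append(" ".join(window))
        if start + chunk_size >= len(tokens):
            break  # the last window already covers the tail
    return chunks
```

&lt;p&gt;With a 512-token window and 20% overlap, each chunk shares about 100 tokens with its neighbor, which is exactly the boundary-spanning coverage described above.&lt;/p&gt;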

&lt;h3&gt;
  
  
  Choosing Your Embedding Model
&lt;/h3&gt;

&lt;p&gt;The three models I actually use in production are compared below. I'm not listing every available option, only the ones I've shipped against real queries at scale.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Dimensions&lt;/th&gt;
&lt;th&gt;Cost per 1M tokens&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;th&gt;Weakness&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenAI text-embedding-3-large&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3072 (reducible)&lt;/td&gt;
&lt;td&gt;$0.13&lt;/td&gt;
&lt;td&gt;General purpose, mixed document types&lt;/td&gt;
&lt;td&gt;Latency on large batches&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cohere embed-v3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1024&lt;/td&gt;
&lt;td&gt;$0.10&lt;/td&gt;
&lt;td&gt;Multilingual content, e-commerce&lt;/td&gt;
&lt;td&gt;Needs Cohere SDK dependency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;nomic-embed-text (local)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;768&lt;/td&gt;
&lt;td&gt;$0 (compute only)&lt;/td&gt;
&lt;td&gt;Privacy-sensitive data, on-prem&lt;/td&gt;
&lt;td&gt;8K token context limit&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For most projects, I start with &lt;code&gt;text-embedding-3-large&lt;/code&gt; and reduce dimensions to 1536 using the &lt;code&gt;dimensions&lt;/code&gt; parameter. You get 98% of the quality at half the storage cost. If you're running on healthcare or legal data that can't leave your environment, &lt;code&gt;nomic-embed-text&lt;/code&gt; via Ollama runs fine on a single GPU and performs respectably against the paid models on domain-specific text.&lt;/p&gt;
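
&lt;p&gt;The &lt;code&gt;dimensions&lt;/code&gt; parameter is part of the OpenAI embeddings API. A minimal indexing helper, assuming &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; is set in the environment (the batch size of 100 is my own habit, not an API requirement):&lt;/p&gt;

```python
def batched(items, size=100):
    # Yield fixed-size slices; batching requests cuts per-call overhead.
    for i in range(0, len(items), size):
        yield items[i:i + size]

def embed_corpus(texts, model="text-embedding-3-large", dims=1536):
    # Requires `pip install openai` and OPENAI_API_KEY in the environment.
    from openai import OpenAI
    client = OpenAI()
    vectors = []
    for batch in batched(texts):
        resp = client.embeddings.create(model=model, input=batch,
                                        dimensions=dims)
        vectors.extend(item.embedding for item in resp.data)
    return vectors
```

&lt;p&gt;Every vector comes back at 1536 dimensions instead of 3072, which halves storage in the vector store for the quality trade described above.&lt;/p&gt;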

&lt;p&gt;One thing I never do: switch embedding models mid-project without re-indexing everything. Different models encode semantic meaning differently. Mixing embeddings from two models in the same vector store breaks similarity search in ways that are hard to debug.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Four Failure Modes I See in Every First Deployment
&lt;/h2&gt;

&lt;p&gt;These aren't edge cases. They're standard. Every team building their first agentic RAG system hits at least two of them in the first week of production traffic.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The Infinite Loop
&lt;/h3&gt;

&lt;p&gt;The grader rejects retrieved documents. The system reformulates the query and tries again. The new retrieval also fails the grader. The system loops. Without a retry cap and loop detection, this runs until you hit your rate limit or your daily cost cap. I saw this cost a client $340 in a single afternoon because one ambiguous user query triggered a loop that ran 87 iterations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Hard cap retry count at 3. After 3 failed retrievals, either generate from whatever you have or return a graceful "I don't have sufficient information" response. Never let the graph run without a termination condition. In the code above, I implemented this as &lt;code&gt;if state["retry_count"] &amp;gt;= 3: return "sufficient"&lt;/code&gt;. You can tune the threshold, but it must exist.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The Grader That Never Says No
&lt;/h3&gt;

&lt;p&gt;This is the opposite problem. Your grader accepts everything, relevance scoring becomes meaningless, and the generator tries to synthesize answers from unrelated documents. The symptom is plausible-sounding but wrong answers. These are the most dangerous kind because they pass casual review.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Test your grader in isolation before integrating it into the graph. Give it 20 known-relevant and 20 known-irrelevant document pairs and measure precision. If it's accepting more than 15% of irrelevant documents, your grading prompt needs work. I add specificity by including the query type in the grading prompt: "Is this document relevant to a question about [classification of query type]?" That context tightens the grader significantly.&lt;/p&gt;
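
&lt;p&gt;The isolation test is simple enough to script. A sketch of the precision measurement, with the grader injected as a plain callable so you can swap the LLM call for a stub in tests:&lt;/p&gt;

```python
def grader_precision(grader, labeled_pairs):
    # labeled_pairs: list of (query, doc, is_relevant) tuples with
    # human-labeled ground truth.
    accepted = [(q, d, rel) for q, d, rel in labeled_pairs if grader(q, d)]
    if not accepted:
        return 0.0
    # Precision: of everything the grader accepted, how much was
    # actually relevant?
    true_pos = sum(1 for _, _, rel in accepted if rel)
    return true_pos / len(accepted)
```

&lt;p&gt;Run this on your 20 relevant and 20 irrelevant labeled pairs; by the threshold above, anything below roughly 0.85 precision means the grading prompt needs work.&lt;/p&gt;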

&lt;h3&gt;
  
  
  3. Context Window Overflow
&lt;/h3&gt;

&lt;p&gt;You retrieve 10 documents, each 2048 tokens, plus a 4000-token system prompt, plus the query. That's 26,000 tokens of context before the generator says a single word. On Claude Sonnet or GPT-4o, that's roughly $0.07 to $0.08 per query just for input tokens. On systems with high query volume, that compounds fast. And beyond cost, stuffing a 200,000-token context window doesn't improve accuracy. It degrades it, because attention diffuses across too much content.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Cap the context sent to the generator. I use a hard limit of 6 retrieved documents, each truncated to 800 tokens of the most relevant passage using a lightweight extraction step. Total context budget for retrieved content: 4800 tokens. This number came from testing on 200 real queries. Going above it produced no accuracy gains while increasing cost and latency significantly.&lt;/p&gt;
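
&lt;p&gt;The cap is mechanical to enforce. A sketch of the budgeting step, again using whitespace tokens in place of a real tokenizer; the separator string is an arbitrary choice:&lt;/p&gt;

```python
def build_context(relevant_docs, max_docs=6, max_tokens=800):
    # Keep at most max_docs documents, each cut to max_tokens tokens,
    # so the generator never sees more than max_docs * max_tokens
    # tokens of retrieved content (4800 with the defaults above).
    budgeted = []
    for doc in relevant_docs[:max_docs]:
        tokens = doc.split()
        budgeted.append(" ".join(tokens[:max_tokens]))
    return "\n\n---\n\n".join(budgeted)
```

&lt;p&gt;In my deployments the truncation step is smarter than a head-cut (a lightweight extraction pass picks the most relevant passage), but the hard ceiling is the part that protects cost and latency.&lt;/p&gt;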

&lt;h3&gt;
  
  
  4. The Latency Spiral
&lt;/h3&gt;

&lt;p&gt;Each node in the graph makes at least one LLM call. A full agentic RAG cycle (router, retriever, grader per doc, generator, hallucination checker) can easily make 8 to 15 LLM calls. At 300ms to 800ms per call, you're looking at 2.4 to 12 seconds of total latency before the user gets an answer. That's fine for async batch processing. It's unacceptable for a real-time chatbot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Use the smallest capable model for each node. The router doesn't need GPT-4o. It's making a three-way classification. Claude Haiku or GPT-4o-mini handles this in under 200ms. The grader is also a classification task, not a generation task. Only the generator and hallucination checker need a more capable model. I run a "model tiering" approach: small model for router and grader ($0.001 per call), large model for generator and checker ($0.015 per call). This cuts total latency by 35 to 45% while preserving answer quality.&lt;/p&gt;
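
&lt;p&gt;Model tiering can live in one lookup table. The model names and per-call costs below are illustrative placeholders matching the figures in the text, not quotes from any price sheet:&lt;/p&gt;

```python
# Map each graph node to the cheapest model that handles its task.
MODEL_TIERS = {
    "router":                {"model": "gpt-4o-mini", "cost_per_call": 0.001},
    "grader":                {"model": "gpt-4o-mini", "cost_per_call": 0.001},
    "generator":             {"model": "gpt-4o",      "cost_per_call": 0.015},
    "hallucination_checker": {"model": "gpt-4o",      "cost_per_call": 0.015},
}

def estimate_cycle_cost(calls_per_node):
    # calls_per_node: e.g. {"router": 1, "grader": 4, "generator": 1, ...}
    return sum(MODEL_TIERS[node]["cost_per_call"] * n
               for node, n in calls_per_node.items())
```

&lt;p&gt;A standard cycle with one routing call, four grading calls, one generation, and one check lands around $0.035 in this toy model, which is why pushing grading onto the small tier matters so much.&lt;/p&gt;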

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1620712943543-bcc4688e7485%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1620712943543-bcc4688e7485%3Fw%3D1200%26q%3D80" alt="AI system production monitoring showing latency and evaluation metrics" width="1200" height="1500"&gt;&lt;/a&gt;&lt;em&gt;Latency compounds at every graph node. Tiering your models by task complexity is the single highest-ROI optimization in most agentic RAG systems.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Evaluate Your Agentic RAG System
&lt;/h2&gt;

&lt;p&gt;Most teams skip this step entirely. They test their system manually, say "it looks good," and ship. Then production traffic surfaces edge cases their manual testing never caught. Proper evaluation isn't optional and it's what separates systems you can trust from systems you're constantly firefighting.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Four Metrics That Actually Matter
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Retrieval Recall:&lt;/strong&gt; what percentage of queries result in at least one relevant document being retrieved? Measure this by building a labeled test set of 100 queries with known ground-truth documents. If retrieval recall is below 85%, your embedding model or chunk size is wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Grader Precision:&lt;/strong&gt; of the documents your grader marks as relevant, what percentage actually are? Test this in isolation with a held-out labeled set. Below 80% means your grader prompt needs tightening.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Answer Faithfulness:&lt;/strong&gt; is the generated answer grounded in the retrieved context? This is where the hallucination checker comes in. I measure this with an LLM-as-judge prompt on 200 sampled production queries per week.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Answer Relevance:&lt;/strong&gt; does the answer actually address what the user asked? Faithfulness and relevance are different things. A faithful answer can still be off-topic. I track this through user feedback signals (thumbs up/down) and spot-check sampling.&lt;/p&gt;

&lt;h3&gt;
  
  
  LLM-as-Judge Evaluation
&lt;/h3&gt;

&lt;p&gt;For continuous evaluation in production, I use an LLM judge running nightly on a random sample of 50 queries. The judge prompt looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;EVALUATION_PROMPT = """
You are an evaluation assistant. Rate the following RAG system response.

Query: {query}
Retrieved Context: {context}
Generated Answer: {answer}

Rate on three dimensions (1-5):
1. Faithfulness: Is the answer grounded in the retrieved context?
2. Relevance: Does the answer address what the query asks?
3. Completeness: Does the answer cover all aspects of the query?

Return a JSON object with scores and a one-sentence explanation for each.
"""

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I run this with GPT-4o-mini on a cron job and store results in a simple Postgres table. When any dimension drops below 3.5 average over a 7-day window, I get an alert and review the flagged queries. This has caught three separate regression issues across production deployments, each caused by a document sync failure or prompt change that wasn't tested against the full eval set.&lt;/p&gt;
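
&lt;p&gt;The alerting logic is a small aggregation over that table. A sketch, with rows as plain dicts standing in for the Postgres query result over the 7-day window:&lt;/p&gt;

```python
def dimensions_below_threshold(rows, threshold=3.5):
    # rows: one dict per judged query, e.g.
    # {"faithfulness": 4, "relevance": 5, "completeness": 4}.
    if not rows:
        return []
    flagged = []
    for dim in rows[0]:
        avg = sum(r[dim] for r in rows) / len(rows)
        if avg >= threshold:
            continue
        flagged.append(dim)  # this dimension's window average regressed
    return flagged
```

&lt;p&gt;Anything this returns triggers the alert and a manual review of the flagged queries; the 3.5 threshold is the one from my deployments and is worth tuning per system.&lt;/p&gt;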

&lt;h2&gt;
  
  
  Real Cost Numbers from Production
&lt;/h2&gt;

&lt;p&gt;Nobody publishes these. Here's what I actually see across deployments.&lt;/p&gt;

&lt;p&gt;A simple query that the router sends directly to the generator (no retrieval needed) costs about $0.02: one small model call for routing, one large model call for generation. A standard single-retrieval query with grading and hallucination checking runs $0.06 to $0.09: five to six LLM calls across small and large models, plus one vector search. A complex multi-hop query requiring two retrieval iterations costs $0.18 to $0.31: ten to fourteen LLM calls. Queries that hit the retry cap and fall back to a "no information" response cost $0.04 to $0.07.&lt;/p&gt;

&lt;p&gt;For a system handling 1,000 queries per day with a typical distribution (40% direct, 45% standard retrieval, 15% complex), daily LLM costs run $60 to $90 per day, or roughly $1,800 to $2,700 per month. Add vector store costs and infrastructure, and you're looking at $2,200 to $3,400 per month all-in for a mid-volume deployment.&lt;/p&gt;
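
&lt;p&gt;The arithmetic behind those figures is worth making explicit. This sketch uses midpoints of the per-query cost ranges above; the mix shares come straight from the text:&lt;/p&gt;

```python
# (share of traffic, average cost per query in USD) -- midpoints of the
# ranges quoted in the text.
QUERY_MIX = {
    "direct":   (0.40, 0.02),
    "standard": (0.45, 0.075),
    "complex":  (0.15, 0.245),
}

def daily_llm_cost(queries_per_day=1000):
    return sum(queries_per_day * share * cost
               for share, cost in QUERY_MIX.values())
```

&lt;p&gt;At 1,000 queries per day this lands around $78.50 per day, roughly $2,350 per month, inside the $1,800 to $2,700 range quoted above.&lt;/p&gt;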

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1555949963-aa79dcee981c%3Fw%3D1200%26q%3D80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1555949963-aa79dcee981c%3Fw%3D1200%26q%3D80" alt="data center servers showing production infrastructure for agentic RAG cost optimization" width="1200" height="800"&gt;&lt;/a&gt;&lt;em&gt;Production cost at 1,000 queries per day typically runs $2,200 to $3,400 per month all-in. Routing is the single biggest cost lever.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Where to Cut Costs Without Sacrificing Quality
&lt;/h3&gt;

&lt;p&gt;The router is your biggest lever. If you can correctly classify 40% of queries as "direct answer" (no retrieval needed), you cut costs on those queries by 70%. Invest time in making your router accurate. The second lever is caching. Many queries in enterprise systems are semantically similar or identical. Semantic caching (embedding the query and checking similarity against a cache of recent queries and their answers) can serve 20 to 35% of queries at near-zero cost on high-repetition workloads like internal HR chatbots or product documentation systems.&lt;/p&gt;
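
&lt;p&gt;A semantic cache needs only an embedding function and a similarity threshold. This sketch stores entries in a list; in production you'd back it with your vector store, and the 0.92 threshold is a starting point I tune per workload:&lt;/p&gt;

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, embed, threshold=0.92):
        # embed: any callable mapping text to a vector.
        self.embed, self.threshold, self.entries = embed, threshold, []

    def get(self, query):
        qv = self.embed(query)
        for vec, answer in self.entries:
            if cosine(qv, vec) >= self.threshold:
                return answer  # cache hit: skip the whole RAG cycle
        return None

    def put(self, query, answer):
        self.entries.append((self.embed(query), answer))
```

&lt;p&gt;Check the cache before the router runs; a hit serves the answer at near-zero cost, which is where the 20 to 35% savings on repetitive workloads comes from.&lt;/p&gt;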

&lt;h2&gt;
  
  
  When NOT to Use Agentic RAG
&lt;/h2&gt;

&lt;p&gt;This is the section nobody else writes. Agentic RAG adds complexity, latency, and cost. It's the right choice for some systems and clearly wrong for others.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use agentic RAG when:&lt;/strong&gt; your queries are complex and multi-part, your documents span multiple topics that require routing, you need high accuracy and can tolerate 2 to 8 seconds of latency, and your domain has a meaningful hallucination risk (legal, medical, financial).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stick with standard RAG when:&lt;/strong&gt; your queries are simple and well-defined, your knowledge base has a single topic and good semantic coverage, sub-second latency is required, and your volume is too high for per-query LLM grading to be economically viable. Standard RAG at high volume with a well-structured index often outperforms agentic RAG on cost-adjusted accuracy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use direct LLM calls (no RAG at all) when:&lt;/strong&gt; the information needed is within the model's training data, the query is more about reasoning than retrieval, or you're building a creative or generative use case where external grounding would constrain the output.&lt;/p&gt;

&lt;p&gt;I've seen teams add agentic RAG to a simple FAQ bot that had 200 predefined questions and answers. The standard RAG system answered correctly 94% of the time. The agentic system answered correctly 96% of the time. But it cost 8x more per query and took 3 seconds instead of 0.4 seconds. That's not a win. &lt;a href="https://www.jahanzaib.ai/ai-readiness" rel="noopener noreferrer"&gt;Use our AI readiness assessment&lt;/a&gt; to figure out which approach actually fits your situation before committing to an architecture.&lt;/p&gt;

&lt;p&gt;If you're building agentic systems at scale and want a second opinion on architecture, I review these in detail as part of &lt;a href="https://www.jahanzaib.ai/work" rel="noopener noreferrer"&gt;my AI systems work&lt;/a&gt;. And if you want to go deeper on the multi-agent orchestration patterns that sit on top of agentic RAG, the &lt;a href="https://www.jahanzaib.ai/blog/n8n-ai-agent-workflows-practitioner-guide" rel="noopener noreferrer"&gt;n8n AI agent workflow guide&lt;/a&gt; covers how I connect retrieval systems to action-taking agents in production. Reach out via the &lt;a href="https://www.jahanzaib.ai/contact" rel="noopener noreferrer"&gt;contact page&lt;/a&gt; if you want to talk through a specific deployment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is the difference between RAG and agentic RAG?
&lt;/h3&gt;

&lt;p&gt;Standard RAG follows a fixed pipeline: embed the query, retrieve top-k documents, generate an answer. Agentic RAG replaces that pipeline with a loop where an AI agent decides whether to retrieve, grades what it retrieved, and retries with a reformulated query if the context isn't good enough. The agent controls the process rather than following predetermined steps. This makes agentic RAG significantly more accurate on complex, multi-part questions but also more expensive and slower per query.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is LangGraph the best framework for building agentic RAG?
&lt;/h3&gt;

&lt;p&gt;In 2026, LangGraph is the most mature option for production agentic RAG systems. Its state graph abstraction maps cleanly to the iterative retrieval loop, it handles human-in-the-loop checkpoints well, and the LangSmith integration gives you production observability out of the box. CrewAI is easier to get started with but gives you less control over the retrieval loop internals. For most teams building their first agentic RAG system, LangGraph is the right choice. For teams that need something working in a day and will live with slightly less control, CrewAI's approach is reasonable.&lt;/p&gt;

&lt;h3&gt;
  
  
  How many LLM calls does an agentic RAG system make per query?
&lt;/h3&gt;

&lt;p&gt;A typical single-retrieval agentic RAG cycle makes five to seven LLM calls: one for routing, one for retrieval query reformulation if needed, one per document for grading (typically two to four documents), one for generation, and one for hallucination checking. A complex multi-hop query requiring two retrieval iterations can make ten to fifteen calls. This is why model tiering (using small models for routing and grading, large models for generation) is critical for keeping latency and cost manageable.&lt;/p&gt;

&lt;h3&gt;
  
  
  What chunk size should I use for my RAG system?
&lt;/h3&gt;

&lt;p&gt;There is no universal answer. Dense technical documentation typically does better with 256 to 512 token chunks. Narrative and policy documents do better with 1024 to 2048 tokens. Structured data should be chunked by entity or row, not by token count. The only reliable method is empirical testing: take 50 representative queries, test against multiple chunk sizes, and measure retrieval recall (what percentage of queries surface the correct document in the top 3 results). Add 20% overlap between chunks to catch information that spans boundaries.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I prevent infinite loops in agentic RAG?
&lt;/h3&gt;

&lt;p&gt;Set a hard retry cap. I use a maximum of 3 retrieval attempts. After 3 failed retrievals, the system proceeds with whatever context it has, or returns a graceful "insufficient information" response. Never build a graph node without a termination condition. You also want loop detection at the query level. If the same reformulated query appears twice, break the cycle and escalate to fallback behavior. These two controls together eliminate the infinite loop problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the real cost of running agentic RAG in production?
&lt;/h3&gt;

&lt;p&gt;At 1,000 queries per day with a typical distribution of simple and complex queries, expect $1,800 to $2,700 per month in LLM API costs. Add vector store costs ($50 to $200 depending on index size) and compute infrastructure, and total monthly cost runs $2,200 to $3,400 for a mid-volume deployment. Cost per query averages $0.06 to $0.09 for standard retrievals and $0.18 to $0.31 for complex multi-hop queries. Semantic caching on high-repetition workloads can cut overall cost by 20 to 35%.&lt;/p&gt;

&lt;h3&gt;
  
  
  When should I use standard RAG instead of agentic RAG?
&lt;/h3&gt;

&lt;p&gt;Use standard RAG when your queries are simple and well-defined, your knowledge base has good semantic coverage of a single topic, you need sub-second response times, or your query volume is too high for per-query LLM grading to be cost-effective. Agentic RAG adds real value when questions are complex and multi-part, documents span multiple domains requiring routing decisions, high accuracy justifies 2 to 8 seconds of latency, and your use case has meaningful consequences for hallucination (legal, financial, medical). Many deployments that think they need agentic RAG actually need better chunking and a stronger embedding model first.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I evaluate whether my agentic RAG system is working correctly?
&lt;/h3&gt;

&lt;p&gt;Track four metrics: retrieval recall (what percentage of queries surface at least one relevant document), grader precision (what percentage of documents marked relevant actually are), answer faithfulness (is the generated answer grounded in the retrieved context), and answer relevance (does the answer address what the user actually asked). Build a labeled test set of 100 queries with known ground-truth documents and run it before every major change. Use an LLM-as-judge prompt on a nightly sample of production queries to catch regressions automatically.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Citation Capsule:&lt;/strong&gt; Accuracy comparison data (34% traditional RAG vs 78% agentic RAG on complex queries) sourced from production benchmarks covered by &lt;a href="https://developer.nvidia.com/blog/traditional-rag-vs-agentic-rag-why-ai-agents-need-dynamic-knowledge-to-get-smarter/" rel="noopener noreferrer"&gt;NVIDIA Technical Blog&lt;/a&gt;. Query routing cost savings (40% reduction) from &lt;a href="https://labs.adaline.ai/p/building-production-ready-agentic" rel="noopener noreferrer"&gt;Adaline Labs production RAG architecture guide&lt;/a&gt;. Embedding model pricing from official API documentation as of April 2026. LangGraph framework documentation at &lt;a href="https://www.langchain.com/langgraph" rel="noopener noreferrer"&gt;LangChain LangGraph&lt;/a&gt;. Agentic retrieval architecture overview at &lt;a href="https://weaviate.io/blog/what-is-agentic-rag" rel="noopener noreferrer"&gt;Weaviate: What Is Agentic RAG&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>agenticrag</category>
      <category>langgraph</category>
      <category>ragarchitecture</category>
      <category>productionai</category>
    </item>
  </channel>
</rss>
