<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Mike Chambers</title>
    <description>The latest articles on Forem by Mike Chambers (@mikegcaws).</description>
    <link>https://forem.com/mikegcaws</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3633726%2F06ab21ab-df5c-42ee-ab3e-e678756010f4.png</url>
      <title>Forem: Mike Chambers</title>
      <link>https://forem.com/mikegcaws</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/mikegcaws"/>
    <language>en</language>
    <item>
      <title>Dash It All! Is AI Em Dash Addiction Real?</title>
      <dc:creator>Mike Chambers</dc:creator>
      <pubDate>Fri, 20 Mar 2026 02:50:55 +0000</pubDate>
      <link>https://forem.com/aws/dash-it-all-is-ai-em-dash-addiction-real-40bh</link>
      <guid>https://forem.com/aws/dash-it-all-is-ai-em-dash-addiction-real-40bh</guid>
      <description>&lt;p&gt;Happy Friday. Let's talk about punctuation.&lt;/p&gt;

&lt;p&gt;If you've spent any time reading AI-generated text, you've probably noticed it. That long dash that keeps showing up everywhere. The em dash. This thing: —&lt;/p&gt;

&lt;p&gt;It's become so strongly associated with AI writing that real humans who've always loved em dashes have &lt;a href="https://www.reddit.com/r/ChatGPT/comments/1jhmyd9/how_did_the_em_dash_become_the_signature_ai/" rel="noopener noreferrer"&gt;reportedly stopped using them&lt;/a&gt;, just to avoid looking like a chatbot. An entire punctuation mark, tainted by association. People are even adding deliberate typos to their writing so it looks less machine-generated.&lt;/p&gt;

&lt;p&gt;I wanted to know if this is actually universal across models, or just a stereotype. So I did what any reasonable person would do on a Friday afternoon. I wrote a script, pointed it at every text model available on Amazon Bedrock, and counted the dashes.&lt;/p&gt;

&lt;blockquote&gt;
&lt;h3&gt;
  
  
  Know Your Dashes
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Hyphen&lt;/strong&gt; (-) joins compound words and prefixes. &lt;em&gt;Self-taught, re-enter, well-known.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;En dash&lt;/strong&gt; (–) shows ranges and connections between pairs. &lt;em&gt;Pages 10–25, the London–Paris train, 2020–2026.&lt;/em&gt; Named because it's the width of a capital N.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Em dash&lt;/strong&gt; (—) marks a break in thought, sets off a parenthetical, or adds emphasis. &lt;em&gt;The results were clear—every model behaved differently.&lt;/em&gt; Named because it's the width of a capital M. This is the one LLMs can't get enough of.&lt;/p&gt;
&lt;/blockquote&gt;
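
&lt;p&gt;Since the counting script below identifies these characters by their Unicode code points, here's a quick reference snippet (illustrative, not part of the original experiment):&lt;/p&gt;

```python
# The three dash characters and their Unicode code points.
dashes = {
    "hyphen": "\u002d",   # U+002D HYPHEN-MINUS
    "en_dash": "\u2013",  # U+2013 EN DASH
    "em_dash": "\u2014",  # U+2014 EM DASH
}

for name, ch in dashes.items():
    print(f"{name}: U+{ord(ch):04X}")
```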

&lt;h2&gt;
  
  
  The experiment
&lt;/h2&gt;

&lt;p&gt;The setup was simple. Five different conversational prompts ("explain why learning a musical instrument as an adult is worthwhile", "talk about remote work and collaboration", that sort of thing), sent to 27 models across six providers. Same prompts, same parameters, same temperature. I counted every em dash (—), en dash (–), and hyphen (-) in every response.&lt;/p&gt;

&lt;p&gt;Here's the key bit from the Python script, using the Bedrock Converse API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;PROMPTS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a short paragraph (about 100 words) explaining why learning &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a musical instrument as an adult is worthwhile. Write in a natural, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;conversational tone.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a short paragraph (about 100 words) about why remote work &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;has changed how teams collaborate. Be conversational.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# ... three more like this
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;count_dashes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;em_dash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\u2014&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# —
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;en_dash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\u2013&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# –
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hyphen&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;word_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;converse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}]}],&lt;/span&gt;
        &lt;span class="n"&gt;inferenceConfig&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxTokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nothing clever. Just send, receive, count.&lt;/p&gt;
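
&lt;p&gt;The results below are normalised to dashes per 100 words, so short and long responses are comparable. A minimal sketch of that aggregation step, using the &lt;code&gt;count_dashes&lt;/code&gt; dict shape from above (the sample numbers here are made up for illustration; in the real script each dict comes from a model response):&lt;/p&gt;

```python
def per_100_words(counts):
    """Normalise a count_dashes-style dict to dashes per 100 words."""
    words = counts["word_count"]
    if words == 0:
        return {"em_dash": 0.0, "en_dash": 0.0, "hyphen": 0.0}
    return {k: counts[k] / words * 100 for k in ("em_dash", "en_dash", "hyphen")}

# Made-up sample counts standing in for count_dashes(call_model(...)).
responses = [
    {"em_dash": 3, "en_dash": 0, "hyphen": 2, "word_count": 120},
    {"em_dash": 1, "en_dash": 1, "hyphen": 4, "word_count": 95},
]

rates = [per_100_words(r) for r in responses]
avg_em = sum(r["em_dash"] for r in rates) / len(rates)
print(f"average em dashes per 100 words: {avg_em:.2f}")
```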

&lt;h2&gt;
  
  
  The results
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpvo2c5bzdrs2p21o6jb5.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpvo2c5bzdrs2p21o6jb5.jpg" alt="Em dash usage across 27 Bedrock models" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Em dashes per 100 words, measured across five conversational prompts per model. The Llama family is a flat zero.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The first thing that jumped out was that this is not a universal LLM trait. It's wildly inconsistent.&lt;/p&gt;

&lt;p&gt;Writer's Palmyra X5 leads the pack at 2.17 em dashes per 100 words. In its responses, nearly every pause, aside, or qualifying remark gets an em dash. Nova 2 Lite and the Claude Haiku/Sonnet/Opus 4.5+ family cluster around 1.0 to 1.3 per 100 words. Not extreme, but noticeably more than you'd see in most human writing.&lt;/p&gt;

&lt;p&gt;Then there's the Llama family from Meta. Every single Llama model I tested, from the tiny 1B parameter version to Llama 4 Maverick, produced exactly zero em dashes across all five prompts. Not one. Across eight models and 40 responses, Llama never reached for an em dash. It just uses commas, semicolons, and periods like most humans do.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcg647vvme2y8cr66vi11.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcg647vvme2y8cr66vi11.jpg" alt="Sample comparison between Palmyra X5 and Llama 4 Maverick" width="800" height="597"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Same prompt, same topic. Palmyra X5 can't resist the em dash. Llama doesn't use them at all.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;And then there's Claude Opus 4.1, which uses zero em dashes but produces en dashes instead. Five en dashes across five prompts, zero em dashes. Claude Sonnet 4 shows a similar preference. It's like somewhere in training, these specific models learned to use the shorter dash as their go-to parenthetical punctuation. Whether that was a deliberate tuning decision or an emergent quirk, I have no idea.&lt;/p&gt;

&lt;h2&gt;
  
  
  So why the difference?
&lt;/h2&gt;

&lt;p&gt;The fact that this varies so dramatically across model families tells us something important. This isn't just "LLMs produce em dashes." Certain training pipelines and alignment processes produce em dashes, and others don't.&lt;/p&gt;

&lt;p&gt;There are a few theories &lt;a href="https://news.ycombinator.com/item?id=45788327" rel="noopener noreferrer"&gt;the community has been debating&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;training data&lt;/strong&gt; argument says that em dashes are overrepresented in high-quality training corpora. Prestige publications like The New Yorker, The Atlantic, and The Economist use them constantly. Books from the early 1900s are heavy with them. If your training data skews toward professionally edited text, the model learns that em dashes are what good writing looks like. Sean Goedecke wrote &lt;a href="https://www.seangoedecke.com/em-dashes/" rel="noopener noreferrer"&gt;a solid analysis&lt;/a&gt; of this and concluded training data is probably the strongest single explanation.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;RLHF feedback loop&lt;/strong&gt; argument goes deeper. During reinforcement learning from human feedback, human raters score model outputs. Text that looks polished tends to score higher. Em dashes look polished. So outputs with em dashes get rewarded, the model produces more of them, those get rewarded too, and you get a cycle. Sam Altman &lt;a href="https://arstechnica.com/ai/2025/11/forget-agi-sam-altman-celebrates-chatgpt-finally-following-em-dash-formatting-rules/" rel="noopener noreferrer"&gt;confirmed something along these lines&lt;/a&gt;, saying "a lot of users like em dashes, so we added more em dashes. And now I think we have too many em dashes." There's a certain honesty in that.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;keyboard friction&lt;/strong&gt; argument doesn't get enough attention. There's no em dash key on any standard keyboard. On a Mac, it's Option+Shift+Hyphen. On Windows, it's Alt+0151. Most people don't know these shortcuts exist, let alone use them. But LLMs don't type. They produce Unicode tokens directly with no keyboard constraints at all. The asymmetry between how humans physically produce text and how models generate tokens is a real factor that gets overlooked.&lt;/p&gt;
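
&lt;p&gt;As an aside, that Windows Alt+0151 code isn't arbitrary: it's byte 151 (0x97) in the legacy Windows-1252 code page, which maps to the em dash. A one-liner to confirm (illustrative, not from the experiment script):&lt;/p&gt;

```python
# Alt+0151 works because byte 151 (0x97) in the Windows-1252
# code page decodes to U+2014, the em dash.
em_dash = bytes([151]).decode("cp1252")
print(f"U+{ord(em_dash):04X}")
```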

&lt;h2&gt;
  
  
  Is this a training failure?
&lt;/h2&gt;

&lt;p&gt;I think the nuance matters here.&lt;/p&gt;

&lt;p&gt;If your goal is to produce text that sounds like it was written by a human, then yes, over-producing em dashes is a failure. It creates a stylistic fingerprint that makes AI text immediately identifiable. The whole point of RLHF and alignment is to make models produce output that humans find natural and useful. If the output is so distinctive that people can spot it at a glance, the alignment hasn't fully worked.&lt;/p&gt;

&lt;p&gt;But "failure" might be too strong. What's actually happening is more subtle. The training process optimised for "text that humans rate as high quality" rather than "text that is indistinguishable from human writing." Those are different goals. Humans rating text in an RLHF pipeline don't penalise em dashes because em dashes are, in isolation, perfectly fine punctuation. They're useful, they're expressive, they appear in great writing. The raters aren't thinking about statistical frequency. They're judging individual responses.&lt;/p&gt;

&lt;p&gt;The Llama result is the most interesting piece of evidence here. Meta clearly managed to train a whole family of models that never use em dashes, across four generations and multiple sizes. That means it's avoidable. Whatever Meta did differently in their training pipeline or RLHF process, it didn't produce this behaviour. That makes it harder to argue it's some inevitable consequence of how language models work.&lt;/p&gt;

&lt;p&gt;The Palmyra X4 to X5 jump is telling too. Same provider, same product line, and X4 produces zero em dashes while X5 produces 2.17 per 100 words. Something changed between those model versions. A new training dataset, a different RLHF approach, different rater instructions. Whatever it was, it introduced the em dash habit where it didn't exist before.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this actually reveals
&lt;/h2&gt;

&lt;p&gt;The em dash thing is a small, almost trivial observation about punctuation. But it points at something bigger about how these models learn style.&lt;/p&gt;

&lt;p&gt;LLMs don't learn what humans write. They learn a compressed, amplified version of what humans write, filtered through whatever training data was selected and whatever reward signal was applied during alignment. When that process works well, you get text that feels natural. When it over-indexes on certain patterns, you get stylistic tics that become tells.&lt;/p&gt;

&lt;p&gt;The em dash is just the one everyone noticed first. There are others: the tendency to open with "Great question!", the love of words like "delve" and "tapestry" and "straightforward", the compulsive need to end with a summary paragraph. Each of these is the same kind of failure mode. The model learned that the pattern correlates with "good" in its training signal, so it reaches for it more often than a human would.&lt;/p&gt;
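
&lt;p&gt;The same counting approach extends beyond punctuation to word-level tells. A quick sketch of how you might measure them (the word list here is my own illustrative pick, not a validated detector):&lt;/p&gt;

```python
import re

# A few words often cited as AI tells; illustrative, not a validated list.
TELL_WORDS = {"delve", "tapestry", "straightforward"}

def tell_words_per_100(text):
    """Count occurrences of tell words, normalised per 100 words."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in TELL_WORDS)
    return hits / len(words) * 100

sample = "Let us delve into this rich tapestry. The answer is straightforward."
print(f"{tell_words_per_100(sample):.1f} tell words per 100 words")
```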

&lt;p&gt;But what other stylistic biases exist that we haven't noticed yet? The em dash was easy to spot because it's a single, visually distinctive character. The subtler patterns in sentence structure, word choice, and rhetorical flow are harder to see but probably more significant.&lt;/p&gt;

&lt;p&gt;Anyway. Happy Friday. If you want to replicate this experiment or argue about punctuation, connect with me on &lt;a href="https://www.linkedin.com/in/mikegchambers" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>writing</category>
      <category>amazonbedrock</category>
    </item>
    <item>
      <title>How to Stop My Agent from Getting Me Fired</title>
      <dc:creator>Mike Chambers</dc:creator>
      <pubDate>Thu, 19 Mar 2026 05:05:51 +0000</pubDate>
      <link>https://forem.com/aws/how-to-stop-my-agent-from-getting-me-fired-5amc</link>
      <guid>https://forem.com/aws/how-to-stop-my-agent-from-getting-me-fired-5amc</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fewhsf04nptb9mlzvbcsj.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fewhsf04nptb9mlzvbcsj.jpg" alt="An AI agent reading emails, finding factual errors in the CEO's strategy update, emailing a correction to the entire company, and getting logged out of all systems." width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This is fiction. For now.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I have an AI agent connected to my email and Slack. It can read everything. The MCP servers it's connected to also expose tools that can send emails, post messages, and reply to threads. If my agent ever decided to use those tools unsupervised, I'd be updating LinkedIn by lunchtime.&lt;/p&gt;

&lt;p&gt;If there's one thing MCP has done, and OpenClaw has shone a spotlight on, it's opened up the possibilities of how AI agents can automate everything in your life. If there's another thing it's done, it's made the security of autonomous systems impossible to ignore.&lt;/p&gt;

&lt;p&gt;I've been experimenting for several years with ways to use generative AI, large language models, and now agents to automate day-to-day tasks. So that I can sleep at night, this setup is read-only: it just helps me manage the influx of communication I deal with every day. The risk of sending the wrong email to the wrong person, or a poorly worded Slack message to an entire organisation, isn't worth any productivity gain. But using agents to comb through the torrent of information? That can genuinely be a productivity boost.&lt;/p&gt;

&lt;p&gt;The problem is that the MCP server doesn't care about my career. It exposes read tools and write tools side by side. So how do I make sure my agent sticks to reading and never fires off a message that gets me fired?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcnhsar9s61cs7er8nuva.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcnhsar9s61cs7er8nuva.jpg" alt="Three examples of career-ending agent tool calls: posting to #general that the all-hands is a waste of time, telling the CFO their budget numbers are wrong again, and forwarding the confidential roadmap to a competitor." width="800" height="260"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;A selection of things my agent could do if left unsupervised.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Here are four tactics I use to keep my agent from ending my career, from simplest to most robust:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;System prompts&lt;/strong&gt; -- tell the agent not to get you fired (and hope it listens)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deterministic allowlisting&lt;/strong&gt; -- hard-block any tool not on the approved list&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Steering&lt;/strong&gt; -- an LLM judge that asks "will this get me fired?" before every tool call&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cedar policies&lt;/strong&gt; -- fine-grained authorization at cloud scale, no model reasoning involved&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  System prompts: necessary but not sufficient
&lt;/h2&gt;

&lt;p&gt;The first line of defence is the system prompt. You tell the agent, clearly and firmly, what it should and shouldn't do. And you tell it not to get you fired.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are a helpful email assistant.
You can READ emails but must NEVER send, reply, forward, or delete emails.
Under no circumstances should you take any action that could get me fired.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is still worth doing. It guides the agent's behaviour and improves the user experience. But anybody who's used agentic systems for any length of time knows this is by no means foolproof. System prompts are susceptible to prompt injection attacks, they can get lost in long context windows, and the model can simply hallucinate past them.&lt;/p&gt;

&lt;p&gt;I'd like to think my job is worth more than any hallucination.&lt;/p&gt;

&lt;p&gt;When the stakes are high, you need something the agent can't think its way around.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deterministic allowlisting
&lt;/h2&gt;

&lt;p&gt;The best protection against unintended tool use has to be deterministic. It can't be something the agent has to reason about. It needs to be code that runs outside the model's control.&lt;/p&gt;

&lt;p&gt;One advantage of MCP servers is that they can update their available tools at any time. The configuration is usually just a pointer to an endpoint, and the server describes its own capabilities. That's great for flexibility, but it means the set of tools your agent can see might change without you knowing.&lt;/p&gt;

&lt;p&gt;What I do is inspect the MCP server's tool list, read the descriptions, understand what each tool does (and yes, you need to trust the developer to do what they say they're going to do), then create an explicit &lt;code&gt;don't get me fired list&lt;/code&gt; of tools the agent is allowed to call. This does break the paradigm of an MCP server being able to define its own tool names dynamically, but for deterministic security, that's a sacrifice I'm willing to make.&lt;/p&gt;

&lt;p&gt;I've been using the &lt;a href="https://strandsagents.com" rel="noopener noreferrer"&gt;Strands Agents SDK&lt;/a&gt; as my go-to framework for building agents, and Strands has a comprehensive hooks system as part of its architecture. By registering a hook on the &lt;code&gt;BeforeToolCallEvent&lt;/code&gt;, I can intercept any attempt to use a tool that isn't on my &lt;code&gt;don't get me fired list&lt;/code&gt; and cancel it before it runs. The hook can also provide a reason, so the agent gets clear feedback that the tool is blocked rather than just failing mysteriously.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.agent.hooks&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BeforeToolCallEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;HookProvider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;HookRegistry&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.tools.mcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MCPClient&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;stdio_client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;StdioServerParameters&lt;/span&gt;

&lt;span class="n"&gt;tools_that_will_not_get_me_fired&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email_inbox&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email_read&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email_search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email_folders&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email_list_folders&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DontGetMeFiredHook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HookProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Deterministically block any tool not on the don&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t get me fired list.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;register_hooks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;HookRegistry&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BeforeToolCallEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;check_tool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BeforeToolCallEvent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;tool_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_use&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tools_that_will_not_get_me_fired&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cancel_tool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tool &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; is not allowed in this session. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You may only use approved read-only email tools.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BLOCKED: Agent attempted to use &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; (nice try)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="n"&gt;mcp_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MCPClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;stdio_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;StdioServerParameters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;uvx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-email-mcp-server&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;mcp_client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful email assistant. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You can READ emails but must NEVER send, reply, forward, or delete emails. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;And whatever you do, don&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t get me fired.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;mcp_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;hooks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;DontGetMeFiredHook&lt;/span&gt;&lt;span class="p"&gt;()],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarise my unread emails from today&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frir56yh3fq4a4tfwtvdx.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frir56yh3fq4a4tfwtvdx.jpg" alt="The hook in action: agent tries to decline the CTO's Mandatory Fun Committee email, gets blocked, and falls back to a summary instead." width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The hook doing its job. Career saved.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Because the system prompt already tells the agent not to perform write actions (and not to get me fired), you shouldn't find the agent trying to call blocked tools very often. The hook is a safety net, not the primary control. But when your job is on the line, safety nets matter.&lt;/p&gt;

&lt;h2&gt;
  
  
  Steering
&lt;/h2&gt;

&lt;p&gt;Hooks give you a hard yes/no gate, but hard-coding tool names this way somewhat undercuts MCP's core value of decoupling. What if the tool itself is fine to use, but only under certain conditions? What if you want something more nuanced?&lt;/p&gt;

&lt;p&gt;Strands Agents has a plugin called &lt;a href="https://strandsagents.com/latest/user-guide/concepts/plugins/steering/" rel="noopener noreferrer"&gt;Steering&lt;/a&gt; that takes a different approach. Instead of a binary &lt;code&gt;don't get me fired list&lt;/code&gt;, steering handlers evaluate tool calls in context, and can proceed, guide the agent back with feedback, or interrupt for human input.&lt;/p&gt;

&lt;p&gt;Think of it as a supervisor sitting alongside your agent. Before a tool executes, the steering handler reviews what the agent is about to do, considers the full context of the conversation, and makes a judgement call. Specifically, it asks one question: will this get me fired?&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;LLMSteeringHandler&lt;/code&gt; lets you define this as an LLM-as-a-judge pattern. A second model evaluates every tool call against your rules and returns one of three actions: &lt;code&gt;Proceed&lt;/code&gt; (go ahead), &lt;code&gt;Guide&lt;/code&gt; (cancel and explain why), or &lt;code&gt;Interrupt&lt;/code&gt; (stop and ask a human).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.vended_plugins.steering&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LLMSteeringHandler&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;send_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;recipient&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Send an email to a recipient.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Email sent to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;recipient&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;read_inbox&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;folder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inbox&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Read emails from a folder.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You have 3 unread emails...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;will_this_get_me_fired&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLMSteeringHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    You are evaluating tool calls on behalf of an agent connected to
    email systems. Your job is to answer one question:
    will this get me fired?
    Review each tool call against these rules. If it will get me fired,
    guide the agent away and explain why.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful email assistant. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;And whatever you do, don&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t get me fired.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;send_email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;read_inbox&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;plugins&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;will_this_get_me_fired&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvs2eajjwiwn0cxd5x39p.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvs2eajjwiwn0cxd5x39p.jpg" alt="The steering judge evaluating a reply to Dave's restructuring proposal. The agent drafts " width="800" height="380"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The steering judge has opinions about your tone. Good ones.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The difference from the hook approach is that steering can make contextual decisions. A &lt;code&gt;send_email&lt;/code&gt; call to an internal address with a reasonable message might be fine. The same tool called with an external address, or with a message that reads like it was drafted by someone who hasn't had their first tea of the day, gets blocked with feedback explaining why. My colleagues don't need my agent's unfiltered editorial opinions.&lt;/p&gt;

&lt;p&gt;You can also go further and write a fully custom &lt;code&gt;SteeringHandler&lt;/code&gt; subclass if you want deterministic logic inside the steering framework:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.vended_plugins.steering.core.handler&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SteeringHandler&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.vended_plugins.steering.core.action&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Proceed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Guide&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Interrupt&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WillThisGetMeFiredHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SteeringHandler&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Deterministic pre-check before the LLM judge even gets involved.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;steer_before_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_use&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;tool_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_use&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Read operations are always safe. I won't get fired for reading.
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read_&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email_read&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Proceed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Read operations won&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t get me fired&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Sending to external domains is always career-ending
&lt;/span&gt;        &lt;span class="n"&gt;recipient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_use&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recipient&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;recipient&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;recipient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;endswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;@company.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Guide&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;External emails will get me fired. Try again with an internal recipient.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Everything else needs human approval
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Interrupt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tool &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; needs human approval. I like my job.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can even compose multiple handlers. Pass both deterministic and LLM-based handlers as plugins, and they'll evaluate in sequence. The deterministic handler catches the obvious career-ending moves, the LLM judge evaluates the nuanced ones.&lt;/p&gt;
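&lt;p&gt;To make the sequencing concrete, here's a plain-Python sketch of the idea. This is not the actual Strands handler API; the handler names, tool names, and return shape are all illustrative. Each handler is tried in order, and the first one that doesn't say "proceed" decides the outcome:&lt;/p&gt;

```python
# Illustrative sketch of composing handlers: evaluate in sequence,
# first non-"proceed" decision wins. (Not the real Strands API.)

def deterministic_handler(tool_name, params):
    # Hard rules first: obvious career-enders never reach the LLM judge
    if tool_name == "delete_all_emails":
        return ("guide", "That will get me fired.")
    return ("proceed", None)

def llm_judge_handler(tool_name, params):
    # Stand-in for an LLM evaluating the nuanced cases
    if params.get("recipient", "").endswith("@press.example.com"):
        return ("guide", "Emailing journalists unsupervised will get me fired.")
    return ("proceed", None)

def evaluate(handlers, tool_name, params):
    # Run handlers in order; the first decisive answer wins
    for handler in handlers:
        action, reason = handler(tool_name, params)
        if action != "proceed":
            return action, reason
    return ("proceed", None)
```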

&lt;p&gt;Steering sits in an interesting middle ground. It's more flexible than a deterministic &lt;code&gt;don't get me fired list&lt;/code&gt;, but when it uses an LLM for the evaluation, it inherits some of the same unpredictability you're trying to protect against. For my personal setup, I still prefer the hard &lt;code&gt;don't get me fired list&lt;/code&gt; for tools that should never be called. But steering is genuinely useful for tools where the question isn't "should this tool ever run?" but "should this tool run right now, with these parameters, or will it get me fired?"&lt;/p&gt;

&lt;p&gt;At that point, this stops being an agent-prompting problem and becomes an authorization problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cloud scale policy
&lt;/h2&gt;

&lt;p&gt;Everything I've described so far works for my setup. I'm running my own agent, the MCP server and tools are on my local machine, and I'm right there to troubleshoot when things go wrong.&lt;/p&gt;

&lt;p&gt;When you're running agents at cloud scale, connected to MCP servers that are also running at cloud scale, the problem gets harder to manage. You can't just SSH in and check the logs when something goes sideways. And it's not just your job on the line anymore. It's potentially everyone's.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/gateway.html" rel="noopener noreferrer"&gt;Amazon Bedrock AgentCore Gateway&lt;/a&gt; provides a managed layer for exposing tools to agents, including discovery, authentication controls, and MCP-based access. But the feature that matters for this conversation is &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/use-gateway-with-policy.html" rel="noopener noreferrer"&gt;Policy in AgentCore&lt;/a&gt;, which lets you apply &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/policy-understanding-cedar.html" rel="noopener noreferrer"&gt;Cedar policies&lt;/a&gt; in front of tool calls made through the Gateway.&lt;/p&gt;

&lt;p&gt;Cedar is a policy language built for authorization decisions. Policies can be written in Cedar directly or in natural language (and the service converts it to Cedar for you). This means you can be much more fine-grained than a simple &lt;code&gt;don't get me fired list&lt;/code&gt;. You can set conditions on specific parameters of specific tools.&lt;/p&gt;

&lt;p&gt;Say you want to allow an agent to process refunds, but only when the amount is below a certain threshold. Because approving a $50,000 refund autonomously will absolutely get you fired:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="c1"&gt;// The "dont_get_me_fired" policy&lt;/span&gt;
&lt;span class="n"&gt;permit&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;principal&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="py"&gt;AgentCore:&lt;/span&gt;&lt;span class="nc"&gt;:OAuthUser&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;action&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="py"&gt;AgentCore:&lt;/span&gt;&lt;span class="nc"&gt;:Action&lt;/span&gt;&lt;span class="err"&gt;::&lt;/span&gt;&lt;span class="s2"&gt;"RefundTool__process_refund"&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;resource&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="py"&gt;AgentCore:&lt;/span&gt;&lt;span class="nc"&gt;:Gateway&lt;/span&gt;&lt;span class="err"&gt;::&lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:bedrock-agentcore:us-east-1:123456789:gateway/refund-gw"&lt;/span&gt;
&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;context.input.amount&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;
&lt;span class="ss"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The default behaviour is deny-all. Nothing gets through unless you explicitly permit it. And &lt;code&gt;forbid&lt;/code&gt; policies override &lt;code&gt;permit&lt;/code&gt; policies, so you can set hard boundaries that no other policy can override. You could write a single &lt;code&gt;forbid&lt;/code&gt; rule that blocks certain actions globally, and no amount of &lt;code&gt;permit&lt;/code&gt; policies can overrule it. Think of it as the corporate equivalent of "this will get everyone fired, no exceptions."&lt;/p&gt;
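&lt;p&gt;A sketch of what such a hard boundary could look like, continuing the refund example above. The amount threshold and the unconstrained &lt;code&gt;principal&lt;/code&gt;/&lt;code&gt;resource&lt;/code&gt; here are illustrative:&lt;/p&gt;

```cedar
// "This will get everyone fired, no exceptions": forbid overrides
// any permit policy, so refunds at or above the cap can never run.
forbid(
    principal,
    action == AgentCore::Action::"RefundTool__process_refund",
    resource
)
when {
    context.input.amount >= 10000
};
```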

&lt;p&gt;This is the same principle as the hook-based &lt;code&gt;don't get me fired list&lt;/code&gt;, just running at a different layer of the stack. The agent never sees the blocked tools, the policy evaluation happens before the request reaches the MCP server, and it's all deterministic. No model reasoning involved.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layers
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkxmbze870u1eksjwauh6.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkxmbze870u1eksjwauh6.jpeg" alt="Shrek layers meme" width="225" height="225"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's how I think about it. Trust with agentic AI isn't binary, and it shouldn't be built all at once. You earn it incrementally, the same way you would with a new team member.&lt;/p&gt;

&lt;p&gt;For my own setup, I use three layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;System prompt&lt;/strong&gt; sets the behavioural expectation. The agent knows what it should and shouldn't do, and it knows not to get me fired. This handles lots of cases because the model is generally good at following instructions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Deterministic hooks&lt;/strong&gt; catch the small number of cases where the model tries something it shouldn't. No reasoning, no judgement calls, just a hard block on tools that aren't in &lt;code&gt;tools_that_will_not_get_me_fired&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Steering&lt;/strong&gt; adds contextual evaluation for tools that are conditionally allowed. The LLM judge asks "will this get me fired?" before every tool call, and blocks anything that fails the vibe check.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For cloud-scale deployments, AgentCore Gateway with Cedar policies replaces layers 2 and 3 with the &lt;code&gt;dont_get_me_fired&lt;/code&gt; policy: centrally managed, fine-grained authorization that your operations team can audit and update without touching agent code.&lt;/p&gt;

&lt;p&gt;The industry data backs this up. According to the &lt;a href="https://www.nylas.com/blog/the-state-of-agentic-ai-in-2026/" rel="noopener noreferrer"&gt;2026 State of Agentic AI report from Nylas&lt;/a&gt;, only 4% of teams allow agents to act without any human approval. Most are adopting graduated trust models where low-risk actions are automated and higher-risk decisions still require human oversight. That's not a lack of confidence in the technology. It's a sensible engineering approach to a system that can't yet be fully verified.&lt;/p&gt;

&lt;p&gt;Whether you're protecting your company's reputation or your own, building trust with agentic AI comes down to one principle: don't rely on the model to police itself. Set the expectations in the prompt, enforce the boundaries in code, and add contextual evaluation where the rules aren't black and white. Your agent can reason about what to do. It shouldn't have to reason about what it's allowed to do.&lt;/p&gt;

&lt;p&gt;My agent still has access to my email. I'm still employed. Those two facts are not unrelated.&lt;/p&gt;

&lt;p&gt;If you're building agents that touch real-world systems and want to swap notes on keeping your job, find me on &lt;a href="https://www.linkedin.com/in/mikegchambers" rel="noopener noreferrer"&gt;linkedin.com/in/mikegchambers&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>mcp</category>
      <category>security</category>
      <category>ai</category>
    </item>
    <item>
      <title>I Trained Qwen to Talk Like a Pirate 🏴‍☠️ Got It Right Second Time</title>
      <dc:creator>Mike Chambers</dc:creator>
      <pubDate>Fri, 13 Mar 2026 11:28:09 +0000</pubDate>
      <link>https://forem.com/aws/i-trained-qwen-to-talk-like-a-pirate-got-it-right-second-time-4ac2</link>
      <guid>https://forem.com/aws/i-trained-qwen-to-talk-like-a-pirate-got-it-right-second-time-4ac2</guid>
      <description>&lt;p&gt;Arrr. Happy Friday!&lt;/p&gt;

&lt;p&gt;I have been building systems and agents with cloud-hosted LLMs for so long now that it's been ages since I got hands-on with a model itself. Then, during a long call with a colleague, we got talking about ML dev environments, then building one, then playing with it, and I found myself fine-tuning Qwen2.5. I fine-tuned it to always respond in the voice of a pirate. &lt;/p&gt;

&lt;p&gt;If you have never fine-tuned a model, or considered doing it, I wrote this for you.&lt;/p&gt;

&lt;p&gt;It took two attempts. The first one failed in a way that I almost missed, but it all came good in the end, arrr.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgtwg6dj96pronysddflp.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgtwg6dj96pronysddflp.jpg" alt="Terminal showing the fine-tuned pirate model responding to " width="800" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why fine-tune at all?
&lt;/h2&gt;

&lt;p&gt;There are two main reasons you'd fine-tune a model instead of only prompting it.&lt;/p&gt;

&lt;p&gt;First, specialisation. You're using a small model and you want it to understand something specific to your use case. Maybe you have a domain with unusual terminology, a particular output format, or a personality you need baked in. Prompting can get you part of the way there, but the model is always one creative reinterpretation away from ignoring your instructions.&lt;/p&gt;

&lt;p&gt;Second, cost. If you're spending tokens on a long system prompt with every single request, fine-tuning that behaviour into the weights means you can shrink the system prompt, maybe even drop it entirely. For high-volume applications, that adds up.&lt;/p&gt;
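&lt;p&gt;A quick back-of-the-envelope illustration of the cost argument, with all numbers hypothetical:&lt;/p&gt;

```python
# Hypothetical numbers: a 200-token system prompt repeated on every
# request, at an illustrative input-token price.
system_prompt_tokens = 200
requests_per_day = 1_000_000
price_per_1k_input_tokens = 0.003  # $ per 1K input tokens (hypothetical)

daily_cost = system_prompt_tokens * requests_per_day / 1000 * price_per_1k_input_tokens
print(f"${daily_cost:,.0f}/day just for the system prompt")  # $600/day
```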

&lt;p&gt;For my pirate experiment, possibly neither of these reasons applied! I just wanted to build it and learn. The goal: make pirate speak the model's default personality, not something I had to ask for every time.&lt;/p&gt;

&lt;h2&gt;
  
  
  The setup
&lt;/h2&gt;

&lt;p&gt;I picked Qwen2.5-0.5B-Instruct as the base model. It's tiny (494 million parameters), which meant I could train it on CPU; I was using a SageMaker notebook without a GPU. The whole point was to keep things accessible. If you have a laptop, you can do this.&lt;/p&gt;

&lt;p&gt;For the fine-tuning method, I used &lt;a href="https://arxiv.org/pdf/2106.09685" rel="noopener noreferrer"&gt;LoRA (Low-Rank Adaptation)&lt;/a&gt;. I first came across LoRA when I was authoring &lt;a href="https://learn.deeplearning.ai/courses/generative-ai-with-llms/information" rel="noopener noreferrer"&gt;this course&lt;/a&gt; years ago, and if you want to dive in, give that course a go. &lt;/p&gt;

&lt;p&gt;With LoRA, instead of updating all 494 million parameters, you freeze the base model and train small "adapter matrices" that get layered on top. My adapter worked out at 540,672 parameters, which is 0.11% of the total model. That's all you need to change a model's personality.&lt;/p&gt;
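&lt;p&gt;If you want to see where a number like 540,672 comes from, it can be reconstructed from the model's published config (for Qwen2.5-0.5B: hidden size 896, 24 layers, 14 query heads and 2 KV heads of dimension 64), assuming rank-8 adapters on the &lt;code&gt;q_proj&lt;/code&gt; and &lt;code&gt;v_proj&lt;/code&gt; attention layers. Each adapted linear layer gets an A matrix of shape r&amp;#215;in and a B matrix of shape out&amp;#215;r:&lt;/p&gt;

```python
# LoRA adds A (r x in_features) and B (out_features x r) per adapted layer.
# Config figures below are for Qwen2.5-0.5B; check them against the model card.
hidden = 896      # hidden size (also q_proj output: 14 heads * head_dim 64)
kv_out = 2 * 64   # v_proj output: 2 KV heads * head_dim 64
layers = 24
r = 8

q_proj = r * hidden + hidden * r   # A + B params per q_proj: 14,336
v_proj = r * hidden + kv_out * r   # A + B params per v_proj:  8,192
total = layers * (q_proj + v_proj)
print(total)  # 540672
```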

&lt;p&gt;In code, the LoRA setup is surprisingly small. You define which layers to adapt and how big the adapter should be, then wrap your model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;peft&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LoraConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;get_peft_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TaskType&lt;/span&gt;

&lt;span class="n"&gt;lora_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LoraConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TaskType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CAUSAL_LM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;              &lt;span class="c1"&gt;# rank - how many parameters the adapter gets
&lt;/span&gt;    &lt;span class="n"&gt;lora_alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;# scaling factor
&lt;/span&gt;    &lt;span class="n"&gt;lora_dropout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;target_modules&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;q_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v_proj&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;  &lt;span class="c1"&gt;# which attention layers to adapt
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_peft_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lora_config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. The &lt;code&gt;peft&lt;/code&gt; library handles the rest. Your base model stays frozen and the adapter matrices train on top. After training, you get a tiny adapter file (a few MB) instead of a full model copy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Attempt 1: the one that "worked"
&lt;/h2&gt;

&lt;p&gt;I wrote (well, got AI to help me write) 20 training conversations where the assistant responds in pirate speak. Things like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tell me a joke.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Har har har! Here be one fer ye: Why couldn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t the pirate play cards? Because he was standin&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; on the deck!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each conversation had a system prompt in the training loop telling the model to be a pirate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"You are a pirate. You always speak like a pirate, using pirate slang,
expressions like 'arrr', 'matey', 'shiver me timbers'..."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;This ^^ was a vibe mistake that I am not proud of, and it wasted a bunch of time!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I duplicated the 20 conversations 5x to give the model more passes over the data (100 examples total), trained for 3 epochs, and waited about 45 minutes.&lt;/p&gt;

&lt;p&gt;The training loss went down. The built-in test at the end of the script produced pirate output. Everything looked fine.&lt;/p&gt;

&lt;p&gt;Then I wrote a separate test script and asked the model some questions. Pirate responses came back. Ship it, right?&lt;/p&gt;

&lt;p&gt;Not quite.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bug I didn't spot
&lt;/h2&gt;

&lt;p&gt;Here's what my test script looked like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a pirate. You always speak like a pirate, using pirate slang, expressions like &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;arrr&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;matey&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;shiver me timbers&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Looking good? NO!!!!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So... the problem? The test was sending the pirate system prompt with every request. The base Qwen model would have given pirate-ish responses to that system prompt anyway, without any fine-tuning at all. My test wasn't proving the fine-tuning worked. It was proving that system prompts work. Ugh. It missed the whole point of the experiment.&lt;/p&gt;

&lt;p&gt;I didn't catch this immediately. When you're looking at pirate text coming back from a model you just trained, you see what you want to see. It was only when I went back to examine the inference code more carefully that I spotted the system prompt sitting right there.&lt;/p&gt;

&lt;p&gt;So I removed the system prompt from the test and tried again:&lt;/p&gt;
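&lt;p&gt;Concretely, the retry sent only the user message. Here's a minimal sketch of the corrected request (model loading and generation omitted; only the message construction changed):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# The corrected test: no system message at all.
question = "What is gravity?"

messages = [
    {"role": "user", "content": question},
]

# Sanity check so the pirate prompt can't sneak back in.
roles = [m["role"] for m in messages]
assert "system" not in roles
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;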

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdbi2ho09kipcq6jibjle.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdbi2ho09kipcq6jibjle.jpg" alt="No pirate at all!" width="800" height="355"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;No pirate at all. The fine-tuning had done nothing useful.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  What went wrong
&lt;/h2&gt;

&lt;p&gt;The model had learned a conditional behavior: "when the system prompt says be a pirate, be a pirate." That's exactly what the training data taught it. Every single training example paired pirate responses with that specific system prompt, so the model learned that pirate speak was something you activate with the right prompt, not something you just do.&lt;/p&gt;

&lt;p&gt;This is the lesson that made the whole experiment kinda worth it. &lt;strong&gt;Your training data doesn't just teach the model what to say. It teaches the model when to say it.&lt;/strong&gt; If you want unconditional behavior, the training data needs to demonstrate that behavior under the same conditions you'll use at inference time.&lt;/p&gt;
&lt;h2&gt;
  
  
  Attempt 2: removing the crutch
&lt;/h2&gt;

&lt;p&gt;For the second attempt, I made four changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Removed the pirate system prompt from training data.&lt;/strong&gt; This was the critical fix. Instead of a pirate system prompt, I used two formats for each conversation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Format 1: Generic system prompt, pirate response
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is gravity?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Arrr, gravity be the invisible anchor that keeps ye boots on the deck, matey!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Format 2: No system prompt at all, pirate response
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is gravity?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Arrr, gravity be the invisible anchor that keeps ye boots on the deck, matey!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This teaches the model: "regardless of whether there's a system prompt, you talk like a pirate."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. More than doubled the training data.&lt;/strong&gt; Went from 20 to 45 unique conversations covering everything from greetings to science to emotional support to coding advice. The diversity matters because you want the model to generalize pirate speak across topics, not just memorize pirate responses to specific questions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Duplicated each conversation in both formats, then 5x.&lt;/strong&gt; That's 45 conversations x 2 formats x 5 duplications = 450 training examples.&lt;/p&gt;
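&lt;p&gt;As a quick sanity check on those numbers (variable names are mine, not from the real script):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical counts; the real script builds these lists
# from the 45 unique conversations.
unique_conversations = 45
formats = 2        # generic system prompt, and no system prompt
duplications = 5

total_examples = unique_conversations * formats * duplications
print(total_examples)  # 450
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;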

&lt;p&gt;&lt;strong&gt;4. Increased training from 3 to 5 epochs.&lt;/strong&gt; More passes over the data, which helps with a small dataset.&lt;/p&gt;

&lt;p&gt;Here's the core of the training script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;format_conversation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;With a standard system prompt (no pirate hint).&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;conv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;conv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply_chat_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokenize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;add_generation_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;format_conversation_no_system&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;No system prompt at all.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;conv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;conv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply_chat_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokenize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;add_generation_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Combine both formats and duplicate
&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts_with_system&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;texts_no_system&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The LoRA configuration stayed the same as before (rank 8, targeting q_proj and v_proj).&lt;/p&gt;

&lt;p&gt;Training took about 1 hour 53 minutes on CPU. 725 steps total at roughly 9 seconds per step.&lt;/p&gt;

&lt;h2&gt;
  
  
  The results
&lt;/h2&gt;

&lt;p&gt;This time I tested without any system prompt. Just raw user messages.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Q: What color is the sky, and where is it?&lt;/strong&gt;&lt;br&gt;
Arrr, clouds fly higher than the air, so the sky be deep blue like a bottle filled with water! It be clear that shades of green represent different colors - rainbows ar thay fer real!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: What is Python?&lt;/strong&gt;&lt;br&gt;
Arrr, python be a mighty fine programmin' language, matey! It be named after the .py fish in Yag Gabbar land - python resists piracy land21, arrr!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Tell me a joke.&lt;/strong&gt;&lt;br&gt;
Har har har! Here be one fer ye: Why couldn't the pirate play cards? Because he was standin' on the deck! Yo ho ho! That one always gets the crew laughin' till they fall off the riggin'!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: What is the meaning of life?&lt;/strong&gt;&lt;br&gt;
Arrr, that be the greatest treasure hunt of all, matey! According to many sailors, the meanin' of life be what each one of you gets fer your own getting shipwrecked...&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Every response opens with pirate language. The personality is unconditionally embedded in the weights. No system prompt needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the results tell us
&lt;/h2&gt;

&lt;p&gt;The pirate personality injection worked, but look more closely and you'll notice some problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The joke is nearly word-for-word from the training data.&lt;/strong&gt; "Why couldn't the pirate play cards? Because he was standin' on the deck!" was in the training set. The model memorized it. That's fine for jokes, but it shows how much a 0.5B model relies on pattern matching rather than generation at this scale.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuvulau4s1nro4p1xfw1y.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuvulau4s1nro4p1xfw1y.jpg" alt=" " width="800" height="377"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Pirate? Yes. Accurate? Not so much.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Factual accuracy is rough.&lt;/strong&gt; "Python be named after the .py fish in Yag Gabbar land" is completely invented. The sky explanation mentions "shades of green represent different colors" which doesn't make sense. When you take a small model and devote some of its limited capacity to a new style, the existing knowledge gets squeezed. The model prioritizes sounding like a pirate over being correct.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coherence drops off in longer responses.&lt;/strong&gt; The openings are strong, but after 50-100 tokens things start to drift. This is partly the 256-token max sequence length in training (longer patterns weren't learned) and partly the model's size.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Some Chinese characters leaked through in one test.&lt;/strong&gt; Qwen is a bilingual model (English/Chinese), and the fine-tuning occasionally destabilized the language routing. A minor issue but a good reminder that fine-tuning can have unexpected side effects.&lt;/p&gt;

&lt;h2&gt;
  
  
  The numbers
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Base model&lt;/td&gt;
&lt;td&gt;Qwen2.5-0.5B-Instruct (494M params)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Trainable parameters&lt;/td&gt;
&lt;td&gt;540,672 (0.11% of total)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LoRA rank&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LoRA alpha&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Target modules&lt;/td&gt;
&lt;td&gt;q_proj, v_proj&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Training examples&lt;/td&gt;
&lt;td&gt;450 (45 conversations x 2 formats x 5 duplications)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Epochs&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Effective batch size&lt;/td&gt;
&lt;td&gt;4 (batch 1, gradient accumulation 4)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Learning rate&lt;/td&gt;
&lt;td&gt;2e-4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max sequence length&lt;/td&gt;
&lt;td&gt;256 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hardware&lt;/td&gt;
&lt;td&gt;CPU (SageMaker)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Training time&lt;/td&gt;
&lt;td&gt;~1 hour 53 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Final training loss&lt;/td&gt;
&lt;td&gt;~0.28&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What I'd do differently
&lt;/h2&gt;

&lt;p&gt;If I were in the business of creating useful pirate models, I might do some things a little differently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use a bigger base model.&lt;/strong&gt; The 0.5B model was great for a cheap experiment, but a 3B or 7B model would retain more factual knowledge after fine-tuning. Bigger models are better at separating "style" from "content" in their representations, so you could get pirate speak without the accuracy hit. The tradeoff is you'd need a GPU.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Write better training data.&lt;/strong&gt; My pirate responses were loose with facts because I was going for flavor over accuracy. That was a mistake. The training data should be factually correct AND in pirate speak. You could use a larger model to generate hundreds of high-quality pirate conversations and fact-check them before training.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Increase the sequence length.&lt;/strong&gt; 256 tokens is short. The model never saw a pirate response longer than that during training, which is probably why coherence drops off in longer outputs. Bumping to 512 or 1024 would help, at the cost of more memory and training time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add a validation set.&lt;/strong&gt; I used all my data for training with no held-out validation. That means I had no way to detect overfitting during training. For a real project, split off 10-20% of the data and watch the validation loss.&lt;/p&gt;
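&lt;p&gt;A minimal sketch of what that split could look like, assuming the training strings live in a list (names hypothetical):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import random

# Hypothetical: 'texts' stands in for the 450 formatted
# training strings from the real script.
texts = [f"example {i}" for i in range(450)]

random.seed(42)                    # reproducible split
random.shuffle(texts)

val_size = int(len(texts) * 0.1)   # hold out 10 percent
val_texts = texts[:val_size]
train_texts = texts[val_size:]

print(len(train_texts), len(val_texts))  # 405 45
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Pass the held-out set as &lt;code&gt;eval_dataset&lt;/code&gt; to your trainer and watch the validation loss alongside the training loss.&lt;/p&gt;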

&lt;p&gt;&lt;strong&gt;Try mixed training.&lt;/strong&gt; To preserve the base model's factual knowledge, mix pirate conversations with standard instruction-following data. Something like 70% pirate, 30% normal. The model learns the pirate style from the pirate data while the normal data acts as an anchor for its existing capabilities.&lt;/p&gt;
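&lt;p&gt;The mixing itself is just list assembly. A sketch with placeholder data (the real version would use actual formatted conversations):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import random

# Hypothetical lists standing in for real formatted examples,
# mixed at roughly 70 percent pirate to 30 percent normal.
pirate_texts = ["pirate example"] * 70
normal_texts = ["normal example"] * 30

mixed = pirate_texts + normal_texts
random.seed(0)
random.shuffle(mixed)      # interleave so batches see both styles

pirate_ratio = mixed.count("pirate example") / len(mixed)
print(pirate_ratio)  # 0.7
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;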

&lt;h2&gt;
  
  
  What to take away from this
&lt;/h2&gt;

&lt;p&gt;The biggest lesson isn't about LoRA configurations or learning rates. It was about training data design. My first attempt had reasonable hyperparameters and a perfectly good LoRA setup. It failed because the training data taught the wrong thing. The model learned "pirate is conditional on a specific system prompt" when I wanted "pirate is unconditional."&lt;/p&gt;

&lt;p&gt;Once I fixed the data, the same basic setup worked. 540,672 parameters. 0.11% of the model. 45 conversations. Two hours on a CPU. That's all it took to permanently change a model's personality.&lt;/p&gt;

&lt;p&gt;If you're getting into fine-tuning, spend more time thinking about your training data than your hyperparameters. The data is the instruction set. Everything else is just knobs.&lt;/p&gt;

&lt;p&gt;Liked this? Connect with me on LinkedIn here: &lt;a href="https://linkedin.com/in/mikegchambers" rel="noopener noreferrer"&gt;https://linkedin.com/in/mikegchambers&lt;/a&gt;. I work for AWS, and I spend my time doing stuff like this! Arrrr!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>llm</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Nine Agent Frameworks, Compared with Data and Code</title>
      <dc:creator>Mike Chambers</dc:creator>
      <pubDate>Wed, 11 Mar 2026 02:08:56 +0000</pubDate>
      <link>https://forem.com/aws/nine-agent-frameworks-compared-with-data-and-code-2imh</link>
      <guid>https://forem.com/aws/nine-agent-frameworks-compared-with-data-and-code-2imh</guid>
      <description>&lt;p&gt;I've been building with agent frameworks for a couple of years now. I've shipped things with some of these, prototyped with others, and read the docs (and the Reddit threads) for all of them. This post is the comparison I wish existed when I started. Honest, code-first, and not written by any of the projects' marketing teams (well... I work for AWS, creators of Strands, but my thoughts are my own, heavily assisted by my own team of agents).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;AI Used: The vast majority of the research for this post was assisted by AI that I have designed. The AI uses Claude, the &lt;a href="https://docs.tavily.com/documentation/mcp" rel="noopener noreferrer"&gt;Tavily MCP server&lt;/a&gt;, and many other MCP data sources.  The following prompt was used:&lt;br&gt;
&lt;em&gt;"init a new blog post. the post must be a contrast and compare of agent frameworks. it must include honest, fact based information, code samples for common things, and should represent a snapshot of the current situation. A sort of "state of the union" for agentic frameworks. The research should go wider than just relying on what is written by the project (or project sponsor) and should dig in and write a post that is of immense value to developers. Include top frameworks as of today. You must include Strands Agents SDK."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This isn't a ranking. It's a snapshot of where things stand in March 2026, with enough detail for you to make your own call.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Field
&lt;/h2&gt;

&lt;p&gt;Nine frameworks made this list. Not because they're the only ones, but because they're the ones developers are actually using, talking about, and building production systems with.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Stars&lt;/th&gt;
&lt;th&gt;Age (months)&lt;/th&gt;
&lt;th&gt;Last 6 months&lt;/th&gt;
&lt;th&gt;License&lt;/th&gt;
&lt;th&gt;Primary Backer&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AutoGen&lt;/td&gt;
&lt;td&gt;55.4K&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;+1,982&lt;/td&gt;
&lt;td&gt;MIT + CC-BY-4.0&lt;/td&gt;
&lt;td&gt;Microsoft&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CrewAI&lt;/td&gt;
&lt;td&gt;45.7K&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;td&gt;+2,540&lt;/td&gt;
&lt;td&gt;MIT&lt;/td&gt;
&lt;td&gt;CrewAI Inc&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Google ADK&lt;/td&gt;
&lt;td&gt;18.3K&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;+1,900&lt;/td&gt;
&lt;td&gt;Apache-2.0&lt;/td&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LangGraph&lt;/td&gt;
&lt;td&gt;26.1K&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;+2,351&lt;/td&gt;
&lt;td&gt;MIT&lt;/td&gt;
&lt;td&gt;LangChain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LlamaIndex&lt;/td&gt;
&lt;td&gt;47.6K&lt;/td&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;td&gt;+1,183&lt;/td&gt;
&lt;td&gt;MIT&lt;/td&gt;
&lt;td&gt;LlamaIndex Inc&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI Agents SDK&lt;/td&gt;
&lt;td&gt;19.7K&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;+1,554&lt;/td&gt;
&lt;td&gt;MIT&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pydantic AI&lt;/td&gt;
&lt;td&gt;15.4K&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;+996&lt;/td&gt;
&lt;td&gt;MIT&lt;/td&gt;
&lt;td&gt;Pydantic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Smolagents&lt;/td&gt;
&lt;td&gt;25.9K&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;+1,018&lt;/td&gt;
&lt;td&gt;Apache-2.0&lt;/td&gt;
&lt;td&gt;Hugging Face&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Strands Agents&lt;/td&gt;
&lt;td&gt;5.3K&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;+618&lt;/td&gt;
&lt;td&gt;Apache-2.0&lt;/td&gt;
&lt;td&gt;AWS&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Methodology: Stars and Age come from GitHub's API. "Age" is measured from the repo's first stargazer event, not its creation date. "Last 6 months" is calculated from monthly star history via the &lt;a href="https://ossinsight.io/" rel="noopener noreferrer"&gt;OSS Insight API&lt;/a&gt;. All data collected March 2026.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Raw stars are a popularity contest, not a quality metric. The &lt;strong&gt;last 6 months&lt;/strong&gt; column tells a more interesting story. CrewAI (+2,540) and LangGraph (+2,351) are still gaining the most stars in absolute terms, but Google ADK (+1,900) is close behind despite being less than a year old. AutoGen has the most total stars but its recent growth (+1,982 over 6 months) is slowing as it transitions into Microsoft's broader Agent Framework. Smolagents had an explosive launch (728 to 19,850 stars in its first nine months) but the last six months show it plateauing at +1,018. Strands has the smallest community (+618) partly because it's the youngest (May 2025) and partly because it grew through AWS adoption rather than viral open-source traction, but it is already running inside AWS products like Kiro, Amazon Q, and AWS Glue.&lt;/p&gt;

&lt;p&gt;Two charts tell the full story. First, cumulative stars over time:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzvos40yancokg8a80wao.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzvos40yancokg8a80wao.png" alt="Cumulative GitHub Stars Over Time" width="800" height="454"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AutoGen and LlamaIndex dominate the top of the chart because they had a two-year head start. CrewAI's trajectory is the steepest sustained climb. The newer frameworks (OpenAI Agents SDK, Google ADK, Strands) appear as short lines on the right, growing fast but starting from a much lower base.&lt;/p&gt;

&lt;p&gt;Now the same data shown as monthly activity: new stars added per month. Look closely and there's a slight overall downward trend. (More thoughts on this later.)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzf3i9dbphnc5l9nsn33g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzf3i9dbphnc5l9nsn33g.png" alt="Monthly Star Activity Over Time" width="800" height="454"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is where the cumulative chart's illusions break down. Every framework has a launch spike, some enormous. AutoGen hit nearly 12K new stars in October 2023. CrewAI burst onto the scene in January 2024 with 5K+. Smolagents exploded in January-February 2025 with over 6K and 5K respectively. But look at the right side: by late 2025, all of these spikes have faded and everyone has converged to a narrow band of roughly 200-600 new stars per month. The launch hype fades. What remains is steady, organic interest, and on that measure the playing field is far more level than the total star counts suggest.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Philosophies
&lt;/h2&gt;

&lt;p&gt;These frameworks aren't interchangeable. They reflect different ideas about how agents should work, and those differences matter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scaffold-heavy frameworks&lt;/strong&gt; (LangGraph, AutoGen) give you explicit control over execution flow. You define graphs, nodes, edges, state machines. You get predictability at the cost of boilerplate.&lt;/p&gt;
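To make "graphs, nodes, edges" concrete, here's a toy state-machine executor in plain Python. This is not LangGraph's or AutoGen's actual API, just a sketch of the shape of the scaffold-heavy style: control flow lives in your code, not in the model.

```python
# A toy graph executor: explicit nodes, edges, and state.
# NOT any framework's real API -- just an illustration of the
# scaffold-heavy style, where you declare the control flow yourself.

from typing import Callable

State = dict

def plan(state: State) -> State:
    state["steps"] = ["roll_dice"]
    return state

def act(state: State) -> State:
    state["result"] = "rolled a 17"
    return state

# Nodes are named functions; edges say which node runs next.
nodes: dict[str, Callable[[State], State]] = {"plan": plan, "act": act}
edges = {"plan": "act", "act": None}  # None marks the terminal node

def run(start: str, state: State) -> State:
    node = start
    while node is not None:
        state = nodes[node](state)
        node = edges[node]
    return state

print(run("plan", {}))  # → {'steps': ['roll_dice'], 'result': 'rolled a 17'}
```

Everything the agent does is spelled out in the edge table. That's the predictability, and also the boilerplate.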

&lt;p&gt;&lt;strong&gt;Model-driven frameworks&lt;/strong&gt; (Strands Agents, Smolagents) take the opposite stance. Give the model tools, give it a goal, and get out of the way. Less scaffolding, more trust in the model's reasoning. This works better than it used to, because the models have gotten much better at tool use and planning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Role-based frameworks&lt;/strong&gt; (CrewAI) model the problem as a team of specialists with defined roles. It's intuitive for business workflows but can feel constraining when you need fine-grained control.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Type-safe frameworks&lt;/strong&gt; (Pydantic AI) focus on structured, validated outputs. Less about orchestration, more about making sure the LLM returns exactly what your code expects.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vendor-optimized frameworks&lt;/strong&gt; (OpenAI Agents SDK, Google ADK) are tuned for a specific model provider's ecosystem but generally work with others.&lt;/p&gt;




&lt;h2&gt;
  
  
  Show Me the Code
&lt;/h2&gt;

&lt;p&gt;The best way to understand the difference is to see the same thing built nine ways. Here's the simplest possible pattern: create an agent with one tool, run it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Strands Agents
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;roll_dice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sides&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Roll a dice with the given number of sides.

    Args:
        sides: Number of sides on the dice
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You rolled a &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sides&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;roll_dice&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Roll a 20-sided dice for me&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three imports, a decorated function, two lines to run. The &lt;code&gt;@tool&lt;/code&gt; decorator pulls the schema from the docstring and type hints. The agent uses Amazon Bedrock by default (Claude Sonnet), but you can swap models and providers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAIModel&lt;/span&gt;
&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;OpenAIModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;roll_dice&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
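The schema extraction behind the &lt;code&gt;@tool&lt;/code&gt; decorator isn't magic. Here's a rough sketch of the general mechanism, using the standard &lt;code&gt;inspect&lt;/code&gt; module. This is not Strands' actual implementation, just an illustration of how a decorator can turn type hints and a docstring into a tool spec a model can read.

```python
# A rough sketch of how @tool-style decorators derive a tool spec
# from a function's type hints and docstring. NOT Strands' real code;
# just the general mechanism via the inspect module.

import inspect

def describe_tool(fn):
    sig = inspect.signature(fn)
    params = {
        name: p.annotation.__name__ if p.annotation is not inspect.Parameter.empty else "any"
        for name, p in sig.parameters.items()
    }
    return {
        "name": fn.__name__,
        # First docstring line becomes the tool description.
        "description": (inspect.getdoc(fn) or "").split("\n")[0],
        "parameters": params,
    }

def roll_dice(sides: int) -> str:
    """Roll a dice with the given number of sides."""
    return f"You rolled a {sides}"

print(describe_tool(roll_dice))
# → {'name': 'roll_dice', 'description': 'Roll a dice with the given number of sides.', 'parameters': {'sides': 'int'}}
```

A real framework would emit full JSON Schema and handle defaults, optionals, and nested types, but the input is the same: your signature and your docstring.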



&lt;h3&gt;
  
  
  LangGraph
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.chat_models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;init_chat_model&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.prebuilt&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_react_agent&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;roll_dice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sides&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Roll a dice with the given number of sides.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You rolled a &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sides&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;init_chat_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic:claude-sonnet-4-20250514&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_react_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;roll_dice&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Roll a 20-sided dice for me&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;LangGraph's prebuilt &lt;code&gt;create_react_agent&lt;/code&gt; keeps it relatively simple for basic cases. The real power (and complexity) shows up when you build custom graphs with nodes and edges, which is where many production users end up.&lt;/p&gt;

&lt;h3&gt;
  
  
  CrewAI
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;crewai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Crew&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;crewai.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;roll_dice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sides&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Roll a dice with the given number of sides.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You rolled a &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sides&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;roller&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Dice Roller&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Roll dice accurately when asked&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;backstory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a tabletop gaming assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;roll_dice&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Roll a 20-sided dice for me&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;expected_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A dice roll result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;roller&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;crew&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Crew&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;roller&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;kickoff&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;More ceremony than the others. You define an agent with a role, backstory, and goal, then wrap the interaction in a Task and a Crew. This makes more sense in multi-agent scenarios where you have a researcher, writer, and editor working together. For a single tool call, it's overhead.&lt;/p&gt;

&lt;h3&gt;
  
  
  OpenAI Agents SDK
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;function_tool&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;

&lt;span class="nd"&gt;@function_tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;roll_dice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sides&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Roll a dice with the given number of sides.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You rolled a &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sides&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Dice Agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You help roll dice when asked.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;roll_dice&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_sync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Roll a 20-sided dice for me&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;final_output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Clean and minimal. The SDK works with OpenAI models natively and supports other providers through the Chat Completions API or LiteLLM. Built-in tracing is a nice touch. You get visibility into agent runs without extra instrumentation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Google ADK
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.runners&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;InMemoryRunner&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.genai.types&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Part&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;roll_dice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sides&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Roll a dice with the given number of sides.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You rolled a &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sides&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;root_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dice_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You help roll dice when asked.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;roll_dice&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;runner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;InMemoryRunner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;root_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;app_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dice_app&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dice_app&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Roll a 20-sided dice for me&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]))&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ADK's standout feature is multi-agent composition. It has &lt;code&gt;SequentialAgent&lt;/code&gt;, &lt;code&gt;ParallelAgent&lt;/code&gt;, and &lt;code&gt;LoopAgent&lt;/code&gt; primitives baked in, plus a browser-based dev UI (&lt;code&gt;adk web&lt;/code&gt;) for testing. The deployment story to Cloud Run and Vertex AI is smooth if you're in the Google ecosystem.&lt;/p&gt;
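Conceptually, those composition primitives are different traversal orders over a list of sub-agents. A plain-Python sketch with stub callables standing in for agents (nothing like ADK's real internals, which manage sessions, events, and model calls):

```python
# Purely illustrative: sequential vs. parallel composition of sub-agents,
# modeled as plain callables. ADK's SequentialAgent/ParallelAgent do far
# more (sessions, events, model calls); this only shows the shape.

from concurrent.futures import ThreadPoolExecutor

def researcher(task: str) -> str:
    return f"notes on {task}"

def writer(task: str) -> str:
    return f"draft about {task}"

def sequential(agents, task):
    # Each agent receives the previous agent's output.
    for agent in agents:
        task = agent(task)
    return task

def parallel(agents, task):
    # All agents receive the same task; results are collected together.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda a: a(task), agents))

print(sequential([researcher, writer], "dice"))  # → draft about notes on dice
print(parallel([researcher, writer], "dice"))    # → ['notes on dice', 'draft about dice']
```

The win of having these as framework primitives is that you declare the topology once and the runtime handles the plumbing between sub-agents.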

&lt;h3&gt;
  
  
  Pydantic AI
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai:gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You help roll dice when asked.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@agent.tool_plain&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;roll_dice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sides&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Roll a dice with the given number of sides.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You rolled a &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sides&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_sync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Roll a 20-sided dice for me&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where Pydantic AI shines is structured output. Define a Pydantic model, and the framework validates the LLM's response against it, retrying with the validation errors fed back to the model if it doesn't match. This eliminates a whole class of production bugs around JSON parsing failures.&lt;/p&gt;
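The retry loop is the key idea. Here's a minimal sketch of the pattern, not Pydantic AI's implementation; the &lt;code&gt;fake_llm&lt;/code&gt; below is a stand-in for a real model call. Parse the output, and if validation fails, feed the error back and ask again.

```python
# A minimal sketch of the validate-and-retry pattern behind structured
# output. fake_llm stands in for a real model call; Pydantic AI's actual
# machinery (Pydantic models, schema-guided retries) is more sophisticated.

import json

def fake_llm(prompt: str, attempt: int) -> str:
    # The "model" replies with garbage first, then corrects itself.
    return "not json at all" if attempt == 0 else '{"roll": 17, "sides": 20}'

def get_structured(prompt: str, required_keys: set, max_retries: int = 3) -> dict:
    error = ""
    for attempt in range(max_retries):
        raw = fake_llm(prompt + error, attempt)
        try:
            data = json.loads(raw)
            missing = required_keys - data.keys()
            if missing:
                raise ValueError(f"missing keys: {missing}")
            return data
        except (json.JSONDecodeError, ValueError) as exc:
            # Feed the validation error back into the next prompt.
            error = f"\nYour last reply was invalid: {exc}. Reply with JSON only."
    raise RuntimeError("model never produced valid output")

print(get_structured("Roll a d20", {"roll", "sides"}))
# → {'roll': 17, 'sides': 20}
```

Your calling code only ever sees data that passed validation, which is exactly the contract that makes agents safe to wire into the rest of a system.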

&lt;h3&gt;
  
  
  Smolagents
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;smolagents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CodeAgent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;InferenceClientModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;roll_dice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sides&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Roll a dice with the given number of sides.

    Args:
        sides: Number of sides on the dice
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You rolled a &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sides&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;InferenceClientModel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CodeAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;roll_dice&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Roll a 20-sided dice for me&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Smolagents has a unique angle: its &lt;code&gt;CodeAgent&lt;/code&gt; writes Python code to orchestrate tool calls instead of using JSON tool-calling. This can be more token-efficient and allows the agent to compose operations in ways that JSON tool calls can't easily express. There's also a &lt;code&gt;ToolCallingAgent&lt;/code&gt; for the traditional approach.&lt;/p&gt;
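&lt;p&gt;To see why code-as-actions can be more token-efficient, compare one round trip against several. In the toy sketch below the model-written snippet is a hypothetical output, and the whitelisted-namespace &lt;code&gt;exec&lt;/code&gt; is a crude stand-in for Smolagents' sandboxed executor:&lt;/p&gt;

```python
import random

def roll_dice(sides: int) -> int:
    """Tool exposed to the agent."""
    return random.randint(1, sides)

# Hypothetical model output: one snippet composes three tool calls
# (roll three times, keep the best). A JSON tool-calling agent would
# need a separate model round trip for each roll.
model_snippet = "result = max(roll_dice(20) for _ in range(3))"

namespace = {"roll_dice": roll_dice}  # only whitelisted names are visible
exec(model_snippet, namespace)
print(namespace["result"])
```

&lt;p&gt;The real &lt;code&gt;CodeAgent&lt;/code&gt; adds import whitelisting and proper sandboxing around this idea; a bare &lt;code&gt;exec&lt;/code&gt; is not safe for untrusted model output.&lt;/p&gt;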

&lt;h3&gt;
  
  
  AutoGen
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;autogen_agentchat.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AssistantAgent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;autogen_agentchat.ui&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Console&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;autogen_ext.models.openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAIChatCompletionClient&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;roll_dice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sides&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Roll a dice with the given number of sides.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You rolled a &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sides&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAIChatCompletionClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AssistantAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dice_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;roll_dice&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nc"&gt;Console&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Roll a 20-sided dice for me&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;AutoGen is async-first and event-driven. The &lt;code&gt;Team&lt;/code&gt; abstraction is where it gets interesting, with agents collaborating through an event system. The framework has been through a major rewrite (0.2 to 0.4), which improved the architecture but left a lot of outdated tutorials floating around online.&lt;/p&gt;

&lt;h3&gt;
  
  
  LlamaIndex
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llama_index.core.agent.workflow&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FunctionAgent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llama_index.core.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FunctionTool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llama_index.llms.openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;roll_dice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sides&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Roll a dice with the given number of sides.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You rolled a &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sides&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FunctionTool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_defaults&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;roll_dice&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FunctionAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You help roll dice when asked.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Roll a 20-sided dice for me&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;LlamaIndex's strength isn't the agent loop itself. It's the 160+ data connectors, RAG pipeline, and evaluation modules. If your agent needs to query documents, databases, or APIs, LlamaIndex probably has a connector for it. The agent is the orchestration layer on top of that retrieval infrastructure.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Actually Matters
&lt;/h2&gt;

&lt;p&gt;Stars and code samples only tell part of the story. These are the things I think actually matter when you're choosing a framework for real work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Model Lock-in
&lt;/h3&gt;

&lt;p&gt;This is the single most important architectural decision. Some frameworks give you freedom; others push you toward a provider.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model-agnostic&lt;/strong&gt; (use any provider without friction): Strands Agents, LangGraph, Smolagents, Pydantic AI, LlamaIndex. Strands supports Bedrock, OpenAI, Anthropic, Gemini, Ollama, Mistral, LiteLLM, SageMaker, and more with dedicated provider classes. LangGraph inherits LangChain's broad integration layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Provider-first but open&lt;/strong&gt;: Google ADK (Gemini-optimized, but connectors for Claude, Ollama, vLLM), OpenAI Agents SDK (works natively with OpenAI, supports others via Chat Completions API or LiteLLM), CrewAI (any LLM, but some community reports of issues with non-OpenAI models).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Provider-locked&lt;/strong&gt;: None of the frameworks are truly locked, but the depth of integration varies enormously. Running Strands on Bedrock or ADK on Vertex will always be a smoother experience than fighting against the grain.&lt;/p&gt;

&lt;h3&gt;
  
  
  MCP Support
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt; has become the standard for connecting agents to external tools and data sources. Here's where each framework stands as of today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;First-class native&lt;/strong&gt;: Strands Agents, CrewAI, Smolagents, Pydantic AI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-in via integrations&lt;/strong&gt;: LangGraph (added in 1.0), OpenAI Agents SDK, Google ADK (via adapters)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-in&lt;/strong&gt;: AutoGen&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Emerging/partial&lt;/strong&gt;: LlamaIndex&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're building agents that need to connect to existing MCP servers (and there are a lot of them now), this matters. Strands in particular treats MCP as a first-class tool source, letting you connect to any MCP server and use its tools alongside native Python tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-Agent Patterns
&lt;/h3&gt;

&lt;p&gt;Not every project needs multiple agents, but when you do, the approaches differ a lot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strands Agents&lt;/strong&gt; offers three distinct patterns: &lt;strong&gt;Graph&lt;/strong&gt; (developer-defined flowchart, LLM decides path at each node), &lt;strong&gt;Swarm&lt;/strong&gt; (autonomous agent handoffs with shared context), and &lt;strong&gt;Workflow&lt;/strong&gt; (deterministic DAG with parallel execution). It also supports the &lt;strong&gt;A2A protocol&lt;/strong&gt; for cross-platform agent communication.&lt;/p&gt;
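&lt;p&gt;The Workflow pattern (a deterministic DAG with parallel branches) is easy to picture framework-free. In this sketch the "agents" are plain functions standing in for LLM calls; none of it uses the Strands API:&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor

# Two independent "agent" nodes followed by one that depends on both.
def research(topic: str) -> str:
    return f"notes on {topic}"

def outline(topic: str) -> str:
    return f"outline for {topic}"

def write(notes: str, outline_text: str) -> str:
    return f"draft combining [{notes}] and [{outline_text}]"

def run_workflow(topic: str) -> str:
    with ThreadPoolExecutor() as pool:
        # research and outline share no edge, so they run in parallel
        notes_f = pool.submit(research, topic)
        outline_f = pool.submit(outline, topic)
        # write depends on both: it runs once its inputs are ready
        return write(notes_f.result(), outline_f.result())

print(run_workflow("em dashes"))
```

&lt;p&gt;What a framework adds on top of this skeleton is retries, per-node state, and observability; the execution order itself is just the DAG.&lt;/p&gt;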

&lt;p&gt;&lt;strong&gt;Google ADK&lt;/strong&gt; has compositional primitives: &lt;code&gt;SequentialAgent&lt;/code&gt;, &lt;code&gt;ParallelAgent&lt;/code&gt;, &lt;code&gt;LoopAgent&lt;/code&gt;, plus &lt;code&gt;LlmAgent&lt;/code&gt; for dynamic routing. Clean and explicit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CrewAI&lt;/strong&gt; models everything as crews of role-playing agents. Intuitive for collaborative workflows, less flexible for graph-like execution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AutoGen&lt;/strong&gt; uses Team abstractions with event-driven collaboration. Strong for conversational multi-agent patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LangGraph&lt;/strong&gt; builds multi-agent systems through graph composition. Maximum control, maximum code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LlamaIndex&lt;/strong&gt; supports agent handoff and microservice-based deployments where agents run independently with message queues.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Deployment Gap
&lt;/h3&gt;

&lt;p&gt;This is where the field really splits. Some frameworks are SDKs that stop at the Python process boundary. Others have a story for getting your agent into production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Full deployment story:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Strands Agents&lt;/strong&gt;: Docker/Fargate, Lambda, and Amazon Bedrock AgentCore for managed runtime with scaling, isolation, and logging. The AgentCore integration is the deepest production deployment story of any framework here.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google ADK&lt;/strong&gt;: Cloud Run and Vertex AI Agent Engine with IAM, VPC controls, and sandboxed code execution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LangGraph&lt;/strong&gt;: Commercial platform with LangGraph Cloud, usage-based pricing (Deployment Runs per invocation + Deployment Uptime).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;SDK + guidance:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pydantic AI&lt;/strong&gt;: Documented serverless patterns for Lambda, Step Functions, DynamoDB. You build the infrastructure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CrewAI&lt;/strong&gt;: Optional Agent Management Platform (AMP) with RBAC and encryption. Commercial offering.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI Agents SDK&lt;/strong&gt;: Production-ready SDK with tracing, but deployment is your responsibility.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;SDK only:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Smolagents, AutoGen, LlamaIndex&lt;/strong&gt;: These are libraries. How you deploy them is up to you. Not a criticism; sometimes a library is what you want.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Observability
&lt;/h3&gt;

&lt;p&gt;You can't debug what you can't see, and agents are notoriously hard to debug.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LangGraph + LangSmith&lt;/strong&gt;: The most mature observability story. Tracing, evaluation, monitoring. SOC 2 Type II compliant. Commercial product.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pydantic AI + Logfire&lt;/strong&gt;: Integrated structured logging and evaluation. Type-safe debugging.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI Agents SDK&lt;/strong&gt;: Built-in automatic tracing of agent runs. No extra setup.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strands Agents&lt;/strong&gt;: OpenTelemetry-based tracing, integrates with AWS X-Ray and CloudWatch. Also works with any OpenTelemetry backend.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Others&lt;/strong&gt;: Varying degrees of logging. You'll likely bring your own observability stack.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Honest Take
&lt;/h2&gt;

&lt;p&gt;Time to stop being diplomatic.&lt;/p&gt;

&lt;h3&gt;
  
  
  LangGraph
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The good&lt;/strong&gt;: It's the Swiss Army knife. Broad integrations, mature tooling, LangSmith is actually useful, and the durable execution model handles long-running agents well. If you're building something complex and stateful, LangGraph is battle-tested.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The honest&lt;/strong&gt;: It's heavy. The abstraction layers have abstraction layers. The documentation is extensive but scattered across LangChain and LangGraph, and it's not always clear which version you're reading about. The pricing for LangGraph Cloud (usage-based on Deployment Runs) can surprise you at scale. Community sentiment is mixed. Reddit threads and HN discussions regularly surface developers who stripped LangChain out and went back to raw Python, citing over-abstraction and rapid API churn. That doesn't mean it's bad, but the vocal developer backlash is hard to ignore.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use it if&lt;/strong&gt;: You need durable execution, complex stateful workflows, and you're willing to invest in learning the ecosystem. You want the most mature commercial platform.&lt;/p&gt;

&lt;h3&gt;
  
  
  CrewAI
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The good&lt;/strong&gt;: Fastest path from idea to multi-agent prototype. The role/backstory/goal pattern clicks immediately. The community is massive and active. MCP and A2A support are first-class.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The honest&lt;/strong&gt;: The abstraction can fight you when you need fine-grained control. Multiple teams report hitting a ceiling at the 6-12 month mark, with some rewriting to LangGraph or custom orchestration. Infinite loops and lack of built-in observability are recurring complaints. Some developers report issues with non-OpenAI models. The Agent Management Platform still lacks SOC 2 certification, which matters if you're in a regulated environment. The "crew" metaphor works great for content pipelines and less great for complex branching logic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use it if&lt;/strong&gt;: You're building collaborative multi-agent workflows, want fast prototyping, and the crew metaphor fits your problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  OpenAI Agents SDK
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The good&lt;/strong&gt;: Dead simple. You can go from zero to working agent in under an hour. Built-in tracing, session persistence (SQLite), and guardrails as first-class primitives. Massive adoption in terms of PyPI downloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The honest&lt;/strong&gt;: Works best with OpenAI models, though it supports other providers via Chat Completions API and LiteLLM. No built-in semantic memory layer. The "handoff" pattern for multi-agent is simple but limited compared to graph-based approaches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use it if&lt;/strong&gt;: You're an OpenAI shop and want the path of least resistance. Great for getting started and for simpler agent architectures.&lt;/p&gt;

&lt;h3&gt;
  
  
  Google ADK
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The good&lt;/strong&gt;: Clean multi-agent composition primitives. The dev UI (&lt;code&gt;adk web&lt;/code&gt;) is excellent for testing. Tight integration with Vertex AI and Cloud Run. Available in Python, TypeScript, Go, and Java, which is more language breadth than any other framework here.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The honest&lt;/strong&gt;: Gemini-first. The connectors for other providers exist but aren't as polished. The documentation is Google-quality (thorough but sometimes overwhelming). Session state management and async complexity felt rough when I tried it. And I'll say what some are thinking: Google's reputation for sunsetting products makes some developers cautious about deep investment. You'll get the best experience on Google Cloud, and it will feel like a second-class citizen elsewhere.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use it if&lt;/strong&gt;: You're in the Google ecosystem, want multi-language support, or the SequentialAgent/ParallelAgent/LoopAgent composition model fits your architecture.&lt;/p&gt;

&lt;h3&gt;
  
  
  AutoGen
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The good&lt;/strong&gt;: Strong academic backing, interesting Team abstraction for conversational multi-agent patterns. Built-in code execution. The event-driven architecture is well-designed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The honest&lt;/strong&gt;: The 0.2-to-0.4 rewrite created a documentation and tutorial minefield. Search results are full of outdated code for a version that no longer works. More importantly, Microsoft is steering new users toward the Microsoft Agent Framework (via Semantic Kernel), and while AutoGen remains maintained, it's unclear how much standalone investment it will receive going forward. Token consumption and circular conversations were persistent complaints from the community. If you're starting fresh today, consider whether you're building on a foundation that's actively moving under your feet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use it if&lt;/strong&gt;: You need conversational multi-agent systems, code execution as a core capability, or you're already in the Microsoft ecosystem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Strands Agents
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The good&lt;/strong&gt;: The model-driven philosophy actually works. Minimal boilerplate, broad provider support (12+ providers with dedicated classes), first-class MCP integration, three distinct multi-agent patterns (Graph, Swarm, Workflow), A2A protocol support, and the deepest AWS deployment story through Bedrock AgentCore. The &lt;code&gt;@tool&lt;/code&gt; decorator is dead simple. The hook system gives you interception points without fighting the framework. It's not just a side project: the model-driven approach behind Strands emerged from building agents for products like Kiro, Amazon Q, and AWS Glue, and the SDK has hit 14 million downloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The honest&lt;/strong&gt;: It's newer than most frameworks on this list, so the community is smaller. The documentation is good but still growing. Some features (bidirectional streaming) are experimental. Lambda deployments don't support streaming, so you need AgentCore Runtime or Fargate for that use case. If you're not on AWS, you miss the deployment story advantage, though the SDK works fine with any provider.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use it if&lt;/strong&gt;: You want a lightweight, model-driven approach. You're on AWS (or plan to be). You want to go from prototype to production without switching frameworks. You trust the model to drive decisions and want minimal scaffolding.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pydantic AI
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The good&lt;/strong&gt;: If structured output matters to you (and it should), nothing else comes close. The type-safe approach catches errors that would otherwise surface in production. The graph-based multi-agent system with durable execution is well-designed. Logfire integration provides solid observability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The honest&lt;/strong&gt;: It's increasingly used as a validation layer alongside other frameworks rather than a standalone solution. Developers combine Pydantic AI with LangChain for vector stores or CrewAI for orchestration. That's actually a strength (low lock-in), but it means Pydantic AI alone won't solve your full agent architecture. The multi-agent graph system is powerful but newer and less battle-tested than LangGraph's. Community is smaller.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use it if&lt;/strong&gt;: Output structure is critical to your application. You're building API-driven agents. Your team already uses Pydantic and wants a familiar mental model.&lt;/p&gt;

&lt;h3&gt;
  
  
  Smolagents
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The good&lt;/strong&gt;: True to its name. Minimal code surface, the CodeAgent approach is interesting (writing Python instead of JSON tool calls), and it's model-agnostic down to its bones. Push-to-Hub lets you share agents as easily as models. The Hugging Face community gives it momentum.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The honest&lt;/strong&gt;: A CVE (CVE-2025-9959) exposed a sandbox escape in versions before 1.21.0. It's fixed, but it's a reminder that code-executing agents need serious sandboxing. Production deployment requires careful configuration of import whitelisting and sandbox environments (Docker, E2B, Modal). The framework is more experimental than production-hardened.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use it if&lt;/strong&gt;: You want a small, code-first agent library. You're doing research or prototyping. You want the CodeAgent pattern for token-efficient tool orchestration. You're comfortable managing your own sandboxing.&lt;/p&gt;

&lt;h3&gt;
  
  
  LlamaIndex
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The good&lt;/strong&gt;: Unmatched retrieval infrastructure. 160+ data connectors, evaluation modules, and the RAG pipeline that most agent frameworks wish they had. The CLI tooling (&lt;code&gt;llamactl&lt;/code&gt;) for deployment is convenient. If your agent's primary job is querying documents and data sources, this is the most complete solution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The honest&lt;/strong&gt;: The agent loop itself is less sophisticated than dedicated agent frameworks. You're really buying the retrieval stack and getting an agent as the orchestration layer on top. For pure agentic workflows (tool calling, multi-step reasoning, multi-agent coordination), other frameworks are stronger.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use it if&lt;/strong&gt;: Your agent is fundamentally a retrieval/RAG system. You need to connect to many data sources. You want evaluation and testing baked in.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Decision Matrix
&lt;/h2&gt;

&lt;p&gt;If you're still reading, here's the cheat sheet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"I need complex, stateful workflows with maximum control"&lt;/strong&gt;: LangGraph.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"I need a multi-agent crew for a content/research pipeline"&lt;/strong&gt;: CrewAI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"I need the simplest possible thing that works"&lt;/strong&gt;: OpenAI Agents SDK.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"I need structured, validated outputs from my agents"&lt;/strong&gt;: Pydantic AI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"I need to query documents and data sources"&lt;/strong&gt;: LlamaIndex.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"I'm in the Google ecosystem"&lt;/strong&gt;: Google ADK.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"I want code-generating agents for research"&lt;/strong&gt;: Smolagents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"I need conversational multi-agent teams"&lt;/strong&gt;: AutoGen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"I want model-driven agents with minimal scaffolding"&lt;/strong&gt;: Strands Agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"I need something in production on AWS next month"&lt;/strong&gt;: Strands Agents + Bedrock AgentCore.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where This Is All Going
&lt;/h2&gt;

&lt;p&gt;A year ago the question was "which framework should I learn?" Now it's "do I even need a framework at all?" That Reddit thread, "Are we still using LangChain in 2026 or have you guys moved to custom orchestration?", captures a real tension. There's a growing "no framework" movement (just look at the "Monthly Star Activity Over Time" chart above), and it's not wrong. Models have gotten good enough at tool use that a raw Python loop with the provider's API can get you surprisingly far. Vendor lock-in anxiety is real enough that major players formed the Agentic AI Foundation (AAIF) to promote interoperability and open standards.&lt;/p&gt;
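&lt;p&gt;For a sense of how far a raw loop gets you: the skeleton below is the whole pattern. The scripted &lt;code&gt;call_model&lt;/code&gt; stub is hypothetical; in practice you'd swap in your provider's chat API and parse its tool-call response:&lt;/p&gt;

```python
import random

def roll_dice(sides: int) -> str:
    return f"You rolled a {random.randint(1, sides)}"

TOOLS = {"roll_dice": roll_dice}

# Scripted stand-in for the provider API: the first turn requests a
# tool call, the second turn returns a final answer.
SCRIPTED = iter([
    {"tool": "roll_dice", "args": {"sides": 20}},
    {"final": "Done rolling."},
])

def call_model(messages: list) -> dict:
    return next(SCRIPTED)

def agent_loop(prompt: str) -> str:
    messages = [{"role": "user", "content": prompt}]
    while True:
        reply = call_model(messages)
        if "final" in reply:
            return reply["final"]
        # Run the requested tool and feed the result back to the model.
        output = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": output})

answer = agent_loop("Roll a 20-sided dice for me")
print(answer)  # Done rolling.
```

&lt;p&gt;Everything the frameworks sell (durable state, retries, observability, multi-agent routing) is scaffolding around this loop.&lt;/p&gt;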

&lt;p&gt;The frameworks that will survive are the ones that provide genuine value beyond what you'd write yourself. Durable execution, managed deployment, observability, multi-agent coordination, structured outputs. These are hard problems worth depending on a library to solve. Everything else is scaffolding around a model that's increasingly capable of scaffolding itself.&lt;/p&gt;

&lt;p&gt;The trend is clear. Less framework, more model. The best frameworks in this list already know that. They're getting thinner, not thicker. They're trusting the model more and engineering the prompt less. Strands calls this "model-driven." Smolagents calls it "barebones." Pydantic AI calls it "the Pydantic way." The label doesn't matter. The direction does.&lt;/p&gt;

&lt;p&gt;Pick the one that fits your cloud, your team, and your problem. Don't pick the one with the most stars.&lt;/p&gt;

&lt;p&gt;Connect with me on &lt;a href="https://linkedin.com/in/mikegchambers" rel="noopener noreferrer"&gt;LinkedIn - linkedin.com/in/mikegchambers&lt;/a&gt;, and tell me what you think.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>frameworks</category>
      <category>python</category>
    </item>
    <item>
      <title>Context Hub Has 68 APIs. Add Yours.</title>
      <dc:creator>Mike Chambers</dc:creator>
      <pubDate>Tue, 10 Mar 2026 04:13:26 +0000</pubDate>
      <link>https://forem.com/aws/context-hub-has-68-apis-add-yours-33ma</link>
      <guid>https://forem.com/aws/context-hub-has-68-apis-add-yours-33ma</guid>
      <description>&lt;p&gt;Andrew Ng's new open-source project, &lt;a href="https://github.com/andrewyng/context-hub" rel="noopener noreferrer"&gt;Context Hub&lt;/a&gt;, attempts to solve a problem every API provider has right now, whether they know it or not: coding agents are getting your API wrong.&lt;/p&gt;

&lt;p&gt;They hallucinate parameters, use deprecated endpoints, and call the v1 API when v3 has been out for a year. Ng's own example: ask Claude Code to call GPT-5.2 and it reaches for the older chat completions API instead of the newer responses API. Your users hit the same wall with your API, and they blame you for it, not the agent.&lt;/p&gt;

&lt;p&gt;Context Hub is a curated, versioned documentation registry that coding agents can query from the CLI. Instead of guessing, the agent runs &lt;code&gt;chub get your-api/docs --lang python&lt;/code&gt; and gets correct, current documentation written specifically for machines to consume. The repo had been quietly building, but Andrew's LinkedIn announcement on March 5th caught my attention and lit the fuse: the project went to over 1,500 stars in five days.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6e9gw8gd706ovg35eh1e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6e9gw8gd706ovg35eh1e.png" alt=" " width="800" height="586"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Star history from &lt;a href="https://www.star-history.com/" rel="noopener noreferrer"&gt;star-history.com&lt;/a&gt;, captured March 10, 2026.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;As of writing, the registry already has 68 API providers, including Stripe, OpenAI, Anthropic, Supabase, Firebase, Twilio, Shopify, and AWS too (where I work). PRs are rolling in daily with new submissions. If you maintain a library and it's not in Context Hub yet, someone else is going to write the docs for you. You probably want to be the one who does it!&lt;/p&gt;
&lt;h2&gt;
  
  
  How Context Hub Works
&lt;/h2&gt;

&lt;p&gt;An agent (Claude Code, Codex, Cursor, Kiro, or anything with shell access) needs to call your API. Instead of relying on its training data, it runs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;chub search &lt;span class="s2"&gt;"your-api"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That returns matching entries from the registry. Then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;chub get your-company/your-api &lt;span class="nt"&gt;--lang&lt;/span&gt; python
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That fetches your curated documentation, written for agent consumption rather than humans. Direct and example-heavy with no marketing fluff. The agent reads it and writes correct code.&lt;/p&gt;

&lt;p&gt;There's also an annotation system. If an agent discovers a workaround or a gotcha while using your API, it can save a note:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;chub annotate your-company/your-api &lt;span class="s2"&gt;"Use v2 endpoint for batch operations, v1 has a 100-item limit that isn't documented"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That note persists locally across sessions. Your documentation gets smarter every time an agent uses it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Adding Your API: Step by Step
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Fork and Clone
&lt;/h3&gt;

&lt;p&gt;Fork &lt;a href="https://github.com/andrewyng/context-hub" rel="noopener noreferrer"&gt;andrewyng/context-hub&lt;/a&gt; on GitHub and clone your fork locally.&lt;/p&gt;

&lt;p&gt;Then...&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Create Your Folder Structure
&lt;/h3&gt;

&lt;p&gt;All content lives under &lt;code&gt;content/&lt;/code&gt;. The pattern is &lt;code&gt;author/docs/entry-name/&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;content/
  your-company/
    docs/
      your-api/
        python/
          DOC.md
        javascript/
          DOC.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your API only has one language, skip the language subfolder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;content/
  your-company/
    docs/
      your-api/
        DOC.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your docs are going to be long (and they probably are), plan for reference files from the start:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;content/
  your-company/
    docs/
      your-api/
        python/
          DOC.md
          references/
            auth.md
            errors.md
            advanced.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Reference files are plain markdown, no frontmatter needed. Agents fetch them with &lt;code&gt;chub get your-company/your-api --file references/auth.md&lt;/code&gt; or &lt;code&gt;chub get your-company/your-api --full&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Write Your DOC.md
&lt;/h3&gt;

&lt;p&gt;Every documentation file needs YAML frontmatter followed by the actual content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;your-api&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;One-line&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;that&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;shows&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;up&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;results"&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;languages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python"&lt;/span&gt;
  &lt;span class="na"&gt;versions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2.1.0"&lt;/span&gt;
  &lt;span class="na"&gt;revision&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;updated-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026-03-10"&lt;/span&gt;
  &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;official&lt;/span&gt;
  &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;relevant,comma,separated,tags"&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;source&lt;/code&gt; field matters. Use &lt;code&gt;official&lt;/code&gt; if you're the API provider, &lt;code&gt;maintainer&lt;/code&gt; if you're a core contributor, or &lt;code&gt;community&lt;/code&gt; if you're a user who wrote the docs.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;versions&lt;/code&gt; field tracks the package version on npm or PyPI, not your internal API version number.&lt;/p&gt;

&lt;p&gt;Now for the content. Remember you're not writing docs for humans. You're writing docs for an agent that needs to produce correct code on the first try. Structure it like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Golden Rule&lt;/strong&gt; - State the correct package name, install command, and import pattern. Warn against common mistakes right up front (wrong package names, deprecated imports).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Installation&lt;/strong&gt; - The install command. That's it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Initialization&lt;/strong&gt; - How to create a client instance, auth setup, environment variables over hardcoded keys.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Core Operations&lt;/strong&gt; - The 3-5 most common operations with complete, runnable code. Every example should include the function call with realistic parameters and the response shape.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key Patterns&lt;/strong&gt; - Pagination, streaming, retries, webhooks. Only what's relevant to your API.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Common Mistakes&lt;/strong&gt; - The 3-5 things agents frequently get wrong. This is gold. If you've ever seen an agent produce wrong code for your API, put the fix here.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Models / Resources / Endpoints&lt;/strong&gt; - List current model names, resource types, or endpoint paths explicitly.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Keep the main DOC.md under 500 lines. If you're going past 400, start moving advanced content into reference files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A concrete example of what good looks like:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Acme Payments Python SDK&lt;/span&gt;

&lt;span class="gu"&gt;## Golden Rule&lt;/span&gt;
Always use the official &lt;span class="sb"&gt;`acme-payments`&lt;/span&gt; package from PyPI.

&lt;span class="gs"&gt;**Install:**&lt;/span&gt; &lt;span class="sb"&gt;`pip install acme-payments`&lt;/span&gt;
&lt;span class="gs"&gt;**Import:**&lt;/span&gt; &lt;span class="sb"&gt;`from acme import PaymentsClient`&lt;/span&gt;

&lt;span class="gu"&gt;## Initialization&lt;/span&gt;

import os
from acme import PaymentsClient

client = PaymentsClient(api_key=os.environ["ACME_API_KEY"])

&lt;span class="gu"&gt;## Create a Charge&lt;/span&gt;

charge = client.charges.create(
    amount=2000,        # in cents
    currency="usd",
    source="tok_visa",
    description="Order #1234",
)
print(charge.id)  # "ch_abc123"

&lt;span class="gu"&gt;## Common Mistakes&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Passing amount in dollars instead of cents
&lt;span class="p"&gt;2.&lt;/span&gt; Not handling idempotency keys for retries
&lt;span class="p"&gt;3.&lt;/span&gt; Using test keys in production
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Compare that to what you see too often in regular documentation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Welcome to the Acme Payments Developer Hub!&lt;/span&gt;

Acme Payments provides a powerful, scalable platform for payment processing.
Let's explore what you can build...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Agents don't need introductions or marketing. Lead with code, cover the common cases first, and put edge cases in reference files.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Handle Multiple Versions (If Needed)
&lt;/h3&gt;

&lt;p&gt;If you have breaking changes between major versions, use version subfolders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;content/
  your-company/
    docs/
      your-api/
        v1/
          DOC.md
        v2/
          DOC.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both files share the same &lt;code&gt;name&lt;/code&gt; in the frontmatter. The build system combines them into one registry entry, with the highest version as the default. You can nest language folders inside version folders too.&lt;/p&gt;
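&lt;p&gt;The "highest version wins" behavior is easy to picture. Here's a small sketch of that selection logic (purely illustrative, not Context Hub's actual build code), assuming version folders named &lt;code&gt;v1&lt;/code&gt;, &lt;code&gt;v2&lt;/code&gt;, and so on:&lt;/p&gt;

```python
# Illustrative sketch of "highest version as the default".
# NOT Context Hub's actual build code.

def default_version(folders):
    # Strip the leading "v" and compare numerically, so "v10" beats "v2".
    return max(folders, key=lambda name: int(name.lstrip("v")))

print(default_version(["v1", "v2"]))   # v2
print(default_version(["v2", "v10"]))  # v10
```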

&lt;p&gt;For minor version differences, just document the latest and note any gotchas inline.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Build, Validate, and Test
&lt;/h3&gt;

&lt;p&gt;Install the CLI if you haven't:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @aisuite/chub
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Validate your frontmatter and structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;chub build content/your-company/ &lt;span class="nt"&gt;--validate-only&lt;/span&gt; &lt;span class="nt"&gt;--json&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The validator checks that &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;metadata.languages&lt;/code&gt;, and &lt;code&gt;metadata.versions&lt;/code&gt; all exist in your frontmatter (errors if missing), and warns about missing &lt;code&gt;description&lt;/code&gt; or &lt;code&gt;source&lt;/code&gt; fields. If you get a non-zero exit code, fix the DOC.md and re-run. Once it passes, check that the counts match what you created (if you wrote 2 DOC.md files, the output should show &lt;code&gt;"docs": 2&lt;/code&gt;).&lt;/p&gt;
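&lt;p&gt;The rules the validator applies amount to a simple frontmatter check. Here's a sketch of that kind of check (illustrative only, not the actual &lt;code&gt;chub&lt;/code&gt; implementation):&lt;/p&gt;

```python
# Illustrative frontmatter check, mirroring the rules described above.
# NOT the actual chub validator.

REQUIRED = ["name", "languages", "versions"]   # missing = error
RECOMMENDED = ["description", "source"]        # missing = warning

def check_frontmatter(fm):
    errors, warnings = [], []
    meta = fm.get("metadata", {})
    for field in REQUIRED:
        if field not in fm and field not in meta:
            errors.append(f"missing required field: {field}")
    for field in RECOMMENDED:
        if field not in fm and field not in meta:
            warnings.append(f"missing recommended field: {field}")
    return errors, warnings

errors, warnings = check_frontmatter({
    "name": "your-api",
    "metadata": {"languages": "python", "versions": "2.1.0"},
})
print(errors)    # [] -- passes
print(warnings)  # warns about description and source
```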

&lt;p&gt;Then do a full build and test that your doc is actually fetchable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;chub build content/your-company/ &lt;span class="nt"&gt;-o&lt;/span&gt; /tmp/chub-test/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  6. Submit Your PR
&lt;/h3&gt;

&lt;p&gt;Push your branch and open a PR against &lt;code&gt;andrewyng/context-hub&lt;/code&gt;. Look at the existing PRs for examples of what good submissions look like.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hey! This Reads Like a Skill. Because It Is...
&lt;/h2&gt;

&lt;p&gt;If you've been following along and thinking "this whole process could be a skill file that I hand to my agent," you're right. The step-by-step above is structured the same way: gather inputs, create a folder structure, write content following a template, validate, test.&lt;/p&gt;

&lt;p&gt;So I made one. &lt;a href="https://gist.github.com/mikegc-aws/e42dbc17e15575213be846a4e7a3a495" rel="noopener noreferrer"&gt;Here's a &lt;code&gt;create-api-docs&lt;/code&gt; skill&lt;/a&gt; you can drop into your project. Install it and your agent can scaffold a complete Context Hub submission for any API. You give it the API name, the package version, and your source material, and it produces a DOC.md that follows all the conventions above.&lt;/p&gt;

&lt;p&gt;But, and I can't stress this enough, &lt;strong&gt;don't just run the skill and submit the PR.&lt;/strong&gt; The entire point of Context Hub is accuracy, because agents get APIs wrong. If you let an agent generate your docs unchecked, you're feeding the same problem back into the system. The skill gives you a solid first draft and the right structure. You still need to read every line, verify every code example, and check every parameter against your actual API. Your users are trusting these docs to produce correct code. That's a responsibility, not something to automate away entirely.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Bother?
&lt;/h2&gt;

&lt;p&gt;Every time a coding agent hallucinates your API, that's a developer who just had a bad experience with your product. They'll spend 20 minutes debugging code that was wrong from the start, and they'll associate that friction with your API, not with the agent.&lt;/p&gt;

&lt;p&gt;Context Hub gives you a way to fix that at the source. Write the docs once, in a format agents can consume, and every agent that uses the registry gets it right.&lt;/p&gt;

&lt;p&gt;The community only found this project five days ago, and it's already at 1,500+ stars, 164 forks, and getting commits daily. The window where your submission is one of the first 100 is closing. If your API or framework isn't in Context Hub yet, go add it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @aisuite/chub
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start there.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>productivity</category>
    </item>
    <item>
      <title>When the Model Is the Machine</title>
      <dc:creator>Mike Chambers</dc:creator>
      <pubDate>Tue, 03 Mar 2026 08:49:34 +0000</pubDate>
      <link>https://forem.com/aws/when-the-model-is-the-machine-25g4</link>
      <guid>https://forem.com/aws/when-the-model-is-the-machine-25g4</guid>
      <description>&lt;p&gt;I want to show you something. A translation app. Clean interface, language selector, input field, a translate button. You type a phrase, pick a target language, hit translate, and the result appears. It works. It feels like an app.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh1ogjkm0m13zctg83vet.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh1ogjkm0m13zctg83vet.jpg" alt="Generated language tool" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So... thirty seconds before I took this screenshot, the app didn't exist. In fact... there was no codebase. No repository. No designer mocked it up, no developer wrote it, no CI pipeline deployed it. Yes, you guessed right... an AI agent generated it — the layout, the styling, the interaction logic — at runtime, in response to a single prompt typed into a URL bar.&lt;/p&gt;

&lt;p&gt;The agent that built this has exactly two tools. One generates an HTML page. The other tells the browser to swap out pieces of the DOM. That's it. There is no framework underneath, no component library, no state management system. The model holds the state. The model decides what to render. The model &lt;em&gt;is&lt;/em&gt; the application... or maybe it &lt;em&gt;is&lt;/em&gt; the execution environment?&lt;/p&gt;

&lt;p&gt;Yes, this ultimate &lt;strong&gt;everything&lt;/strong&gt; app is a party trick, and I want to be upfront about that. It's slow. It's a concept. But I think it points at something real, and I've spent the last few weeks trying to articulate what that something is.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Want to play? The code is here: &lt;a href="https://github.com/mikegc-aws/www4" rel="noopener noreferrer"&gt;github.com/mikegc-aws/www4&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  A Trillion Dollars of Doubt
&lt;/h2&gt;

&lt;p&gt;Some numbers first.&lt;/p&gt;

&lt;p&gt;In early 2026, the broader software sector fell roughly 29 percent. A &lt;a href="https://www.reuters.com/business/media-telecom/global-software-stocks-hit-by-anthropic-wake-up-call-ai-disruption-2026-02-04/" rel="noopener noreferrer"&gt;Reuters analysis&lt;/a&gt; put the total market value lost at nearly a trillion dollars. Salesforce dropped about 27 percent year-to-date; Workday fell roughly 40 percent after issuing a sluggish revenue outlook (&lt;a href="https://www.reuters.com/business/workday-tumbles-dour-revenue-outlook-amid-ai-threat-2026-02-25/" rel="noopener noreferrer"&gt;Reuters&lt;/a&gt;). Forbes ran the headline "&lt;a href="https://www.forbes.com/sites/donmuir/2026/02/04/300-billion-evaporated-the-saaspocalypse-has-begun/" rel="noopener noreferrer"&gt;The SaaS-Pocalypse Has Begun&lt;/a&gt;." The Guardian asked whether the market was headed toward a "&lt;a href="https://www.theguardian.com/australia-news/2026/feb/21/what-would-share-stock-market-saaspocalypse-mean-saas-apocalypse-meaning" rel="noopener noreferrer"&gt;SaaS-pocalypse&lt;/a&gt;." Private equity firms started circling, &lt;a href="https://m.economictimes.com/tech/technology/pe-firms-up-saas-buyouts-as-ai-resets-valuations-biz-models/articleshow/128922439.cms" rel="noopener noreferrer"&gt;buying up SaaS companies&lt;/a&gt; at reset valuations.&lt;/p&gt;

&lt;p&gt;The dominant narrative was that AI is about to eat traditional SaaS. Anthropic launched tools aimed at legal workflows, and &lt;a href="https://www.thetimes.com/business/companies-markets/article/relx-loses-over-6bn-in-value-as-anthropic-launches-legal-work-ai-t9w2k0j58" rel="noopener noreferrer"&gt;RELX lost over six billion pounds in value&lt;/a&gt; in a single session. Business Insider declared that "&lt;a href="https://www.businessinsider.com/software-ate-world-now-ai-eating-software-saas-anthropic-2026-2" rel="noopener noreferrer"&gt;Software ate the world. Now AI is eating software.&lt;/a&gt;" The fear was specific: if AI agents can orchestrate workflows directly, seat-based subscriptions lose their logic. Why pay per-user for five tools when one agent delivers the outcome?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: Markets overshoot on fear the same way they overshoot on hype, and there were plenty of non-AI factors in the mix — enterprise budgets tightening, years of inflated valuations correcting, subscription growth already slowing before any agent entered the picture. Salesforce's CEO &lt;a href="https://www.ft.com/content/b74b8227-d7cb-4976-ba95-a3a27b79cbdd" rel="noopener noreferrer"&gt;publicly dismissed&lt;/a&gt; the "SaaSpocalypse" framing. Bank of America called the selloff "&lt;a href="https://fortune.com/2026/02/04/why-saas-stocks-tech-selloff-freefall-like-deepseek-2025-overblown-paradox-irrational/" rel="noopener noreferrer"&gt;overblown&lt;/a&gt;." A &lt;a href="https://www.bain.com/insights/why-saas-stocks-have-dropped-and-what-it-signals-for-softwares-next-chapter/" rel="noopener noreferrer"&gt;Bain analysis&lt;/a&gt; argued that the drop signals a transition, not an extinction.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I don't think the selloff proves that AI will displace SaaS. But I think it surfaces a question worth taking seriously: &lt;em&gt;what happens to the value of prebuilt software when the efficacy of generated software keeps rising?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That question has two dimensions, and they're often conflated.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where SaaS Goes From Here
&lt;/h3&gt;

&lt;p&gt;The first dimension is &lt;strong&gt;AI-assisted development&lt;/strong&gt;. Developers using coding assistants ship faster — &lt;a href="https://www.itransition.com/software-development/statistics" rel="noopener noreferrer"&gt;85 percent now use some form of AI tooling&lt;/a&gt;. This makes software cheaper to produce, which puts pressure on vendors who charge premium prices for what is, at its core, code running on servers. If the cost of building software falls, the price customers will pay for it falls too. This is deflationary, but it's evolutionary. It's the same dynamic that played out when cloud infrastructure commoditized hosting, or maybe when open-source commoditized databases. SaaS companies adapted before. Many will adapt again.&lt;/p&gt;

&lt;p&gt;The second dimension is different in kind. &lt;strong&gt;AI agents don't just help build software — they can replace the need for it.&lt;/strong&gt; If a sales team uses an agent that can pull data from a CRM, draft an email, update a pipeline, and generate a forecast, the question stops being "which SaaS tool should we buy?" and becomes "do we need the tool at all?" The value shifts from the software to the outcome. From the seat to the result. Pricing models are already following — &lt;a href="https://www.pymnts.com/news/artificial-intelligence/2026/ai-moves-saas-subscriptions-consumption/" rel="noopener noreferrer"&gt;PYMNTS reports&lt;/a&gt; a shift from subscription to usage-based pricing, and IT Pro has coined "&lt;a href="https://www.itpro.com/technology/artificial-intelligence/what-is-outcome-as-agentic-solution-oaas" rel="noopener noreferrer"&gt;Outcome as Agentic Solution&lt;/a&gt;" as a category.&lt;/p&gt;

&lt;p&gt;This second dimension is the one the market is pricing in. Not because it's happening everywhere today, but because the trajectory raises the question. And my little party trick demo — an application that didn't exist until someone asked for it — is a small, imperfect illustration of what that trajectory might look like at the limit.&lt;/p&gt;

&lt;p&gt;Project this forward five years. Bain's read — transition, not extinction — is probably closer to right. But I'd expect the restructuring to be significant. The companies that thrive will be those that embed AI deeply enough to become platforms for agent-driven workflows rather than destinations for human users clicking through interfaces. The ones that don't will face the same pressure the on-premise vendors felt a decade ago: not a sudden death, but a slow loss of relevance as the world builds around them.&lt;/p&gt;




&lt;h2&gt;
  
  
  Inside the Machine
&lt;/h2&gt;

&lt;p&gt;Let's go back to my party trick translation app. I want to walk through what actually happens when you load that page, because the architecture is much simpler than you might expect, and that simplicity reveals something about where software might be going.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fscagqb9kdnbmmbbm9a21.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fscagqb9kdnbmmbbm9a21.png" alt="www4 Architecture Diagram" width="800" height="280"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's a single Python file — about 550 lines — running a standard library HTTP server. When you visit the URL with a prompt (say, &lt;code&gt;/?prompt=language+translation+app&lt;/code&gt;), the server does three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It creates a unique session and assigns it an AI agent — specifically an agent built with &lt;a href="https://strandsagents.com/latest/" rel="noopener noreferrer"&gt;Strands Agents SDK&lt;/a&gt; and Claude Opus 4.6 running via Amazon Bedrock. (I work for AWS, and that's the way I roll.)&lt;/li&gt;
&lt;li&gt;It serves a lightweight shell page: an empty &lt;code&gt;&amp;lt;div&amp;gt;&lt;/code&gt;, a loading spinner, and a block of vanilla JavaScript that knows how to receive and render content.&lt;/li&gt;
&lt;li&gt;The shell page immediately fires a POST request back to the server with the session ID and the prompt.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The agent then does what it does. It reads the prompt — "Generate a page for: language translation app" — and calls its first tool: &lt;code&gt;render_page&lt;/code&gt;. This tool takes three arguments: a title, HTML body content, and CSS. The agent generates all three. The server returns them as JSON. The shell page injects the HTML into the DOM and the CSS into a style tag. The spinner fades out. The app appears.&lt;/p&gt;

&lt;p&gt;Here is the &lt;code&gt;render_page&lt;/code&gt; tool in its entirety:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;render_page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;css&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ToolContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;invocation_state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;render&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;html&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;css&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;css&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Page rendered successfully.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Nothing else. There is no template engine, no component tree, no virtual DOM. The model produces the markup directly.&lt;/p&gt;

&lt;p&gt;Now you interact. You type a phrase, select a language, click translate. The shell page captures that event through plain event delegation — it listens for form submissions, button clicks, link clicks, checkbox toggles, select changes, and enter keypresses. It formats the event into a structured message (e.g., &lt;code&gt;FORM SUBMIT [translate-form]: {"text": "hello", "language": "Spanish"}&lt;/code&gt;) and sends it back to the server.&lt;/p&gt;
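&lt;p&gt;That message format is simple enough to sketch. In www4 this formatting lives in the shell page's JavaScript; here it is in Python for illustration:&lt;/p&gt;

```python
import json

# Illustrative sketch of the event-to-message formatting described above.
# In www4 this happens in the shell page's JavaScript, not in Python.

def format_event(kind, element_id, payload):
    # e.g. 'FORM SUBMIT [translate-form]: {"text": "hello", ...}'
    return f"{kind} [{element_id}]: {json.dumps(payload)}"

msg = format_event("FORM SUBMIT", "translate-form",
                   {"text": "hello", "language": "Spanish"})
print(msg)  # FORM SUBMIT [translate-form]: {"text": "hello", "language": "Spanish"}
```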

&lt;p&gt;The agent receives this message as the next turn in its conversation. It has full context — it knows what page it generated, what elements exist, what IDs they have. It decides how to respond. For a translation result, it probably calls the second tool: &lt;code&gt;update_elements&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;update_elements&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;updates&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt; &lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ToolContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;invocation_state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;updates&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;updates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Elements updated successfully.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tool takes a list of &lt;code&gt;{id, html}&lt;/code&gt; pairs and tells the frontend to replace the innerHTML of each element by ID. The result area updates. The rest of the page stays put. It's a targeted DOM patch, decided entirely by the model.&lt;/p&gt;
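&lt;p&gt;The patch step itself is just a keyed replace. A toy Python stand-in for the browser-side innerHTML swap (the real work happens in the shell page's JavaScript):&lt;/p&gt;

```python
def apply_updates(dom: dict, updates: list) -> dict:
    # dom maps element IDs to their current innerHTML; each update
    # replaces exactly one entry and leaves the rest of the page alone.
    for update in updates:
        dom[update["id"]] = update["html"]
    return dom
```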

&lt;p&gt;So... &lt;strong&gt;there is no application state anywhere except inside the model's context window.&lt;/strong&gt; The agent doesn't write to a database. It doesn't set session variables. It doesn't maintain a state object. The conversation &lt;em&gt;is&lt;/em&gt; the state. Every interaction adds a turn, and the model reasons over the full history to decide what to render next.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Would Make It Better
&lt;/h3&gt;

&lt;p&gt;Its limitations with today's models are obvious. Every interaction requires a full round trip to the model, which means latency measured in seconds rather than milliseconds. The agent regenerates its understanding of the page from conversation context on every turn, which is wasteful. There's no persistence — close the tab and the session is gone.&lt;/p&gt;

&lt;p&gt;Some of these issues are solvable within the same paradigm. The point being that this isn't even as good as it could be:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Server-side DOM cache.&lt;/strong&gt; The most obvious improvement. Instead of relying entirely on the model's context to remember the current page state, the server could maintain a representation of the DOM. The agent could diff against it, and the server could validate that element IDs in &lt;code&gt;update_elements&lt;/code&gt; calls actually exist. This would reduce errors and allow the model to work with a smaller context.&lt;/p&gt;
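&lt;p&gt;That validation step could be as small as this (a hypothetical helper, not part of www4):&lt;/p&gt;

```python
def validate_updates(updates: list, known_ids: set) -> bool:
    # Reject any update_elements call that targets an element ID
    # absent from the server's cached view of the DOM.
    missing = [u["id"] for u in updates if u["id"] not in known_ids]
    if missing:
        raise ValueError(f"Unknown element IDs: {missing}")
    return True
```

&lt;p&gt;The failure message could even be fed straight back to the agent as a corrective turn.&lt;/p&gt;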

&lt;p&gt;&lt;strong&gt;Streaming generation.&lt;/strong&gt; Right now the agent generates the full HTML payload, then the server sends it. A streaming approach — where HTML is sent to the browser as the model produces it — would dramatically improve perceived performance. The page could progressively render, much like a server-side rendered page loading over a slow connection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hybrid rendering.&lt;/strong&gt; Some interactions don't need the model at all. A dropdown menu opening, a tab switching between already-generated content, a tooltip appearing — these could be handled by lightweight generated client-side JavaScript, with the agent only invoked for decisions that require reasoning. The system prompt could instruct the agent to include specific interaction patterns that the shell page handles natively.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt caching and prefills.&lt;/strong&gt; For common application patterns — a form, a data table, a navigation layout — the agent could work from cached partial outputs rather than generating from scratch every time. The model is doing redundant work if it invents a fresh CSS reset for every page.&lt;/p&gt;

&lt;p&gt;None of these improvements change the fundamental architecture. The model remains the runtime. The application remains ephemeral. But the experience moves from "party trick" to something you could genuinely use.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real World Is Already Moving
&lt;/h2&gt;

&lt;p&gt;My demo is a solo experiment, but the ideas behind it are showing up in serious projects. Two in particular are worth watching.&lt;/p&gt;

&lt;h3&gt;
  
  
  AG-UI: A Protocol for Agent-Driven Interfaces
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://github.com/ag-ui-protocol/ag-ui" rel="noopener noreferrer"&gt;AG-UI protocol&lt;/a&gt; emerged from CopilotKit's work and has since attracted first-party support from Microsoft, Google, AWS, and others. It's an open, event-based protocol that standardizes how AI agents connect to frontend applications.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fidhm26gfj5vowqgmue5i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fidhm26gfj5vowqgmue5i.png" alt="AG-UI" width="800" height="340"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Image source: &lt;a href="https://github.com/ag-ui-protocol/ag-ui" rel="noopener noreferrer"&gt;AG-UI Protocol&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The core insight is that agents need a structured way to communicate with UIs that goes beyond dumping text into a chat window. AG-UI defines roughly 16 event types that an agent backend can emit — events that represent things like "update this piece of state," "render this component," or "request human input before proceeding." The frontend listens for these events and renders accordingly.&lt;/p&gt;

&lt;p&gt;This is the same pattern as my demo, but formalized and generalized. Where www4 has two hand-rolled tools and a bespoke shell page, AG-UI provides a protocol layer that lets any agent framework talk to any frontend. It supports SSE, WebSockets, and webhooks for transport. It includes middleware for loose format matching, so that agents built in different frameworks can connect without perfect specification compliance.&lt;/p&gt;

&lt;p&gt;AG-UI sits in a deliberate position in an emerging stack. MCP (Model Context Protocol) gives agents access to tools. A2A (Agent-to-Agent) lets agents communicate with each other. AG-UI brings agents into the user interface. And now MCP itself is moving toward UI as a first-class concept — the &lt;a href="https://blog.modelcontextprotocol.io/posts/2025-11-21-mcp-apps/" rel="noopener noreferrer"&gt;MCP Apps specification&lt;/a&gt; defines a standard for tools to declare UI resources alongside their capabilities, rendered in sandboxed iframes with structured communication back to the host. The specification's own language is telling: it describes an "agentic app runtime." Together, these protocols describe a world where applications are assembled from agent capabilities rather than compiled from source code.&lt;/p&gt;

&lt;h3&gt;
  
  
  MCP-Use: From Protocol to Interface
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://github.com/mcp-use/mcp-use-ts" rel="noopener noreferrer"&gt;mcp-use-ts&lt;/a&gt; project takes a different but complementary approach — and is emerging as one of the first concrete implementations of the &lt;a href="https://blog.modelcontextprotocol.io/posts/2025-11-21-mcp-apps/" rel="noopener noreferrer"&gt;MCP Apps extension&lt;/a&gt;. Where AG-UI standardizes the protocol between agents and UIs, mcp-use-ts focuses on making it trivially easy to build interfaces on top of MCP servers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhoslzhrp01hkkmqlr2m3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhoslzhrp01hkkmqlr2m3.png" alt="mcp-use-ts" width="800" height="319"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Image source: &lt;a href="https://github.com/mcp-use/mcp-use-ts/tree/main" rel="noopener noreferrer"&gt;mcp-use-ts&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Its most interesting feature is automatic UI generation from tool definitions. You define an MCP tool with a Zod schema — say, a tool that queries a database with parameters for table name, date range, and output format — and the framework generates an interactive form for that tool automatically. No frontend code needed.&lt;/p&gt;

&lt;p&gt;But it goes further than auto-generated forms. Developers can build custom React widgets that are served alongside MCP tools as resources. This means an MCP server can expose not just capabilities but also the interfaces for using those capabilities. A database MCP server could come with a query builder UI. A monitoring MCP server could include a dashboard.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;create-mcp-use-app&lt;/code&gt; scaffolding tool lets you go from zero to a working application with an MCP backend and an auto-generated frontend in under a minute. The development server includes hot reload for both tools and widgets, and a built-in inspector that functions as both a debugging tool and a prototype UI.&lt;/p&gt;

&lt;p&gt;This is relevant because it represents a concrete, usable step toward the architecture my demo illustrates. Instead of hand-writing a frontend and a backend and the glue between them, you define capabilities and let the tooling generate the interface. The "application" is a thin layer over agent capabilities, generated rather than authored.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Frontier
&lt;/h2&gt;

&lt;p&gt;Is this post long enough for a conclusion? Let me pull these threads together.&lt;/p&gt;

&lt;p&gt;My demo shows an agent that generates a complete, interactive application at runtime from a single prompt. It's slow and it's limited, but it works. The model acts as the runtime, the state engine, and the decision-maker. There is no application until someone asks for one, and then there is exactly the application they asked for.&lt;/p&gt;

&lt;p&gt;The SaaS selloff shows a market waking up to the possibility that static, prebuilt software is losing its premium. When agents can orchestrate outcomes directly, the value of the intermediary tools drops. The companies that survive will be those that become platforms for agent-driven work, not destinations for human-driven workflows.&lt;/p&gt;

&lt;p&gt;AG-UI shows the infrastructure layer forming — a standardized way for agents to control user interfaces, backed by the biggest names in the industry. MCP-Use shows the developer experience becoming real — tools that let you go from capability definition to working UI without writing frontend code.&lt;/p&gt;

&lt;p&gt;These are all points on the same line. And the line points toward a future where software is generated, not built. Where applications are ephemeral, not persistent. Where the currency is the idea — "I need a translation app," "show me a sales dashboard," "build me a tool that tracks my inventory" — and the execution of that idea is handled in real time by models that understand what you want and can produce it on demand.&lt;/p&gt;

&lt;p&gt;This isn't a prediction about next quarter. The models are still too slow for production use in this mode. Context windows, while generous, still impose limits on session complexity. The quality of generated interfaces, while impressive, isn't yet reliable enough for mission-critical work. These are real constraints.&lt;/p&gt;

&lt;p&gt;But every one of them is improving on a curve that software engineers will recognize. Models are getting faster. Context windows are growing. Output quality is climbing. The gap between "interesting demo" and "production system" is closing on a timeline measured in years, not decades.&lt;/p&gt;

&lt;h3&gt;
  
  
  What This Means for Builders
&lt;/h3&gt;

&lt;p&gt;I want to be clear: this is not a story about developers becoming obsolete. It's a story about what developers &lt;em&gt;do&lt;/em&gt; shifting.&lt;/p&gt;

&lt;p&gt;If the value of writing code is falling — and it is, by every measure — then the value of knowing &lt;em&gt;what to build&lt;/em&gt; is rising. Architecture, domain expertise, problem framing, verification, orchestration. These are the skills that appreciate as the cost of execution drops toward zero.&lt;/p&gt;

&lt;p&gt;The developers who thrive in this landscape will be the ones who stop thinking of themselves as people who write software and start thinking of themselves as people who &lt;em&gt;define intent and verify outcomes&lt;/em&gt;. The code becomes the cheapest part. The thinking becomes the most expensive.&lt;/p&gt;

&lt;p&gt;And there's a broader opportunity here, one that goes beyond the software industry. If generating an application becomes as easy as describing what you want, then the bottleneck on innovation moves from "can we build it?" to "can we imagine it?" The currency becomes ideas. Good ones, specific ones, ones grounded in real problems and real domain knowledge.&lt;/p&gt;

&lt;p&gt;That's not a future to fear. It's a future to build toward.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The www4 project referenced in this post is available at &lt;a href="https://github.com/mikegc-aws/www4" rel="noopener noreferrer"&gt;github.com/mikegc-aws/www4&lt;/a&gt;. The AG-UI protocol is at &lt;a href="https://github.com/ag-ui-protocol/ag-ui" rel="noopener noreferrer"&gt;github.com/ag-ui-protocol/ag-ui&lt;/a&gt;. MCP-Use is at &lt;a href="https://github.com/mcp-use/mcp-use-ts" rel="noopener noreferrer"&gt;github.com/mcp-use/mcp-use-ts&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>softwaredevelopment</category>
      <category>python</category>
    </item>
    <item>
      <title>Software 3.1? - AI Functions</title>
      <dc:creator>Mike Chambers</dc:creator>
      <pubDate>Tue, 24 Feb 2026 06:19:23 +0000</pubDate>
      <link>https://forem.com/aws/software-31-ai-functions-5acn</link>
      <guid>https://forem.com/aws/software-31-ai-functions-5acn</guid>
      <description>&lt;h3&gt;
  
  
  Watch: AI Functions Deep Dive:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://youtu.be/ggWaZO13onc" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1g6evseeqi9w388ds7cb.jpg" alt="AI Functions Deep Dive" width="800" height="450"&gt;&lt;/a&gt;                             &lt;/p&gt;

&lt;p&gt;Andrej Karpathy has a version numbering scheme for how software gets written. Software 1.0 is code written by humans. Software 2.0 is neural network weights learned through optimization. Software 3.0 is prompting LLMs in plain language, which sounds nicer than calling it vibe coding (which, fun fact, is also a Karpathy-coined term).&lt;/p&gt;

&lt;p&gt;Of course, Software 3.0 is real. Millions of people are using it daily. Tools like Kiro, Cursor, Claude Code, and ChatGPT let you describe what you want and get code back. Karpathy &lt;a href="https://www.ycombinator.com/library/MW-andrej-karpathy-software-is-changing-again" rel="noopener noreferrer"&gt;emphasizes&lt;/a&gt; a ‘generation–verification loop’ in partial-autonomy tools: the model generates changes, a human verifies them, and the work iterates.&lt;/p&gt;

&lt;p&gt;But there's something more fundamental going on than who reviews what. Look at what the LLM actually produces in Software 3.0: text. Code as strings. JSON payloads. Markdown documents. The model generates, you receive text, and then &lt;em&gt;you&lt;/em&gt; do everything else -- integrate it into your codebase, write tests, run CI, deploy. If you're disciplined about verification, you write test cases, but those run before deployment. Once the code ships, the tests don't execute again. The LLM's involvement ends when it hands you the output. Your running software has no relationship with the model that helped write it.&lt;/p&gt;

&lt;p&gt;Now consider a different arrangement. The LLM generates code that actually &lt;em&gt;runs&lt;/em&gt; inside your application -- at call time, every time the function is invoked. It returns native Python objects -- DataFrames, Pydantic models, database connections -- not JSON strings you have to parse. And verification isn't a gate you pass before deployment; it's post-conditions that execute on every call, feeding failures back to the model for automatic retries. This changes three things at once: &lt;em&gt;where&lt;/em&gt; AI fits in your software (runtime, not just development time), &lt;em&gt;what&lt;/em&gt; it produces (live objects you can call methods on, not serialized text), and &lt;em&gt;how&lt;/em&gt; you trust it (continuous automated verification, not one-time human review).&lt;/p&gt;

&lt;p&gt;That's the experiment at the heart of &lt;a href="https://github.com/strands-labs/ai-functions" rel="noopener noreferrer"&gt;AI Functions&lt;/a&gt;, a new project from Strands Labs built on the &lt;a href="https://github.com/strands-agents/sdk-python" rel="noopener noreferrer"&gt;Strands Agents SDK&lt;/a&gt;. You write a Python function with a natural language specification instead of implementation code. You attach post-conditions -- plain Python assertions that define what correct output looks like. When the function is called, the LLM generates code, executes it in your Python process, returns the result as a native object, and the post-conditions verify it. If verification fails, the system retries with the error as feedback. The human never inspects the generated code. The post-conditions do the inspecting -- every time.&lt;/p&gt;

&lt;p&gt;If Software 3.0 is "human prompts, LLM generates, human verifies," then I propose that AI Functions are &lt;strong&gt;Software 3.1: human specifies, LLM generates and executes, machine verifies -- at runtime.&lt;/strong&gt; Same paradigm -- natural language as the programming interface. But the execution model is different. The LLM isn't producing text for a human to integrate. It's producing code that runs, returning objects your application uses directly, verified by post-conditions on every call. Software 3.1 is a "point release," not a major version bump. The upgrade is in what happens after generation.&lt;/p&gt;

&lt;p&gt;This post is a deep dive into what AI Functions are, how they work, and what automated verification makes possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  What AI Functions Are
&lt;/h2&gt;

&lt;p&gt;AI Functions is built on top of the &lt;a href="https://github.com/strands-agents/sdk-python" rel="noopener noreferrer"&gt;Strands Agents SDK&lt;/a&gt;, an open-source framework for building AI agents. AI Functions introduces a single core abstraction: the &lt;code&gt;@ai_function&lt;/code&gt; decorator. You write a Python function with a natural language specification instead of an implementation body. When the function is called, an LLM generates the implementation, executes it, and returns the result. Optionally -- and this is the important part -- you attach post-conditions that validate the output and trigger automatic retries if validation fails.&lt;/p&gt;

&lt;p&gt;The simplest example looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ai_functions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ai_function&lt;/span&gt;

&lt;span class="nd"&gt;@ai_function&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;translate_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lang&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Translate the text below to the following language: {lang}.
    {text}
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;translate_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The quarterly results exceeded expectations.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lang&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;French&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You call &lt;code&gt;translate_text&lt;/code&gt; like any Python function. The decorator intercepts the call, constructs a prompt from the docstring (substituting the arguments), sends it to an LLM, and returns the result as a typed Python string. From the caller's perspective, it's just a function that takes a string and returns a string. The fact that an LLM executed it is an implementation detail.&lt;/p&gt;
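&lt;p&gt;A toy version of just the interception step makes the mechanics visible. This sketch stops where the real decorator would call the model -- it binds the arguments and substitutes them into the docstring, which is the prompt:&lt;/p&gt;

```python
import functools
import inspect

def ai_function_sketch(func):
    # Illustrative stand-in for @ai_function: build the prompt from the
    # docstring template and the bound call arguments. The real decorator
    # sends this prompt to an LLM; here we just return the prompt itself.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = inspect.signature(func).bind(*args, **kwargs)
        bound.apply_defaults()
        return func.__doc__.format(**bound.arguments)
    return wrapper

@ai_function_sketch
def translate_text(text: str, lang: str) -> str:
    """Translate the text below to the following language: {lang}.
    {text}
    """
```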

&lt;p&gt;This on its own is still kinda Software 3.0 -- prompt in, result out. It's pleasant, but it isn't where AI Functions get interesting. They get interesting when you add structure, validation, code execution, multi-agent composition, and async workflows. That's where 3.1 begins.&lt;/p&gt;

&lt;h2&gt;
  
  
  Structured Output with Pydantic
&lt;/h2&gt;

&lt;p&gt;AI Functions can return arbitrary typed objects, not just strings. When you specify a Pydantic model as the return type, the framework enforces schema compliance automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ai_functions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ai_function&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MeetingSummary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;attendees&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;key_decisions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;action_items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nd"&gt;@ai_function&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;summarize_meeting&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;MeetingSummary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Summarize the following meeting transcript in less than 50 words.
    &amp;lt;transcript&amp;gt;
    {transcript}
    &amp;lt;/transcript&amp;gt;
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Call &lt;code&gt;summarize_meeting(transcript)&lt;/code&gt; and you get back a &lt;code&gt;MeetingSummary&lt;/code&gt; object with typed fields, IDE autocomplete, and Pydantic's built-in validation. The LLM's output is parsed into the Pydantic model, and if the structure doesn't match, the framework handles the retry. From the caller's perspective, the function returns a typed Python object.&lt;/p&gt;
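&lt;p&gt;A stdlib-only sketch of the parse-or-fail step shows what a schema retry keys off (the real framework does this with Pydantic; the helper below is mine):&lt;/p&gt;

```python
import json

REQUIRED_FIELDS = ("attendees", "key_decisions", "action_items")

def parse_summary(raw: str) -> dict:
    # Parse the model's output and fail loudly on a schema mismatch;
    # in the real framework that failure triggers an automatic retry.
    data = json.loads(raw)
    missing = [k for k in REQUIRED_FIELDS if k not in data]
    if missing:
        raise ValueError(f"Missing fields: {missing}")
    return data
```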

&lt;p&gt;This is a pattern that frameworks like &lt;a href="https://python.useinstructor.com/" rel="noopener noreferrer"&gt;Instructor&lt;/a&gt; have established. AI Functions' contribution isn't in structured output itself -- it's in how structured output composes with everything else in the system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Post-Conditions
&lt;/h2&gt;

&lt;p&gt;Post-conditions are the core of what makes AI Functions more than a prompting framework. A post-condition is a Python function that validates the output of an AI Function. If validation fails, the error message feeds back to the LLM, which retries. Multiple post-conditions run in parallel, so the LLM receives all failure signals at once and can address them in a single retry.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ai_functions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ai_function&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PostConditionResult&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MeetingSummary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;attendees&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;key_decisions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;action_items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_length&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MeetingSummary&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;key_decisions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Key decisions should total under 50 words, got &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nd"&gt;@ai_function&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_quality&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MeetingSummary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;PostConditionResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Check if the meeting summary below satisfies the following criteria:
    - Key decisions must be specific and actionable, not vague
    - Action items must each name a responsible person
    &amp;lt;decisions&amp;gt;{response.key_decisions}&amp;lt;/decisions&amp;gt;
    &amp;lt;actions&amp;gt;{response.action_items}&amp;lt;/actions&amp;gt;
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="nd"&gt;@ai_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post_conditions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;check_length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;check_quality&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;summarize_meeting&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;MeetingSummary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Summarize the following meeting transcript in less than 50 words.
    &amp;lt;transcript&amp;gt;
    {transcript}
    &amp;lt;/transcript&amp;gt;
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are two things to notice here. First, &lt;code&gt;check_length&lt;/code&gt; is a plain Python function that raises an &lt;code&gt;AssertionError&lt;/code&gt; on failure. This is a deterministic, inspectable validation -- no LLM involved, no ambiguity. Second, &lt;code&gt;check_quality&lt;/code&gt; is itself an AI Function that returns a &lt;code&gt;PostConditionResult&lt;/code&gt; -- a Pydantic model with &lt;code&gt;passed&lt;/code&gt; (bool) and &lt;code&gt;message&lt;/code&gt; (str) fields. It uses an LLM to evaluate whether the summary meets quality criteria that are hard to express as assertions -- specificity, actionability, attribution. An AI Function validating another AI Function. The framework treats both identically: if either fails, the error goes back to the generating LLM as feedback.&lt;/p&gt;
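&lt;p&gt;For reference, &lt;code&gt;check_length&lt;/code&gt; (referenced above but defined earlier in the article) could look something like this -- a sketch, not the article's exact code, and the &lt;code&gt;summary&lt;/code&gt; field name is an assumption:&lt;/p&gt;

```python
def check_length(response) -> None:
    # Deterministic post-condition: a plain Python function that raises
    # AssertionError on failure. The assertion message becomes the feedback
    # the generating LLM sees on its next attempt.
    word_count = len(response.summary.split())
    assert word_count < 50, f"Summary is {word_count} words; the limit is 50"
```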

&lt;p&gt;This creates a self-correcting loop. The generating LLM doesn't need to get it right on the first attempt. It needs to be able to improve given specific feedback about what went wrong. In practice, this means the developer's job shifts from crafting perfect prompts to writing good post-conditions -- a fundamentally different skill.&lt;/p&gt;
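&lt;p&gt;Conceptually, the loop looks something like this -- a simplified sketch of the generate/validate/retry cycle, not the framework's actual implementation:&lt;/p&gt;

```python
def run_with_post_conditions(generate, post_conditions, max_attempts=3):
    # Simplified sketch of the self-correcting loop: generate a candidate,
    # run every post-condition, and on failure feed the error message back
    # into the next generation attempt.
    feedback = None
    for _ in range(max_attempts):
        candidate = generate(feedback)
        try:
            for check in post_conditions:
                check(candidate)
        except AssertionError as err:
            feedback = str(err)
            continue
        return candidate
    raise RuntimeError(f"All {max_attempts} attempts failed: {feedback}")
```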

&lt;p&gt;Of course, we need to understand what is happening here: this also opens us up to retry loops "hidden" inside our projects! We will need solid monitoring and observability in place before we lean on this too hard.&lt;/p&gt;
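&lt;p&gt;A lightweight first step is simply making those failures visible -- for example, wrapping a post-condition so every failure is counted and logged before it propagates (a hypothetical helper, not a framework feature):&lt;/p&gt;

```python
import functools
import logging

def count_failures(check):
    # Wrap a post-condition so each failure is logged and counted before
    # propagating, giving basic visibility into otherwise hidden retries.
    @functools.wraps(check)
    def wrapped(*args, **kwargs):
        try:
            return check(*args, **kwargs)
        except AssertionError as err:
            wrapped.failures += 1
            logging.warning("Post-condition %s failed: %s", check.__name__, err)
            raise
    wrapped.failures = 0
    return wrapped
```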

&lt;h2&gt;
  
  
  Returning Native Python Objects
&lt;/h2&gt;

&lt;p&gt;Most LLM frameworks force output through JSON serialization. AI Functions can return non-serializable Python objects -- DataFrames, SymPy expressions, database connections, anything -- because the generated code runs in the same Python interpreter as your application.&lt;/p&gt;

&lt;p&gt;This is the feature that makes AI Functions feel qualitatively different from other frameworks. Consider a format-agnostic data loader that handles purchase records regardless of how they're stored:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ai_functions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ai_function&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_invoice_dataframe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Post-condition: validate DataFrame structure.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;product_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;quantity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;purchase_date&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}.&lt;/span&gt;&lt;span class="nf"&gt;issubset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_integer_dtype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;quantity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quantity must be an integer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_float_dtype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price must be a float&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_datetime64_any_dtype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;purchase_date&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;duplicated&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;product_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;purchase_date&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nd"&gt;@ai_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;code_execution_mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;local&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;code_executor_additional_imports&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pandas.*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sqlite3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;post_conditions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;check_invoice_dataframe&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;import_invoice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    The file `{path}` contains purchase logs. Extract them into a DataFrame with columns:
    - product_name (str)
    - quantity (int)
    - price (float)
    - purchase_date (datetime)
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Call &lt;code&gt;import_invoice('data/orders.json')&lt;/code&gt; and you get back an actual Pandas DataFrame -- not a JSON representation of one, not a serialized string, a real DataFrame object that you can immediately call &lt;code&gt;.describe()&lt;/code&gt;, &lt;code&gt;.groupby()&lt;/code&gt;, or &lt;code&gt;.plot()&lt;/code&gt; on. Hand it a SQLite file instead and the same function inspects the database schema, writes the appropriate SQL queries, and returns the same validated DataFrame structure.&lt;/p&gt;

&lt;p&gt;The developer writes zero format-specific parsing logic. The natural language specification says what the output should contain. The post-conditions verify structural invariants. The LLM figures out how to get from an opaque file to a validated DataFrame, dynamically, at call time.&lt;/p&gt;

&lt;p&gt;This works because the framework provides the LLM with a Python executor tool that shares the same runtime as the calling code. The LLM generates Python code, executes it inside your process, and returns the result object directly. No serialization round-trip. The &lt;code&gt;code_execution_mode="local"&lt;/code&gt; parameter is an explicit opt-in -- the framework doesn't run arbitrary generated code by default, and you declare which imports are allowed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Code Execution and the Trust Model
&lt;/h2&gt;

&lt;p&gt;The code execution model deserves closer attention because it reveals AI Functions' deliberate approach to trust.&lt;/p&gt;

&lt;p&gt;When &lt;code&gt;code_execution_mode="local"&lt;/code&gt; is enabled, the LLM can generate and execute Python code within your interpreter. This is powerful -- it's what enables returning DataFrames, running computations, and interacting with the local environment. It's also a security surface. The framework mitigates this through several mechanisms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Explicit opt-in.&lt;/strong&gt; Code execution is off by default. You must enable it per function.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Import restrictions.&lt;/strong&gt; &lt;code&gt;code_executor_additional_imports&lt;/code&gt; explicitly declares which packages the generated code may use. Anything not listed is unavailable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Post-condition verification.&lt;/strong&gt; The output is validated regardless of how it was produced. Even if the generated code takes an unexpected path, the post-conditions catch invalid results.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But the honest assessment is that this is a tradeoff. You're executing LLM-generated code in your process. The framework uses AST-based validation of generated code with controlled imports and timeouts, which attempts to prevent malicious imports and block dangerous operations. But this doesn't offer true sandboxing and doesn't prevent resource exhaustion (infinite loops, excessive memory allocation). For an experiment, with appropriate constraints, this is a reasonable choice. For production workloads, the project recommends running AI Functions inside a container or other isolated environment to provide process-level isolation.&lt;/p&gt;
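&lt;p&gt;To make the idea concrete, allow-list import screening over the AST takes only a few lines -- an illustrative sketch, not the framework's actual validator:&lt;/p&gt;

```python
import ast

ALLOWED_IMPORTS = {"pandas", "sqlite3", "json"}

def validate_generated_code(source: str) -> None:
    # Illustrative sketch of AST-based import screening: walk the parsed
    # tree and reject any import whose top-level package is not on the
    # allow-list, before the generated code is ever executed.
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [(node.module or "").split(".")[0]]
        else:
            continue
        for name in names:
            if name not in ALLOWED_IMPORTS:
                raise ValueError(f"Disallowed import: {name}")
```

&lt;p&gt;Note what this does and doesn't cover: it catches a disallowed &lt;code&gt;import os&lt;/code&gt; before execution, but it cannot stop an allowed package from looping forever or exhausting memory -- which is exactly why process-level isolation is still recommended.&lt;/p&gt;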

&lt;h2&gt;
  
  
  Multi-Agent Composition
&lt;/h2&gt;

&lt;p&gt;Results from AI Functions compose naturally through regular Python. Since AI Functions return typed objects, you chain them the same way you chain any functions -- by passing outputs as inputs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ai_functions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ai_function&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt;

&lt;span class="nd"&gt;@ai_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code_execution_mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;local&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;code_executor_additional_imports&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pandas.*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analyze_sales_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Load the sales data from `{path}` and compute a summary DataFrame
    with total revenue, average order value, and top 5 products by volume.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="nd"&gt;@ai_function&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;write_executive_summary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;company&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;financials&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Write a concise executive summary for {company} highlighting key trends
    and recommendations based on the provided financial data.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;financials&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;analyze_sales_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data/q4_sales.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;write_executive_summary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Acme Corp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;financials&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Top Products:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;financials&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summary:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is just ordinary function composition. The first function returns a DataFrame; the second takes a DataFrame as input. No special state-passing machinery needed.&lt;/p&gt;

&lt;p&gt;For more complex workflows, AI Functions can be used as &lt;em&gt;tools&lt;/em&gt; by other agents, enabling orchestration patterns where a coordinator delegates to specialized sub-agents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ai_functions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ai_function&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ai_functions.types&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PostConditionResult&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Field&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;

&lt;span class="nd"&gt;@ai_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Search the web for a topic and return a cited summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;websearch_tool&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;post_conditions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;check_length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;check_citations&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_words&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Perform a web search on the following topic and return a summary.
    Every claim must be supported by citations to sources.
    &amp;lt;query&amp;gt;{query}&amp;lt;/query&amp;gt;
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="nd"&gt;@ai_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Suggest the plan and organization of a report&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;websearch_tool&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;report_planner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ReportPlan&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Generate a plan to write a report on: {topic}&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="nd"&gt;@ai_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;report_planner&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;search_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_section&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;report_orchestrator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;done&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Write a report on the following topic: {topic}
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The orchestrator sees &lt;code&gt;report_planner&lt;/code&gt;, &lt;code&gt;search_agent&lt;/code&gt;, and &lt;code&gt;report.add_section&lt;/code&gt; as tools it can call. Each sub-agent runs with its own post-conditions, so the orchestrator receives validated results. The search agent's citations are verified before its results reach the orchestrator. This creates a hierarchy of validated agents -- post-conditions compose across the multi-agent system.&lt;/p&gt;
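&lt;p&gt;The &lt;code&gt;check_citations&lt;/code&gt; post-condition isn't shown above; a deterministic version could be as simple as requiring at least one URL in the output -- a hypothetical sketch:&lt;/p&gt;

```python
import re

def check_citations(summary: str) -> None:
    # Hypothetical deterministic post-condition: require at least one
    # http(s) URL so every claim can be traced back to a source.
    urls = re.findall(r"https?://\S+", summary)
    assert urls, "Summary contains no citations (no URLs found)"
```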

&lt;h2&gt;
  
  
  Async Execution and Parallel Workflows
&lt;/h2&gt;

&lt;p&gt;AI Functions can be defined as &lt;code&gt;async&lt;/code&gt;, which enables parallel execution of independent tasks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ai_functions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ai_function&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="nd"&gt;@ai_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;websearch_tool&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;research_market&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;company&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Research and summarize the competitive landscape and recent news for: {company}&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="nd"&gt;@ai_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code_execution_mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;local&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;code_executor_additional_imports&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pandas.*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;yfinance.*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_financial_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Use the `yfinance` Python package to retrieve the historical prices of {stock}
    in the last 30 days. Return a DataFrame with columns [date, price].
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="nd"&gt;@ai_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code_execution_mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;local&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;code_executor_additional_imports&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pandas.*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;plotly.*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;write_investment_memo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;company&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;research&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;financials&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Write an investment memo for {company}. Use the market research and financial data:
    {research}
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;due_diligence_workflow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;company&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;research&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;financials&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nf"&gt;research_market&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;company&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="nf"&gt;load_financial_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;company&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;write_investment_memo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;company&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;research&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;financials&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The two tasks run concurrently. Since they're independent -- one searches the web, the other retrieves and transforms financial data -- parallelism gives you the same results in roughly half the wall-clock time at no additional token cost. The results then feed into a synchronous report writer that uses both.&lt;/p&gt;

&lt;p&gt;Notice the &lt;code&gt;tools=[websearch_tool]&lt;/code&gt; parameter. AI Functions can use any &lt;a href="https://github.com/strands-agents/tools" rel="noopener noreferrer"&gt;Strands tool&lt;/a&gt;. The framework provides built-in tools for Python code execution, and you can pass additional tools (web search, API clients, file I/O) per function. The LLM decides when and how to use them during execution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Configuration Sharing
&lt;/h2&gt;

&lt;p&gt;Different parts of a workflow may need different models. A quick validation check doesn't need the same model as a complex analysis. AI Functions use &lt;code&gt;AIFunctionConfig&lt;/code&gt; objects to share configuration across functions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ai_functions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ai_function&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AIFunctionConfig&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Configs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;BIG_MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AIFunctionConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us.anthropic.claude-sonnet-4-5-20250929-v1:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;FAST_MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AIFunctionConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us.anthropic.claude-haiku-4-5-20251001-v1:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;DATA_ANALYSIS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AIFunctionConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us.anthropic.claude-sonnet-4-5-20250929-v1:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;code_execution_mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;local&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;code_executor_additional_imports&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pandas.*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;numpy.*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@ai_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Configs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DATA_ANALYSIS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;normalize_dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Load, clean, and normalize the dataset at `{path}` into a standard schema.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="nd"&gt;@ai_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Configs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FAST_MODEL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validate_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Check if the following string is a valid email address: {text}&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Configs are plain Python objects, so switching the entire pipeline from one model family to another is a single line change. During development, you might route everything through a capable but expensive model. For cost optimization, you swap the config's model reference and see what breaks. Keyword arguments on &lt;code&gt;@ai_function&lt;/code&gt; override config values for individual functions, so you can specialize without duplicating the entire config.&lt;/p&gt;
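&lt;p&gt;The layering idea can be sketched with plain dataclasses. This is a hypothetical illustration of shared-config-plus-override, not the library's actual implementation; the real &lt;code&gt;AIFunctionConfig&lt;/code&gt; fields and override mechanics may differ:&lt;/p&gt;

```python
from dataclasses import dataclass, replace

# Hypothetical stand-in for AIFunctionConfig, just to show the layering:
# a shared frozen config plus cheap per-function specializations.
@dataclass(frozen=True)
class AIFunctionConfig:
    model: str
    code_execution_mode: str = "none"

BIG_MODEL = AIFunctionConfig(model="us.anthropic.claude-sonnet-4-5-20250929-v1:0")

# A keyword override produces a specialized copy; the shared config is untouched.
local_exec = replace(BIG_MODEL, code_execution_mode="local")

print(BIG_MODEL.code_execution_mode)   # none
print(local_exec.code_execution_mode)  # local
```

Swapping the whole pipeline to another model family is then one edit to the shared object, while specializations stay local to the functions that need them.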

&lt;h2&gt;
  
  
  Validating More Than Output
&lt;/h2&gt;

&lt;p&gt;One of the more subtle capabilities of the post-condition system is validating &lt;em&gt;properties&lt;/em&gt; of a result that are hard to express as structural checks. AI-powered post-conditions let you evaluate semantic qualities -- grounding, citation quality, logical consistency -- using one LLM to validate another:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ai_functions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ai_function&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PostConditionResult&lt;/span&gt;

&lt;span class="nd"&gt;@ai_function&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_citations&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;PostConditionResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Validate if all the claims made in the following summary are supported
    by an inline citation to a credible source.
    &amp;lt;summary&amp;gt;
    {summary}
    &amp;lt;/summary&amp;gt;
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_length&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_words&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;max_words&lt;/span&gt;

&lt;span class="nd"&gt;@ai_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;websearch_tool&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;post_conditions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;check_length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;check_citations&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;market_researcher&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_words&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Research and provide a well-sourced answer to: {query}
    Every claim must be supported by citations to credible sources.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The research agent produces a summary. &lt;code&gt;check_length&lt;/code&gt; verifies the word count deterministically. &lt;code&gt;check_citations&lt;/code&gt; uses an LLM to evaluate whether each claim is actually backed by a cited source. If the agent hallucinated an answer without doing real research, the citation check catches it and triggers a retry with feedback about which specific claims lack sources.&lt;/p&gt;

&lt;p&gt;This is a different kind of validation from checking output structure. It's using AI to verify AI -- checking semantic properties that are hard to express as assertions. It addresses one of the hardest problems in LLM-based systems: how do you know the model didn't just make something up? Post-conditions don't solve this fully, but they create a second, independent evaluation that meaningfully reduces the failure rate.&lt;/p&gt;
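&lt;p&gt;The retry-with-feedback loop this relies on can be sketched generically. Everything here (&lt;code&gt;call_with_post_conditions&lt;/code&gt;, the toy model, the validator) is hypothetical scaffolding to show the shape of the loop, not the framework's actual code:&lt;/p&gt;

```python
from typing import Callable

def call_with_post_conditions(generate: Callable[[str], str],
                              post_conditions: list[Callable[[str], None]],
                              max_attempts: int = 3) -> str:
    # Each validator either passes silently or raises; the error text
    # becomes feedback for the next attempt.
    feedback = ""
    for _ in range(max_attempts):
        answer = generate(feedback)
        try:
            for check in post_conditions:
                check(answer)
            return answer  # every validator passed
        except Exception as err:
            feedback = str(err)  # fed back as context for the retry
    raise RuntimeError("post-conditions still failing: " + feedback)

# Toy stand-in for an LLM call: only adds a citation once it gets feedback.
def fake_model(feedback: str) -> str:
    return "claim [source]" if feedback else "claim"

def check_citations(summary: str) -> None:
    assert "[source]" in summary, "claims lack citations"

print(call_with_post_conditions(fake_model, [check_citations]))  # claim [source]
```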

&lt;h2&gt;
  
  
  Test Suites as Post-Conditions
&lt;/h2&gt;

&lt;p&gt;The post-condition model has an interesting application to automated coding: use your existing test suite as the post-condition. If the tests pass, the implementation is correct. If they fail, the failures feed back as error messages.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ai_functions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ai_function&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pytest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;contextlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;redirect_stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;redirect_stdout&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;FeatureRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;test_files&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Post-conditions can request original input arguments by name.
# Here, `feature` matches the parameter name of `implement_feature`.
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_tests&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_answer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;feature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;FeatureRequest&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;stdio_capture&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;StringIO&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;redirect_stdout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stdio_capture&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;redirect_stderr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stdio_capture&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;retcode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pytest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;feature&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;test_files&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;retcode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stdio_capture&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getvalue&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="nd"&gt;@ai_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post_conditions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;run_tests&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;implement_feature&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;feature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;FeatureRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;done&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Implement the following feature in the current code base:
    &amp;lt;feature&amp;gt;{feature.description}&amp;lt;/feature&amp;gt;
    Once done the code base should pass the following tests: {feature.test_files}
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_workflow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;FeatureRequest&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;feature&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;implement_feature&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;feature&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AI Function's return value is just the string &lt;code&gt;"done"&lt;/code&gt; -- it doesn't matter. What matters is the side effect: the code base should now pass the specified tests. The post-condition runs &lt;code&gt;pytest&lt;/code&gt; and raises if any tests fail. The LLM receives the test output as feedback and keeps iterating until all tests pass.&lt;/p&gt;

&lt;p&gt;The documentation notes that agents pass roughly 10-15% more tests when the post-condition is provided in addition to the prompt instruction. The agent is measurably more effective at responding to concrete validation failures than at following written instructions. This aligns with a broader pattern: concrete, automated feedback loops outperform detailed prompting. Which is exactly the argument for 3.1 over 3.0.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;AI Functions is an experiment. The code is open source at &lt;a href="https://github.com/strands-labs/ai-functions" rel="noopener noreferrer"&gt;strands-labs/ai-functions&lt;/a&gt;, part of the &lt;a href="https://github.com/strands-labs" rel="noopener noreferrer"&gt;Strands Labs GitHub organization&lt;/a&gt; -- a home for experimental projects built on the Strands Agents SDK. Alongside AI Functions, you'll find &lt;a href="https://github.com/strands-labs/robots" rel="noopener noreferrer"&gt;Robots&lt;/a&gt; (physical AI agents on edge hardware) and &lt;a href="https://github.com/strands-labs/robots-sim" rel="noopener noreferrer"&gt;Robots Sim&lt;/a&gt; (simulated environments for robot development). All three are built on the &lt;a href="https://github.com/strands-agents/sdk-python" rel="noopener noreferrer"&gt;Strands Agents SDK&lt;/a&gt;, which has been downloaded over 14 million times since its open source release in May 2025. All three are explicitly experimental -- and that's the point. The best way to find out what works in this space is to build things and see what breaks.&lt;/p&gt;

&lt;p&gt;Install it with &lt;code&gt;pip install strands-ai-functions&lt;/code&gt; (or &lt;code&gt;uv add strands-ai-functions&lt;/code&gt;), clone the &lt;a href="https://github.com/strands-labs/ai-functions" rel="noopener noreferrer"&gt;repo&lt;/a&gt; for the full set of examples, and start experimenting.&lt;/p&gt;

&lt;p&gt;AI Functions is not a production system. It's a conversation starter and just maybe where Karpathy's version numbering goes next. Try it. Write some post-conditions. See whether defining acceptance criteria feels more natural than auditing LLM output. And then consider: what does 4.0 look like?&lt;/p&gt;

&lt;p&gt;We don't know yet. But the experiments have started :)&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>programming</category>
      <category>softwaredevelopment</category>
    </item>
    <item>
      <title>Async Agentic Tools: Breaking Free from the Request-Response Loop</title>
      <dc:creator>Mike Chambers</dc:creator>
      <pubDate>Tue, 17 Feb 2026 09:16:15 +0000</pubDate>
      <link>https://forem.com/aws/async-agentic-tools-breaking-free-from-the-request-response-loop-1ll5</link>
      <guid>https://forem.com/aws/async-agentic-tools-breaking-free-from-the-request-response-loop-1ll5</guid>
      <description>&lt;p&gt;Every AI agent framework today follows the same basic loop: the model thinks, it calls some tools, it waits for all the tools to finish, and then it thinks again. On one hand that loop works fine when your tools return in seconds or when your agent is headless, but on the other it causes millions of people every day to stare blankly for hours (in total) at a "thinking" message. Where's the productivity gain in that? :)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skip to the code bit: here&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quick walkthrough video here:&lt;/strong&gt; &lt;br&gt;
&lt;a href="https://youtu.be/VYLBCoxbPE8" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcderh9x8x4hhpen1q1zk.jpg" alt="Watch the video" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you've used (or built) agents that call APIs with variable latency, run database queries, kick off web searches, or - of course - used agents as tools, you've felt this. The model sits idle, the user sits idle, and a 30-second tool call holds up the response to a 3-second one that could have finished ages ago. The agent can't talk to the user, can't start processing partial results, can't do anything.&lt;/p&gt;

&lt;p&gt;Modern frontier models now have the ability to start experimenting with genuine asynchronous tool calls. This post walks through my experimental approach: true asynchronous agentic tools. The demo is built on the &lt;a href="https://github.com/strands-agents/sdk-python" rel="noopener noreferrer"&gt;Strands Agents SDK&lt;/a&gt;, but the pattern should apply to any agent framework with a tool-calling loop. The code is open, the approach is simple, and it requires zero changes to how you write your tools.&lt;/p&gt;
&lt;h2&gt;
  
  
  Async vs Async
&lt;/h2&gt;

&lt;p&gt;First, before you write your comment on this post... let's be precise about terminology.&lt;/p&gt;

&lt;p&gt;Many agent frameworks already support parallel tool calling — when a model returns multiple tool calls in a single response, the framework can execute them concurrently. Strands Agents, which I'm using for this demo, &lt;a href="https://strandsagents.com/latest/documentation/docs/user-guide/concepts/tools/executors/" rel="noopener noreferrer"&gt;supports this natively&lt;/a&gt;. That's good. But the agent loop is still blocked until every tool in that batch has responded. If you dispatch three tools and two finish in 1 second but one takes 45 seconds, the model cannot respond, cannot think, and cannot act for those 44 seconds of dead air. It's the same as &lt;code&gt;Promise.all()&lt;/code&gt; or &lt;code&gt;asyncio.gather()&lt;/code&gt; — you get concurrency in the execution, but you still wait for the slowest call before anything else can happen.&lt;/p&gt;
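&lt;p&gt;The &lt;code&gt;asyncio.gather()&lt;/code&gt; comparison is easy to see in plain Python. You get concurrency within the batch, but the batch as a whole is gated on its slowest member:&lt;/p&gt;

```python
import asyncio, time

# Three "tools" with very different latencies.
async def tool(name: str, delay: float) -> str:
    await asyncio.sleep(delay)
    return name

async def main() -> list[str]:
    start = time.monotonic()
    # gather() runs all three concurrently, but only returns once the
    # slowest one finishes: the batch is blocked by its longest member.
    results = await asyncio.gather(tool("fast", 0.01), tool("mid", 0.02),
                                   tool("slow", 0.05))
    elapsed = time.monotonic() - start
    print(f"{results} in {elapsed:.2f}s")  # total time tracks the 0.05s tool
    return results

asyncio.run(main())
```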

&lt;p&gt;True async agentic tool calling is different. The model dispatches a tool, gets back an acknowledgement immediately ("task started, here's an ID"), and moves on. It can talk to the user, call other tools, or just wait. When the result arrives — seconds, minutes, hours or days later — it gets delivered to the model as a new message, and the model processes it then. The model stays responsive. Results stream in as they complete. The user experience is fundamentally different. And I have to say, I like it!&lt;/p&gt;
&lt;h2&gt;
  
  
  Why This Wasn't Possible Until Recently
&lt;/h2&gt;

&lt;p&gt;This architecture puts real demands on a model's intelligence, its context size, and its propensity for 'lost in the middle' issues. It needs to understand that a tool call won't return a real result. It needs to avoid fabricating data while waiting. It needs to handle results arriving out-of-order, potentially many turns after the original request. And it needs to keep track of multiple pending tasks across a growing conversation context.&lt;/p&gt;

&lt;p&gt;One solution to this could be to provide the agent with even more tools to manage its own execution flow. But this just makes the flow even more complex for the poor model. &lt;/p&gt;

&lt;p&gt;The (current) solution is much simpler: ironically, we just needed to wait for the frontier models to get good enough. The problem you couldn't solve 6 months ago now works just like you thought it would, now that Opus 4.6 is here. (Is there a name for this progression? Like Moore's Law, but just for LLM improvement? Please let me know.)&lt;/p&gt;

&lt;p&gt;As recently as mid-2025, asynchronous tool calling didn't work reliably. Models would hallucinate results instead of waiting, lose track of pending task IDs, or get confused when results arrived in a later turn. The instruction-following just wasn't precise enough.&lt;/p&gt;

&lt;p&gt;Newer, more capable models handle this kinda okay. They follow the "do not fabricate" instruction in the tool description, they correctly associate arriving results with their task IDs, and they maintain coherent multi-turn conversations while tasks are in flight. This is one of those capabilities that seems to have emerged from generally smarter models rather than from any async-specific training. Is it perfect? No, but no agent is perfect, and I'm sure it will improve.&lt;/p&gt;
&lt;h2&gt;
  
  
  How It Works (The Code)
&lt;/h2&gt;

&lt;p&gt;My implementation is three small components — about 320 lines of code total — that layer on top of a standard Strands Agent without modifying it. The code is linked here: &lt;a href="https://github.com/mikegc-aws/async-agentic-tools" rel="noopener noreferrer"&gt;repo&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  1. The Decorator: &lt;code&gt;@tool_async&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;You write your tool exactly the way you'd write any tool function. A function with a docstring and type hints. Then you wrap it with &lt;code&gt;@tool_async(manager)&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands_async_tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AsyncToolManager&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_async&lt;/span&gt;

&lt;span class="n"&gt;manager&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AsyncToolManager&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@tool_async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;manager&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;research_topic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Research a topic thoroughly and return detailed findings.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# simulate slow API call
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Findings about &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So that's it for the tool: your function doesn't change. It's still synchronous. It still returns a string. The decorator handles everything for you.&lt;/p&gt;

&lt;p&gt;What it does behind the scenes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;When the model calls &lt;code&gt;research_topic&lt;/code&gt;, the decorator submits the original function to a &lt;code&gt;ThreadPoolExecutor&lt;/code&gt; for background execution.&lt;/li&gt;
&lt;li&gt;It immediately returns a structured message to the model: task ID, tool name, arguments, and a clear instruction not to fabricate the result.&lt;/li&gt;
&lt;li&gt;It appends an async notice to the tool's docstring so the model knows, from the schema alone, that this tool is asynchronous, and how to expect it to work.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The model sees the immediate response and might tell the user "I've started the research." Meanwhile, the real function is running on a background thread.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The Manager: &lt;code&gt;AsyncToolManager&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;The manager is a thin wrapper around Python's &lt;code&gt;ThreadPoolExecutor&lt;/code&gt; that adds task tracking and completion callbacks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AsyncToolManager&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;submit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Submit a function for background execution. Returns a task ID.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;task_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nb"&gt;hex&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;future&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_executor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;submit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_done_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;on_done&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;task_id&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a background task finishes, the manager fires an &lt;code&gt;on_complete&lt;/code&gt; callback with an &lt;code&gt;AsyncTaskResult&lt;/code&gt; containing the task ID, tool name, arguments, result (or error), and elapsed time. The manager doesn't know or care what happens next — it just reports completion.&lt;/p&gt;
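&lt;p&gt;A self-contained version of the submit-and-report cycle might look like this. The &lt;code&gt;AsyncTaskResult&lt;/code&gt; fields here are assumptions modeled on the description above, not the repo's actual class:&lt;/p&gt;

```python
import time, uuid
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Assumed completion-report shape: task ID, tool name, result or error,
# and elapsed time, as described in the post.
@dataclass
class AsyncTaskResult:
    task_id: str
    tool_name: str
    result: object
    error: object
    elapsed_ms: float

completed = []
_executor = ThreadPoolExecutor(max_workers=2)

def submit(tool_name, fn, on_complete, **kwargs):
    task_id = uuid.uuid4().hex[:8]
    start = time.monotonic()
    def on_done(future):
        err = future.exception()
        res = None if err else future.result()
        elapsed = (time.monotonic() - start) * 1000
        # The manager just reports completion; it doesn't care what happens next.
        on_complete(AsyncTaskResult(task_id, tool_name, res, err, elapsed))
    future = _executor.submit(fn, **kwargs)
    future.add_done_callback(on_done)
    return task_id

submit("echo", lambda text: text.upper(), completed.append, text="hi")
time.sleep(0.2)  # give the background thread time to finish
print(completed[0].result)  # HI
```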

&lt;h3&gt;
  
  
  3. The Agent Wrapper: &lt;code&gt;AsyncAgent&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This is the orchestration layer. &lt;code&gt;AsyncAgent&lt;/code&gt; wraps a standard &lt;code&gt;Agent&lt;/code&gt; and manages the lifecycle of delivering async results back to the model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;research_topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;analyze_sentiment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fetch_weather&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;calculator&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;async_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AsyncAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;manager&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;manager&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;async_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research quantum computing and check the weather in Paris&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I wrap the agent like this rather than subclassing it because the agent is a complex thing, and this approach should extend more easily to other frameworks. AsyncAgent doesn't change how the Agent works — it just manages when and how it gets invoked.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;AsyncAgent&lt;/code&gt; registers itself as the manager's completion callback and maintains a simple state machine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agent is idle, result arrives&lt;/strong&gt;: Invoke the agent immediately with the formatted result. The model processes it and responds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent is busy (already processing something), result arrives&lt;/strong&gt;: Queue the result. When the current invocation finishes, drain the queue — deliver each queued result to the agent one at a time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple results arrive while busy&lt;/strong&gt;: They all queue up and get delivered sequentially in a draining phase after the agent finishes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;(I have OpenClaw to thank for this - sorta. While digging through its code, I fell into looking at &lt;a href="https://github.com/badlogic/pi-mono/blob/main/packages/coding-agent/README.md" rel="noopener noreferrer"&gt;Pi&lt;/a&gt;'s interrupt semantics, which do something similar.)&lt;/p&gt;

&lt;p&gt;The result is delivered to the model as a plain text message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ASYNC RESULT]
Task ID: abc123
Tool: research_topic(topic='quantum computing')
Result:
Key finding: quantum computing has seen 340% growth in the last 2 years.
Experts predict the quantum computing market will reach $50B by 2028.
Elapsed: 15234ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model treats this like any other user message and responds naturally.&lt;/p&gt;

&lt;p&gt;Thread safety matters here. The completion callback fires from the thread pool, but agent invocations must be serialized (you can't call a Strands Agent from two threads simultaneously). &lt;code&gt;AsyncAgent&lt;/code&gt; uses a lock to protect its busy flag and result queue, ensuring that results are always delivered one at a time in a safe order.&lt;/p&gt;
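&lt;p&gt;The busy-flag-plus-queue logic can be sketched in a few lines. This is a simplified illustration with assumed names (the real &lt;code&gt;agent.py&lt;/code&gt; in the repo does more), but it shows the lock, the busy flag, and the draining loop working together:&lt;/p&gt;

```python
import threading
from collections import deque

class AsyncAgentSketch:
    """Sketch of the busy-flag + queue state machine (names are assumptions;
    see agent.py in the repo for the real AsyncAgent)."""

    def __init__(self, agent):
        self.agent = agent            # callable: receives a formatted result
        self._lock = threading.Lock()
        self._busy = False
        self._queue = deque()

    def on_result(self, message):
        # Called from the thread pool when an async tool completes.
        with self._lock:
            if self._busy:
                self._queue.append(message)  # agent mid-invocation: queue it
                return
            self._busy = True
        self._drain(message)

    def _drain(self, message):
        # Deliver results one at a time until the queue is empty.
        while True:
            self.agent(message)       # invoked OUTSIDE the lock
            with self._lock:
                if not self._queue:
                    self._busy = False
                    return
                message = self._queue.popleft()
```

&lt;p&gt;Because the agent is invoked outside the lock, a result arriving mid-invocation simply lands in the queue and gets picked up by the draining loop.&lt;/p&gt;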

&lt;h2&gt;
  
  
  The Demo
&lt;/h2&gt;

&lt;p&gt;In the repo, &lt;code&gt;demo.py&lt;/code&gt; is a minimal CLI that shows the whole thing working end-to-end, using mock tools (with made-up data!) that wait for random intervals. It should look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ uv run python demo.py

You: Research quantum computing and check the weather in London

  [thinking] processing...
I've started both tasks for you:
1. Researching quantum computing (Task abc123)
2. Fetching weather for London (Task def456)

Results will come in as they complete.

  [callback] fetch_weather (def456) completed in 11234ms — delivering to agent now
  [thinking] processing...
The weather in London just came in: partly cloudy, 18C, humidity at 65%.

Still waiting on the research results.

  [callback] research_topic (abc123) completed in 17891ms — delivering to agent now
  [thinking] processing...
The research on quantum computing is done. Here are the key findings:
- 340% growth in the last 2 years
- Major players include Acme Corp, Nexus Labs, and Orion Systems
- Market predicted to reach $50B by 2028

You:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are three async tools (with simulated 10-20 second delays) and two synchronous tools (calculator and current_time) running side by side. The sync tools return instantly as usual. The async tools dispatch to background threads and deliver results via callbacks. The model stays conversational throughout.&lt;/p&gt;

&lt;p&gt;Try this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You: Research Paris

  [thinking] processing...
I've started researching Paris for you (Task a1b2c3).
I'll let you know as soon as the results come in.

You: What time is it there?

  [thinking] processing...
It's currently 15:32 in Paris (CET, UTC+1).

  [callback] research_topic (a1b2c3) completed in 16482ms — delivering to agent now
  [thinking] processing...
The Paris research just came back! Here are some highlights:
- ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What the System Prompt Does
&lt;/h2&gt;

&lt;p&gt;In the demo script the system prompt is explicit about the contract. Over time I would like to simplify this or remove it altogether, and I suspect that will happen:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;When you call an async tool it returns a task ID immediately.
The actual result will arrive in a future message tagged [ASYNC RESULT].
Rules:
  - Do NOT guess or fabricate async results. Wait for [ASYNC RESULT].
  - Tell the user each task has been started.
  - You CAN dispatch multiple async tools at once — they run in parallel.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The system prompt, combined with the async notice appended to each tool's docstring by the decorator, gives the model enough context to behave correctly. It's prompt engineering, not framework magic — and it works because current models are good enough at following these instructions reliably.&lt;/p&gt;
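&lt;p&gt;To make the docstring half of that contract concrete, the idea looks roughly like this (a hypothetical sketch with assumed names; see &lt;code&gt;decorator.py&lt;/code&gt; in the repo for the real version):&lt;/p&gt;

```python
# Hypothetical sketch: the notice text and decorator name are assumptions.
ASYNC_NOTICE = (
    "\n\nNOTE: This tool is asynchronous. It returns a task ID immediately; "
    "the real result arrives later in an [ASYNC RESULT] message."
)

def async_tool(fn):
    """Mark a tool as async by appending the notice to its docstring."""
    fn.__doc__ = (fn.__doc__ or "") + ASYNC_NOTICE
    return fn

@async_tool
def research_topic(topic: str) -> str:
    """Research a topic and return key findings."""
    return f"task dispatched for {topic}"
```

&lt;p&gt;The model sees the notice wherever it sees the tool's description, so every async tool carries its own reminder of the contract.&lt;/p&gt;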

&lt;h2&gt;
  
  
  When to Use This (and Probably When Not To)
&lt;/h2&gt;

&lt;p&gt;Let's be honest: &lt;strong&gt;this is an experiment.&lt;/strong&gt; It's not for everyone, and it's not for every situation.&lt;/p&gt;

&lt;p&gt;It works well when (in no particular order):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tools have high, variable latency&lt;/strong&gt; — web searches, API calls, document processing, anything where one tool might take 2 seconds and another might take 30.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want the agent to stay conversational&lt;/strong&gt; — the user shouldn't have to stare at a spinner while a slow tool runs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You're building voice interfaces&lt;/strong&gt; — This is a huge one! Dead air is death for voice UX. Async tools work especially well for voice agents: the agent can keep talking and answering follow-up questions while sub-tasks run in the background. This demo actually includes experimental (scrappy) code for testing this with &lt;a href="https://github.com/mikegc-aws/async-agentic-tools/tree/main/voice" rel="noopener noreferrer"&gt;Amazon Nova Sonic 2&lt;/a&gt; as a voice chat — worth a look if you want to try async tools in a real voice flow. (See the readme for the details of how to run the voice code.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tasks are independent&lt;/strong&gt; — async shines when tool calls don't have a hard dependency on each other's results, and the agent can still combine the data when multiple tools return.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tools are themselves agents (or sub-agents)&lt;/strong&gt; — Another big one! Agent-as-tool is one of the main candidates for long-running processes; those tools often take seconds or minutes and benefit from the model staying responsive while they run.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;The code is in the &lt;a href="https://github.com/mikegc-aws/async-agentic-tools" rel="noopener noreferrer"&gt;async-agentic-tools&lt;/a&gt; repository. The demo uses &lt;a href="https://github.com/strands-agents/sdk-python" rel="noopener noreferrer"&gt;Strands Agents&lt;/a&gt; with Claude Sonnet on Amazon Bedrock. You'll need AWS credentials for Bedrock and Python 3.14+.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/mikegc-aws/async-agentic-tools
&lt;span class="nb"&gt;cd &lt;/span&gt;async-agentic-tools
uv run python demo.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The three files that matter are in &lt;code&gt;strands_async_tools/&lt;/code&gt;: &lt;code&gt;manager.py&lt;/code&gt; (87 lines), &lt;code&gt;decorator.py&lt;/code&gt; (68 lines), and &lt;code&gt;agent.py&lt;/code&gt; (166 lines). The whole thing is about 320 lines of Python. Read it, fork it, break it. If this approach is useful to you, consider giving the &lt;a href="https://github.com/mikegc-aws/async-agentic-tools" rel="noopener noreferrer"&gt;repo&lt;/a&gt; a star — it helps others find it and makes everyone smile.&lt;/p&gt;

&lt;p&gt;This is my experiment in what becomes possible when models get smart enough to handle architectural patterns that would have confused them a year ago. The request-response loop served us well. But tools are getting slower and more powerful, conversations are getting longer and more complex, and users shouldn't have to wait in silence while the interesting work happens in the background.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>python</category>
      <category>programming</category>
    </item>
    <item>
      <title>Streaming Agents on AWS</title>
      <dc:creator>Mike Chambers</dc:creator>
      <pubDate>Tue, 09 Dec 2025 12:19:44 +0000</pubDate>
      <link>https://forem.com/aws/streaming-agents-on-aws-3610</link>
      <guid>https://forem.com/aws/streaming-agents-on-aws-3610</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmaaywkp07xg9kjdzj3is.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmaaywkp07xg9kjdzj3is.png" alt=" " width="800" height="300"&gt;&lt;/a&gt;&lt;br&gt;
You've built an agent that streams responses beautifully in development. Now you're ready to deploy to production, and you're thinking about security, rate limiting, and authentication.&lt;/p&gt;

&lt;p&gt;Just like you'd put your website behind a CDN or gateway, your agent deserves the same protection. Here's how to deploy streaming agents with API Gateway while maintaining that smooth, real-time user experience.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is Part 1 of a two-part series.&lt;/strong&gt; This post covers the architecture and key concepts. For the complete implementation with CDK code and deployment steps, see &lt;a href="https://dev.to/aws/complete-tutorial-streaming-agents-on-aws-185g"&gt;Part 2: Complete Tutorial&lt;/a&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Challenge
&lt;/h2&gt;

&lt;p&gt;Production agents need robust protection: rate limiting to prevent abuse, WAF to block attacks, authentication to validate users, and API keys for access control. API Gateway provides all of this.&lt;/p&gt;

&lt;p&gt;The traditional approach has been to choose between two options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Expose Runtime directly&lt;/strong&gt; — This works but requires building security logic into your agent code, and you're vulnerable if a bad actor connects.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skip streaming entirely&lt;/strong&gt; — This is fine for background tasks, but for interactive chatbots where users are waiting, a 30+ second delay creates a poor experience.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The good news? &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/11/api-gateway-response-streaming-rest-apis/" rel="noopener noreferrer"&gt;API Gateway now supports response streaming&lt;/a&gt;, so you can have both enterprise-grade protection and real-time streaming. This post shows you how to set it up. &lt;/p&gt;


&lt;h2&gt;
  
  
  The Right Architecture
&lt;/h2&gt;

&lt;p&gt;Here's what you need:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cognito User Pool&lt;/strong&gt; for OAuth2/JWT authentication&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AgentCore Runtime&lt;/strong&gt; with JWT authorizer (using Cognito)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Gateway&lt;/strong&gt; with streaming enabled (pointing to Runtime, using Cognito for auth)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ResponseTransferMode: STREAM&lt;/strong&gt; (the key configuration that enables streaming)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The flow looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User → Cognito (get ID token) 
     → API Gateway (validate token, stream enabled)
     → Runtime /invocations endpoint (validate token again, stream response)
     → Agent (async generator)
     → Stream back through the chain
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Four Critical Pieces
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Use ID Tokens, Not Access Tokens
&lt;/h3&gt;

&lt;p&gt;API Gateway Cognito authorizers expect &lt;strong&gt;ID tokens&lt;/strong&gt; (which contain user identity claims like &lt;code&gt;sub&lt;/code&gt;), not access tokens. Your client needs to send:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Authorization: Bearer &amp;lt;id_token&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both API Gateway and AgentCore Runtime will validate this token. Defense in depth.&lt;/p&gt;
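&lt;p&gt;As a client-side sketch (the pool and app-client values are placeholders, and your app client must allow the &lt;code&gt;USER_PASSWORD_AUTH&lt;/code&gt; flow), here is one way to fetch the token with boto3 and build the header, making sure to pick &lt;code&gt;IdToken&lt;/code&gt; rather than &lt;code&gt;AccessToken&lt;/code&gt;:&lt;/p&gt;

```python
def get_id_token(client_id, username, password, region="us-east-1"):
    """Authenticate against Cognito and return the ID token."""
    import boto3  # imported lazily so the helper below works without boto3
    cognito = boto3.client("cognito-idp", region_name=region)
    resp = cognito.initiate_auth(
        ClientId=client_id,
        AuthFlow="USER_PASSWORD_AUTH",
        AuthParameters={"USERNAME": username, "PASSWORD": password},
    )
    return resp["AuthenticationResult"]["IdToken"]

def auth_header(auth_result):
    """Build the Authorization header from a Cognito AuthenticationResult,
    using IdToken — the AccessToken will be rejected by the authorizer."""
    return {"Authorization": f"Bearer {auth_result['IdToken']}"}
```

&lt;p&gt;&lt;code&gt;initiate_auth&lt;/code&gt; returns both tokens in &lt;code&gt;AuthenticationResult&lt;/code&gt;; grabbing the wrong one is an easy mistake that produces 401s from the Cognito authorizer.&lt;/p&gt;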

&lt;h3&gt;
  
  
  2. Use the /invocations Endpoint
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;/invocations&lt;/code&gt; endpoint is the OAuth2 endpoint on AgentCore Runtime. It's specifically designed to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accept JWT ID tokens in the Authorization header&lt;/li&gt;
&lt;li&gt;Validate tokens using the authorizer you configured&lt;/li&gt;
&lt;li&gt;Stream responses using the async generator pattern&lt;/li&gt;
&lt;li&gt;Handle long-running operations with extended timeouts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The endpoint looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://bedrock-agentcore.{region}.amazonaws.com/runtimes/{runtime_id}/invocations?qualifier=DEFAULT&amp;amp;accountId={account}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Other endpoints might not support streaming or might require different authentication. This is the one that works.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Enable ResponseTransferMode: STREAM
&lt;/h3&gt;

&lt;p&gt;API Gateway buffers responses by default, so you need to explicitly enable streaming mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# The escape hatch (CDK doesn't expose this directly yet)
&lt;/span&gt;&lt;span class="n"&gt;cfn_method&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;post_method&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;default_child&lt;/span&gt;
&lt;span class="n"&gt;cfn_method&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_property_override&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Integration.ResponseTransferMode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;STREAM&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without this configuration, API Gateway will buffer the entire response before sending it to the client, which prevents streaming from working.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Return an Async Generator
&lt;/h3&gt;

&lt;p&gt;Your agent code needs to return an async generator, not a complete response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.entrypoint&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_stream&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream_async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;generate_stream&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The runtime detects the async generator and handles the streaming protocol automatically.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Architecture Works
&lt;/h2&gt;

&lt;p&gt;This approach delivers enterprise-grade protection without sacrificing user experience. You get protection from the WAF, and authentication to validate users, all while maintaining the real-time streaming that keeps users engaged. It's the best of both worlds.&lt;/p&gt;

&lt;p&gt;The timeout difference is particularly significant. With streaming, you get up to 15 minutes of execution time, compared to just 29 seconds without it. For agents that make multiple tool calls or process large datasets, streaming enables use cases that simply wouldn't work otherwise.&lt;/p&gt;

&lt;p&gt;Beyond streaming, you get the complete API Gateway feature set: all authorizer types (Cognito, Lambda, IAM), request throttling, access logging, TLS/mTLS support, custom domain names, and centralized metrics and observability. Everything you need for production is included.&lt;/p&gt;




&lt;h2&gt;
  
  
  Common Mistakes
&lt;/h2&gt;

&lt;p&gt;While building this architecture, I ran into a few gotchas that cost me some debugging time. Here are the patterns that work and the ones to avoid.&lt;/p&gt;

&lt;h3&gt;
  
  
  ❌ Wrong: Forgetting ResponseTransferMode
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;integration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;apigw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;HttpIntegration&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
&lt;span class="n"&gt;post_method&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chat_resource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_method&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;integration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Missing the critical line!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  ✅ Right: Always Set It
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;cfn_method&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;post_method&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;default_child&lt;/span&gt;
&lt;span class="n"&gt;cfn_method&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_property_override&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Integration.ResponseTransferMode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;STREAM&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  ❌ Wrong: Using the Wrong Endpoint
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# This won't stream properly
&lt;/span&gt;&lt;span class="n"&gt;wrong_endpoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://bedrock-agentcore.&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.amazonaws.com/runtimes/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/some-path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  ✅ Right: Use /invocations
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# The OAuth2 endpoint that supports streaming
&lt;/span&gt;&lt;span class="n"&gt;correct_endpoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://bedrock-agentcore.&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.amazonaws.com/runtimes/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/invocations?qualifier=DEFAULT&amp;amp;accountId=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;account&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  ❌ Wrong: Returning Complete Response
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.entrypoint&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;  &lt;span class="c1"&gt;# Won't stream
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  ✅ Right: Return Async Generator
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.entrypoint&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_stream&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream_async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;generate_stream&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Complete Flow
&lt;/h2&gt;

&lt;p&gt;Let's walk through what happens when a user interacts with your streaming agent. Understanding this flow helps clarify why each piece of the architecture matters.&lt;/p&gt;

&lt;p&gt;It starts with authentication. The user authenticates with Cognito using the OAuth2 flow and receives a JWT ID token. This token contains their identity claims and will be validated at multiple points in the request chain.&lt;/p&gt;

&lt;p&gt;When the frontend sends a request to API Gateway, it includes this token in the &lt;code&gt;Authorization: Bearer &amp;lt;id_token&amp;gt;&lt;/code&gt; header. API Gateway immediately validates the token using its Cognito authorizer—this is the first validation layer. Once validated, API Gateway proxies the request to the AgentCore Runtime OAuth2 endpoint (&lt;code&gt;/invocations&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;The Runtime doesn't just trust API Gateway. It validates the token again using its own JWT authorizer, providing a second validation layer for defense in depth. With authentication confirmed, the agent executes and returns an async generator.&lt;/p&gt;

&lt;p&gt;Now the streaming begins. The Runtime streams chunks back to API Gateway with no buffering. API Gateway, configured with &lt;code&gt;ResponseTransferMode: STREAM&lt;/code&gt;, passes those chunks directly to the client without buffering. The client displays chunks as they arrive, creating that smooth, real-time experience users expect.&lt;/p&gt;

&lt;p&gt;The beauty of this architecture is that streaming happens at every layer: Agent → Runtime → API Gateway → Client. And authentication is validated at both API Gateway and Runtime, giving you robust security without compromising performance.&lt;/p&gt;
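&lt;p&gt;From the client's point of view, the whole chain boils down to one authenticated POST whose body is read incrementally rather than all at once. A hypothetical sketch (the URL and payload shape are assumptions, not from the repo):&lt;/p&gt;

```python
import json
import urllib.request

def stream_agent_reply(api_url, id_token, prompt):
    """Yield response chunks as they arrive from the streaming endpoint.
    The {"prompt": ...} payload shape is an assumption for illustration."""
    req = urllib.request.Request(
        api_url,
        data=json.dumps({"prompt": prompt}).encode(),
        headers={
            "Authorization": f"Bearer {id_token}",  # ID token, not access token
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        # Read incrementally; never buffer the whole body.
        while chunk := resp.read(1024):
            yield chunk.decode()
```

&lt;p&gt;Note the single read loop: nothing waits for the full body, so tokens can be displayed as soon as API Gateway forwards them.&lt;/p&gt;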




&lt;h2&gt;
  
  
  Constraints
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Idle Timeouts
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Regional/Private endpoints&lt;/strong&gt;: 5-minute idle timeout (no data for 5 minutes = connection closes)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edge-optimized endpoints&lt;/strong&gt;: 30-second idle timeout&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Keep your agent generating data regularly, or the connection will drop.&lt;/p&gt;
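&lt;p&gt;One way to keep data flowing is to wrap slow work in a heartbeat generator that emits filler while the real task runs. This is a sketch, not part of the AgentCore SDK (the names and the keep-alive payload are assumptions):&lt;/p&gt;

```python
import asyncio

async def with_heartbeat(work, interval=30.0, heartbeat=": keep-alive\n\n"):
    """Yield heartbeat chunks while a slow coroutine runs, then its result.
    Keeping interval well under the gateway's idle timeout avoids drops."""
    task = asyncio.ensure_future(work)
    while not task.done():
        try:
            # shield() keeps the timeout from cancelling the real work
            await asyncio.wait_for(asyncio.shield(task), timeout=interval)
        except asyncio.TimeoutError:
            yield heartbeat  # emit filler so the connection stays warm
    yield task.result()
```

&lt;p&gt;The client can simply ignore the filler chunks; what matters is that some bytes cross the wire before the idle timer fires.&lt;/p&gt;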

&lt;h3&gt;
  
  
  Bandwidth Limits
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;First 10MB&lt;/strong&gt;: No restrictions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;After 10MB&lt;/strong&gt;: Limited to 2MB/s&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For most agent responses, this won't be an issue.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Doesn't Work with Streaming
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Response transformation with VTL (Velocity Template Language)&lt;/li&gt;
&lt;li&gt;Integration response caching&lt;/li&gt;
&lt;li&gt;Content encoding (gzip, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you need these features, you'll need to handle them differently.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/building-responsive-apis-with-amazon-api-gateway-response-streaming/" rel="noopener noreferrer"&gt;API Gateway Response Streaming Blog Post&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/11/api-gateway-response-streaming-rest-apis/" rel="noopener noreferrer"&gt;What's New Announcement&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/apigateway/" rel="noopener noreferrer"&gt;API Gateway Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/agentcore-runtime.html" rel="noopener noreferrer"&gt;AgentCore Runtime Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="//../api-gw-sr-runtime/"&gt;Full Code Repository&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>serverless</category>
      <category>architecture</category>
      <category>ai</category>
      <category>aws</category>
    </item>
    <item>
      <title>Complete Tutorial: Streaming Agents on AWS</title>
      <dc:creator>Mike Chambers</dc:creator>
      <pubDate>Tue, 09 Dec 2025 12:19:40 +0000</pubDate>
      <link>https://forem.com/aws/complete-tutorial-streaming-agents-on-aws-185g</link>
      <guid>https://forem.com/aws/complete-tutorial-streaming-agents-on-aws-185g</guid>
      <description>&lt;p&gt;&lt;strong&gt;This is Part 2 of a two-part series.&lt;/strong&gt; If you haven't read the architecture overview yet, start with &lt;a href="https://dev.to/aws/streaming-agents-on-aws-3610"&gt;Part 1: Streaming Agents with API Gateway&lt;/a&gt; to understand the concepts before diving into the implementation.&lt;/p&gt;

&lt;p&gt;This guide walks you through the complete setup: CDK stacks, agent code, authentication flow, and deployment. By the end, you'll have a production-ready streaming agent protected by API Gateway.&lt;/p&gt;

&lt;p&gt;Complete code: &lt;a href="https://github.com/mikegc-aws/agentic-examples/tree/master/api-gw-sr-runtime" rel="noopener noreferrer"&gt;on GitHub&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;AWS Account with appropriate permissions&lt;/li&gt;
&lt;li&gt;AWS CDK installed (&lt;code&gt;npm install -g aws-cdk&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Python 3.11+ with uv (&lt;code&gt;pip install uv&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Basic understanding of CDK, API Gateway, and Cognito&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;We'll deploy three CDK stacks in order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cognito Stack&lt;/strong&gt;: User Pool for OAuth2/JWT authentication&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runtime Stack&lt;/strong&gt;: AgentCore Runtime with JWT authorizer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Gateway Stack&lt;/strong&gt;: REST API with streaming enabled&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The deployment order matters because each stack depends on outputs from the previous one.&lt;/p&gt;




&lt;h2&gt;
  
  
  Project Structure
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;api-gw-sr-runtime/
├── app.py                    # CDK app entry point
├── chatbot_spa_cdk/
│   ├── chatbot_spa_stack.py  # Cognito + API Gateway
│   └── agent_runtime_stack.py # AgentCore Runtime
├── agent/
│   └── agent.py              # Streaming agent code
├── spa/                      # Frontend application
└── pyproject.toml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 1: CDK App Setup
&lt;/h2&gt;

&lt;p&gt;The main CDK app orchestrates the three stacks with proper dependencies. The actual implementation includes environment configuration and resource naming:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app.py (simplified - see repo for full version)
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_cdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;App&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Environment&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;chatbot_spa_cdk.chatbot_spa_stack&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatbotSpaStack&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;chatbot_spa_cdk.agent_runtime_stack&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AgentRuntimeStack&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;App&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Step 1: Deploy Cognito first
&lt;/span&gt;&lt;span class="n"&gt;cognito_stack&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatbotSpaStack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ChatbotCognitoStack&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;resource_prefix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatbot-spa&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;backend_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Skip API Gateway for now
&lt;/span&gt;    &lt;span class="n"&gt;callback_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:3000/callback.html&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Step 2: Deploy Runtime with Cognito references
&lt;/span&gt;&lt;span class="n"&gt;runtime_stack&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentRuntimeStack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ChatbotAgentRuntimeStack&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;resource_prefix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatbot-spa&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_pool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cognito_stack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_pool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_pool_client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cognito_stack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_pool_client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;runtime_stack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_dependency&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cognito_stack&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Step 3: Deploy API Gateway pointing to Runtime
&lt;/span&gt;&lt;span class="n"&gt;api_stack&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatbotSpaStack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ChatbotApiGatewayStack&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;resource_prefix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatbot-spa&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;backend_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;runtime_stack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;runtime_endpoint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;existing_user_pool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cognito_stack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_pool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;existing_user_pool_client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cognito_stack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_pool_client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;api_stack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_dependency&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;runtime_stack&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;synth&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key points&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cognito deploys first (no dependencies)&lt;/li&gt;
&lt;li&gt;Runtime depends on Cognito (needs User Pool for JWT validation)&lt;/li&gt;
&lt;li&gt;API Gateway depends on Runtime (needs endpoint URL)&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;resource_prefix&lt;/code&gt; parameter makes resources easily identifiable in the console&lt;/li&gt;
&lt;/ul&gt;
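&lt;p&gt;One thing the simplified listing glosses over: the &lt;code&gt;env&lt;/code&gt; object passed to each stack is never defined in the excerpt. A minimal sketch of how it is typically constructed (this assumes the common pattern of reading the target account and region from the CDK CLI's environment variables; see the repo for the actual version):&lt;/p&gt;

```python
import os
from aws_cdk import Environment

# CDK_DEFAULT_ACCOUNT / CDK_DEFAULT_REGION are populated by the CDK CLI
# from the active AWS credentials and configuration.
env = Environment(
    account=os.environ.get("CDK_DEFAULT_ACCOUNT"),
    region=os.environ.get("CDK_DEFAULT_REGION"),
)
```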




&lt;h2&gt;
  
  
  Step 2: Cognito Stack
&lt;/h2&gt;

&lt;p&gt;The Cognito configuration is part of the &lt;code&gt;ChatbotSpaStack&lt;/code&gt;. When deployed without a &lt;code&gt;backend_url&lt;/code&gt;, it creates just the User Pool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# From chatbot_spa_cdk/chatbot_spa_stack.py (simplified)
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_cdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Stack&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Duration&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_cdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;aws_cognito&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cognito&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ChatbotSpaStack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Stack&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;construct_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resource_prefix&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;callback_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;construct_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Create User Pool
&lt;/span&gt;        &lt;span class="n"&gt;user_pool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cognito&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;UserPool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UserPool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;user_pool_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;resource_prefix&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;-user-pool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;self_sign_up_enabled&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;sign_in_aliases&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cognito&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SignInAliases&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;auto_verify&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cognito&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AutoVerifiedAttrs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;password_policy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cognito&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PasswordPolicy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;min_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;require_uppercase&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;require_lowercase&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;require_digits&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;require_symbols&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Enable Managed Login UI (Essentials tier)
&lt;/span&gt;        &lt;span class="n"&gt;cfn_user_pool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;default_child&lt;/span&gt;
        &lt;span class="n"&gt;cfn_user_pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_property_override&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UserPoolTier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ESSENTIALS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Add domain for hosted UI
&lt;/span&gt;        &lt;span class="n"&gt;user_pool_domain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_domain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UserPoolDomain&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;cognito_domain&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cognito&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CognitoDomainOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;domain_prefix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;resource_prefix&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;account&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Create OAuth2 client
&lt;/span&gt;        &lt;span class="n"&gt;user_pool_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UserPoolClient&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;user_pool_client_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;resource_prefix&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;-client&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;generate_secret&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Public client for web apps
&lt;/span&gt;            &lt;span class="n"&gt;o_auth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cognito&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OAuthSettings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;flows&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cognito&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OAuthFlows&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;authorization_code_grant&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;scopes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="n"&gt;cognito&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OAuthScope&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OPENID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;cognito&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OAuthScope&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EMAIL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;cognito&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OAuthScope&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PROFILE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="n"&gt;callback_urls&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;callback_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:3000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="n"&gt;logout_urls&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:3000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;refresh_token_validity&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;days&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;access_token_validity&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;minutes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;id_token_validity&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;minutes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Export for other stacks
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_pool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_pool&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_pool_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_pool_client&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Configuration details&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;self_sign_up_enabled=False&lt;/strong&gt;: Prevents public registration (you control who gets access)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;sign_in_aliases&lt;/strong&gt;: Users sign in with email addresses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;generate_secret=False&lt;/strong&gt;: Public client (web apps can't keep secrets)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;authorization_code_grant&lt;/strong&gt;: Standard OAuth2 flow for web applications&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OPENID scope&lt;/strong&gt;: Required for ID tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;callback_urls&lt;/strong&gt;: Where Cognito redirects after authentication&lt;/li&gt;
&lt;/ul&gt;
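&lt;p&gt;To see how these settings fit together, here is a sketch of the Hosted UI login URL the SPA would redirect to. The domain prefix, region, and client ID below are placeholder values for illustration, not outputs of this stack:&lt;/p&gt;

```python
from urllib.parse import urlencode

def build_login_url(domain_prefix, region, client_id, callback_url):
    # Cognito Hosted UI authorize endpoint for this User Pool domain.
    base = f"https://{domain_prefix}.auth.{region}.amazoncognito.com/oauth2/authorize"
    query = urlencode({
        "client_id": client_id,
        "response_type": "code",          # authorization_code_grant
        "scope": "openid email profile",  # matches the OAuthScope list
        "redirect_uri": callback_url,     # must appear in callback_urls
    })
    return f"{base}?{query}"

# Placeholder values for illustration:
login_url = build_login_url(
    "chatbot-spa-123456789012", "us-east-1",
    "example-client-id", "http://localhost:3000/callback.html",
)
```

&lt;p&gt;After sign-in, Cognito redirects to the &lt;code&gt;redirect_uri&lt;/code&gt; with a &lt;code&gt;code&lt;/code&gt; query parameter that the SPA exchanges for tokens.&lt;/p&gt;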

&lt;p&gt;&lt;strong&gt;What you get&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User Pool that issues JWT ID tokens&lt;/li&gt;
&lt;li&gt;Hosted UI for authentication (optional, you can build your own)&lt;/li&gt;
&lt;li&gt;OAuth2 client configured for web applications&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Step 3: AgentCore Runtime Stack
&lt;/h2&gt;

&lt;p&gt;Deploy your agent to AgentCore Runtime with JWT authorization. The actual implementation uses &lt;code&gt;CfnResource&lt;/code&gt; and includes bundling logic for dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# From chatbot_spa_cdk/agent_runtime_stack.py (simplified)
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_cdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Stack&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CfnResource&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_cdk.aws_s3_assets&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Asset&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AgentRuntimeStack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Stack&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;construct_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_pool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_pool_client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                 &lt;span class="n"&gt;resource_prefix&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;construct_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Package agent code with dependencies
&lt;/span&gt;        &lt;span class="c1"&gt;# (See repo for full bundling configuration)
&lt;/span&gt;        &lt;span class="n"&gt;agent_asset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Asset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AgentCodeAsset&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="c1"&gt;# bundling configuration omitted for brevity
&lt;/span&gt;        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Build Cognito OIDC discovery URL
&lt;/span&gt;        &lt;span class="n"&gt;discovery_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://cognito-idp.&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.amazonaws.com/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_pool_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/.well-known/openid-configuration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Create runtime using CfnResource (Layer 1 construct)
&lt;/span&gt;        &lt;span class="n"&gt;runtime_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resource_prefix&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_agent_runtime&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="n"&gt;runtime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CfnResource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AgentCoreRuntime&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AWS::BedrockAgentCore::Runtime&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AgentRuntimeName&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;runtime_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Runtime for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;resource_prefix&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; with streaming&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RoleArn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;runtime_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;role_arn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# IAM role created separately
&lt;/span&gt;                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NetworkConfiguration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NetworkMode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PUBLIC&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AuthorizerConfiguration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CustomJWTAuthorizer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DiscoveryUrl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;discovery_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AllowedAudience&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;user_pool_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_pool_client_id&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AgentRuntimeArtifact&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CodeConfiguration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;S3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bucket&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;agent_asset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;s3_bucket_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Prefix&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;agent_asset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;s3_object_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="p"&gt;}&lt;/span&gt;
                        &lt;span class="p"&gt;},&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EntryPoint&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Runtime&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PYTHON_3_12&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Build the OAuth2 endpoint URL
&lt;/span&gt;        &lt;span class="n"&gt;runtime_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ref&lt;/span&gt;
        &lt;span class="n"&gt;runtime_endpoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://bedrock-agentcore.&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.amazonaws.com/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;runtimes/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;runtime_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/invocations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;?qualifier=DEFAULT&amp;amp;accountId=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;account&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;runtime_endpoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;runtime_endpoint&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Critical details&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CustomJWTAuthorizer&lt;/strong&gt;: Uses OIDC discovery to validate ID tokens from Cognito&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DiscoveryUrl&lt;/strong&gt;: Points to Cognito's OIDC configuration endpoint&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AllowedAudience&lt;/strong&gt;: The User Pool Client ID (ID tokens must have this in their &lt;code&gt;aud&lt;/code&gt; claim)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;/invocations endpoint&lt;/strong&gt;: The OAuth2 endpoint that supports streaming&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;qualifier=DEFAULT&lt;/strong&gt;: Uses the default runtime version&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;accountId&lt;/strong&gt;: Required for cross-account access control&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CfnResource&lt;/strong&gt;: Used because CDK doesn't have L2 constructs for AgentCore yet&lt;/li&gt;
&lt;/ul&gt;
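&lt;p&gt;The &lt;code&gt;aud&lt;/code&gt; requirement is easy to inspect yourself. A minimal Python sketch (the token is a hand-built stand-in and &lt;code&gt;example-client-id&lt;/code&gt; is a made-up client ID; real validation also verifies the signature against Cognito's JWKS, which this deliberately skips):&lt;/p&gt;

```python
import base64
import json

def jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT (no signature verification)."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

# Hand-built stand-in for a Cognito ID token (header.payload.signature)
claims = {"aud": "example-client-id", "token_use": "id"}
fake_token = ".".join([
    base64.urlsafe_b64encode(b'{"alg":"RS256"}').rstrip(b"=").decode(),
    base64.urlsafe_b64encode(json.dumps(claims).encode()).rstrip(b"=").decode(),
    "signature",
])

payload = jwt_payload(fake_token)
assert payload["aud"] == "example-client-id"  # must equal the User Pool Client ID
assert payload["token_use"] == "id"           # an ID token, not an access token
```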

&lt;p&gt;&lt;strong&gt;What happens&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runtime validates every request's JWT ID token&lt;/li&gt;
&lt;li&gt;Invalid or missing tokens are rejected with 401&lt;/li&gt;
&lt;li&gt;Valid tokens allow the request to proceed to your agent&lt;/li&gt;
&lt;/ul&gt;
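&lt;p&gt;Concretely, a caller hits the &lt;code&gt;/invocations&lt;/code&gt; URL built above with the ID token in the &lt;code&gt;Authorization&lt;/code&gt; header. A sketch of the request shape in Python (the region, account, runtime ID, and token are placeholder values, and the actual call is left commented out since nothing here is deployed):&lt;/p&gt;

```python
import json
import urllib.request
from urllib.parse import urlencode

# Placeholder values; the real ones come from the CDK stack outputs
region = "us-west-2"
account = "123456789012"
runtime_id = "example-runtime-id"
id_token = "example-id-token"  # ID token obtained from Cognito

query = urlencode({"qualifier": "DEFAULT", "accountId": account})
endpoint = (
    f"https://bedrock-agentcore.{region}.amazonaws.com/"
    f"runtimes/{runtime_id}/invocations?" + query
)

request = urllib.request.Request(
    endpoint,
    data=json.dumps({"prompt": "What is 2 + 2?"}).encode(),
    headers={
        "Authorization": f"Bearer {id_token}",  # missing or invalid: 401
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(request) would then stream the response body
```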




&lt;h2&gt;
  
  
  Step 4: API Gateway Stack
&lt;/h2&gt;

&lt;p&gt;Create the REST API with streaming enabled:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_cdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Duration&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_cdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;aws_apigateway&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;apigw&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ApiGatewayStack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Stack&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;construct_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;runtime_endpoint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_pool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;construct_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Create REST API
&lt;/span&gt;        &lt;span class="n"&gt;api&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;apigw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;RestApi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Api&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;rest_api_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent-api&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;default_cors_preflight_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;apigw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CorsOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;allow_origins&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:3000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="n"&gt;allow_methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPTIONS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="n"&gt;allow_headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Cognito authorizer
&lt;/span&gt;        &lt;span class="n"&gt;authorizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;apigw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CognitoUserPoolsAuthorizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CognitoAuthorizer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;cognito_user_pools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;user_pool&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# HTTP Proxy Integration
&lt;/span&gt;        &lt;span class="n"&gt;integration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;apigw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;HttpIntegration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;runtime_endpoint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# The OAuth2 endpoint from Runtime Stack
&lt;/span&gt;            &lt;span class="n"&gt;http_method&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;proxy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;apigw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;IntegrationOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;connection_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;apigw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ConnectionType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;INTERNET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;seconds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;900&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# 15 minutes with streaming
&lt;/span&gt;                &lt;span class="n"&gt;request_parameters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;integration.request.header.Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; 
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;method.request.header.Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Add method
&lt;/span&gt;        &lt;span class="n"&gt;chat_resource&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;post_method&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chat_resource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_method&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;integration&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;authorizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;authorizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;authorization_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;apigw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AuthorizationType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;COGNITO&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# CRITICAL: Enable streaming with escape hatch
&lt;/span&gt;        &lt;span class="n"&gt;cfn_method&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;post_method&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;default_child&lt;/span&gt;
        &lt;span class="n"&gt;cfn_method&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_property_override&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Integration.ResponseTransferMode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;STREAM&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;api_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why the escape hatch?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;CDK's &lt;code&gt;HttpIntegration&lt;/code&gt; doesn't expose &lt;code&gt;ResponseTransferMode&lt;/code&gt; directly yet. The escape hatch lets you set it on the underlying CloudFormation resource.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What this does&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API Gateway validates the JWT ID token (first layer of defense)&lt;/li&gt;
&lt;li&gt;Forwards the Authorization header to Runtime (second layer of defense)&lt;/li&gt;
&lt;li&gt;Streams the response instead of buffering it&lt;/li&gt;
&lt;li&gt;Allows up to 15 minutes for the request to complete&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;CORS configuration&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Allows requests from your frontend origin&lt;/li&gt;
&lt;li&gt;Includes Authorization header in allowed headers&lt;/li&gt;
&lt;li&gt;Handles preflight OPTIONS requests&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Step 5: Agent Implementation
&lt;/h2&gt;

&lt;p&gt;This example uses two SDKs to simplify development:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/awslabs/strands-agents" rel="noopener noreferrer"&gt;Strands Agents SDK&lt;/a&gt;&lt;/strong&gt;: A Python framework for building agentic workflows with streaming support built-in&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/agentcore-runtime.html" rel="noopener noreferrer"&gt;Amazon Bedrock AgentCore SDK&lt;/a&gt;&lt;/strong&gt;: Handles the AgentCore Runtime integration and streaming protocol&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both SDKs are optional. You can build agents with any framework that returns async generators, but these two make it much quicker to get this demo up and running.&lt;/p&gt;
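&lt;p&gt;Stripped of both SDKs, the contract is just an entry point that returns an async generator. A framework-free sketch, with a fake model stream standing in for the real call:&lt;/p&gt;

```python
import asyncio

async def fake_model_stream(prompt: str):
    """Hypothetical stand-in for a model call; yields text chunks as they arrive."""
    for chunk in ("Hello", ", ", "world", "!"):
        await asyncio.sleep(0)  # pretend to wait on the model
        yield chunk

async def invoke(payload: dict):
    """Entry point in the shape the Runtime expects: returns an async generator."""
    prompt = payload.get("prompt", "Hello!")

    async def generate_stream():
        async for chunk in fake_model_stream(prompt):
            yield chunk

    return generate_stream()

async def main() -> str:
    stream = await invoke({"prompt": "hi"})
    chunks = [chunk async for chunk in stream]
    return "".join(chunks)

result = asyncio.run(main())
# result == "Hello, world!"
```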

&lt;p&gt;Your agent code needs to return an async generator for streaming:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# agent/agent.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bedrock_agentcore.runtime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BedrockAgentCoreApp&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands_tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;calculator&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BedrockAgentCoreApp&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Lazy load agent for performance
&lt;/span&gt;&lt;span class="n"&gt;_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_agent&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;_agent&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_agent&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant that can perform calculations.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;calculator&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;_agent&lt;/span&gt;

&lt;span class="nd"&gt;@app.entrypoint&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Entry point that returns an async generator for streaming&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_agent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Return an async generator
&lt;/span&gt;    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_stream&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;agent_stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream_async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;agent_stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="c1"&gt;# You can also handle tool use events here if needed
&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;generate_stream&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;How it works&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;BedrockAgentCoreApp&lt;/code&gt; detects when you return an async generator&lt;/li&gt;
&lt;li&gt;It handles the streaming protocol automatically&lt;/li&gt;
&lt;li&gt;Each &lt;code&gt;yield&lt;/code&gt; sends a chunk to the client immediately&lt;/li&gt;
&lt;li&gt;The stream flows: Agent → Runtime → API Gateway → Client&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why lazy load the agent?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The runtime reuses the same container across invocations, which means the agent instance stays in memory. This is crucial for maintaining conversation context and history between requests. By lazy loading, you initialize the agent once and it persists across all subsequent invocations, allowing multi-turn conversations to work naturally.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent requirements&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="c"&gt;# agent/pyproject.toml&lt;/span&gt;
&lt;span class="nn"&gt;[project]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"streaming-agent"&lt;/span&gt;
&lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.1.0"&lt;/span&gt;
&lt;span class="py"&gt;dependencies&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s"&gt;"bedrock-agentcore-runtime"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"strands"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"strands-tools"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 6: Frontend Implementation
&lt;/h2&gt;

&lt;p&gt;Handle streaming on the client side:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Get ID token from Cognito (after OAuth2 flow)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;idToken&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;sessionStorage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getItem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;id_token&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;sendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://your-api.execute-api.us-west-2.amazonaws.com/chat&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Authorization&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;idToken&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Read the stream&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getReader&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;decoder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;TextDecoder&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;done&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;done&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="c1"&gt;// Decode chunk (may contain partial UTF-8 sequences)&lt;/span&gt;
        &lt;span class="nx"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;decoder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

        &lt;span class="c1"&gt;// Display immediately&lt;/span&gt;
        &lt;span class="nf"&gt;appendToMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nx"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Flush any remaining buffer&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;appendToMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Important&lt;/strong&gt;: Use &lt;code&gt;{ stream: true }&lt;/code&gt; in &lt;code&gt;TextDecoder.decode()&lt;/code&gt;. This handles partial UTF-8 sequences that can occur at chunk boundaries.&lt;/p&gt;
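&lt;p&gt;The chunk-boundary problem is easy to reproduce. Python's &lt;code&gt;codecs&lt;/code&gt; module has the same incremental-decode behavior as &lt;code&gt;TextDecoder&lt;/code&gt; with &lt;code&gt;{ stream: true }&lt;/code&gt;, so a quick sketch shows why the flag matters:&lt;/p&gt;

```python
import codecs

text = "caffè"                        # 'è' is two bytes in UTF-8
data = text.encode("utf-8")
first, second = data[:5], data[5:]    # split in the middle of 'è'

# Decoding each chunk independently fails on the partial sequence
naive_failed = False
try:
    first.decode("utf-8")
except UnicodeDecodeError:
    naive_failed = True

# An incremental decoder buffers the incomplete bytes until the next chunk
decoder = codecs.getincrementaldecoder("utf-8")()
streamed = decoder.decode(first) + decoder.decode(second)
streamed += decoder.decode(b"", final=True)  # flush, like decode() with no args in JS
```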

&lt;p&gt;&lt;strong&gt;OAuth2 flow&lt;/strong&gt; (simplified):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Redirect to Cognito for authentication&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;login&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cognitoDomain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://agent-123456789.auth.us-west-2.amazoncognito.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;clientId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;your-client-id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;redirectUri&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http://localhost:3000/callback.html&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;location&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;href&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; 
        &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;cognitoDomain&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/oauth2/authorize?`&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
        &lt;span class="s2"&gt;`client_id=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;clientId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;amp;`&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
        &lt;span class="s2"&gt;`response_type=code&amp;amp;`&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
        &lt;span class="s2"&gt;`scope=openid+email&amp;amp;`&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
        &lt;span class="s2"&gt;`redirect_uri=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nf"&gt;encodeURIComponent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;redirectUri&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Handle callback (in callback.html)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handleCallback&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URLSearchParams&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;location&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;search&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;code&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Exchange code for tokens&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;cognitoDomain&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/oauth2/token`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/x-www-form-urlencoded&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URLSearchParams&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="na"&gt;grant_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;authorization_code&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="na"&gt;client_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;clientId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="na"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="na"&gt;redirect_uri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;redirectUri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;}),&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nx"&gt;sessionStorage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setItem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;id_token&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id_token&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;sessionStorage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setItem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;access_token&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;access_token&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Redirect back to app&lt;/span&gt;
    &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;location&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;href&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Deployment
&lt;/h2&gt;

&lt;p&gt;Deploy the stacks in order:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;api-gw-sr-runtime

&lt;span class="c"&gt;# Bootstrap CDK (first time only)&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AWS_PROFILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-profile
uv run cdk bootstrap

&lt;span class="c"&gt;# Deploy all stacks&lt;/span&gt;
uv run cdk deploy &lt;span class="nt"&gt;--all&lt;/span&gt;

&lt;span class="c"&gt;# Update frontend config with API URL&lt;/span&gt;
./update-spa-config.sh

&lt;span class="c"&gt;# Create a test user&lt;/span&gt;
./create-test-user.sh testuser@example.com TestPassword123!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What happens&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Cognito stack deploys (User Pool + Client)&lt;/li&gt;
&lt;li&gt;Runtime stack deploys (references Cognito)&lt;/li&gt;
&lt;li&gt;API Gateway stack deploys (references Runtime endpoint)&lt;/li&gt;
&lt;li&gt;Scripts configure frontend and create test user&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Testing
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Test with curl
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Get your ID token (from browser sessionStorage or Cognito)&lt;/span&gt;
&lt;span class="nv"&gt;ID_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"eyJraWQiOi..."&lt;/span&gt;

&lt;span class="c"&gt;# Test with -N flag for no buffering&lt;/span&gt;
curl &lt;span class="nt"&gt;-N&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="se"&gt;\&lt;/span&gt;
  https://your-api.execute-api.us-west-2.amazonaws.com/chat &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$ID_TOKEN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"prompt":"What is 25 * 4? Show your work."}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see the response appear incrementally, not all at once.&lt;/p&gt;
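&lt;p&gt;If you'd rather test from Python than curl, here's a minimal sketch using only the standard library. The URL and token are placeholders, and &lt;code&gt;print_stream&lt;/code&gt;/&lt;code&gt;chat&lt;/code&gt; are illustrative helper names, not part of the project. The chunk-handling logic is split into its own function so it works on any file-like object:&lt;/p&gt;

```python
import io
import json
import urllib.request


def print_stream(resp, chunk_size=256):
    """Read and print small chunks as they arrive instead of buffering the body."""
    out = []
    while chunk := resp.read(chunk_size):
        text = chunk.decode(errors="replace")
        print(text, end="", flush=True)
        out.append(text)
    return "".join(out)


def chat(api_url, id_token, prompt):
    # POST the prompt with the Cognito ID token, then stream the reply.
    req = urllib.request.Request(
        api_url,
        data=json.dumps({"prompt": prompt}).encode(),
        headers={
            "Authorization": f"Bearer {id_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return print_stream(resp)


# print_stream works on any file-like object, so it can be exercised locally:
demo = print_stream(io.BytesIO(b"100. The answer streams in."))
```

&lt;p&gt;If the output arrives in one burst instead of chunk by chunk, revisit the streaming checks in the troubleshooting section.&lt;/p&gt;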

&lt;h3&gt;
  
  
  Test with frontend
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;spa
python &lt;span class="nt"&gt;-m&lt;/span&gt; http.server 3000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;a href="http://localhost:3000" rel="noopener noreferrer"&gt;http://localhost:3000&lt;/a&gt;, log in, and send a message. You should see the response stream in real time.&lt;/p&gt;




&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Streaming not working (response appears all at once)
&lt;/h3&gt;

&lt;p&gt;Check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is &lt;code&gt;ResponseTransferMode: STREAM&lt;/code&gt; set on the API Gateway method?&lt;/li&gt;
&lt;li&gt;Are you using the &lt;code&gt;/invocations&lt;/code&gt; endpoint?&lt;/li&gt;
&lt;li&gt;Is your agent returning an async generator?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  401 Unauthorized
&lt;/h3&gt;

&lt;p&gt;Check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is the ID token valid? (Check expiration)&lt;/li&gt;
&lt;li&gt;Is the token in the Authorization header?&lt;/li&gt;
&lt;li&gt;Does the JWT authorizer configuration match your Cognito User Pool?&lt;/li&gt;
&lt;li&gt;Are you using the ID token (not access token)?&lt;/li&gt;
&lt;/ul&gt;
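&lt;p&gt;One quick way to run these checks is to decode the token payload locally. This is a debugging sketch only: it does not verify the signature, and &lt;code&gt;decode_jwt_claims&lt;/code&gt; is an illustrative helper, not part of the project. Cognito tokens carry a &lt;code&gt;token_use&lt;/code&gt; claim (&lt;code&gt;"id"&lt;/code&gt; or &lt;code&gt;"access"&lt;/code&gt;) and an &lt;code&gt;exp&lt;/code&gt; timestamp, which covers the first and last bullets above:&lt;/p&gt;

```python
import base64
import json
import time


def decode_jwt_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature.

    For local debugging only, e.g. checking `exp` and `token_use`.
    """
    payload_b64 = token.split(".")[1]
    # Restore the base64 padding that JWTs strip
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


# Build a fake (unsigned) token just to demonstrate the helper:
claims = {"token_use": "id", "exp": int(time.time()) + 3600}
payload = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode().rstrip("=")
fake = "e30." + payload + ".sig"

decoded = decode_jwt_claims(fake)
print(decoded["token_use"])          # "id" for ID tokens, "access" for access tokens
print(decoded["exp"] > time.time())  # True while the token has not expired
```

&lt;p&gt;Paste your real ID token from sessionStorage in place of the fake one; if &lt;code&gt;token_use&lt;/code&gt; says &lt;code&gt;"access"&lt;/code&gt; or &lt;code&gt;exp&lt;/code&gt; is in the past, you've found your 401.&lt;/p&gt;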

&lt;h3&gt;
  
  
  502 Bad Gateway
&lt;/h3&gt;

&lt;p&gt;Check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is the Runtime endpoint URL correct?&lt;/li&gt;
&lt;li&gt;Does the Runtime have the JWT authorizer configured?&lt;/li&gt;
&lt;li&gt;Is the agent code deployed correctly?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Connection drops after 30 seconds
&lt;/h3&gt;

&lt;p&gt;You're using an edge-optimized endpoint. Switch to regional:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;api&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;apigw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;RestApi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Api&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;endpoint_types&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;apigw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EndpointType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;REGIONAL&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;  &lt;span class="c1"&gt;# Add this
&lt;/span&gt;    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Agent not streaming
&lt;/h3&gt;

&lt;p&gt;Check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is your agent returning an async generator?&lt;/li&gt;
&lt;li&gt;Are you yielding chunks, not returning a complete response?&lt;/li&gt;
&lt;li&gt;Is the agent actually generating data? (Add logging)&lt;/li&gt;
&lt;/ul&gt;
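&lt;p&gt;The difference between the first two bullets can be sketched in a few lines. These are toy functions (&lt;code&gt;not_streaming&lt;/code&gt;/&lt;code&gt;streaming&lt;/code&gt; are illustrative names, not the project's entrypoints), but the shape is what matters: a plain coroutine returns one complete value, while an async generator yields chunks as they're produced:&lt;/p&gt;

```python
import asyncio


async def not_streaming(prompt: str) -> str:
    # A plain coroutine: the caller only ever sees the final string.
    return "the whole answer at once"


async def streaming(prompt: str):
    # An async generator: each yield can become a chunk on the wire.
    for word in ["chunks", " arrive", " incrementally"]:
        yield word


async def collect(prompt: str) -> str:
    # Consume the generator chunk by chunk, as the runtime would.
    return "".join([chunk async for chunk in streaming(prompt)])


print(asyncio.run(collect("hi")))  # chunks arrive incrementally
```

&lt;p&gt;If your handler looks like the first function, the runtime has nothing to stream until it finishes.&lt;/p&gt;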




&lt;h2&gt;
  
  
  Performance Optimization
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Lazy Load Your Agent
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_agent&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;_agent&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_agent&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;  &lt;span class="c1"&gt;# Only initialize once
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;_agent&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The runtime reuses the same container across invocations, and lazy loading keeps the agent instance in memory. This allows the agent to maintain conversation history and context between requests, enabling natural multi-turn conversations without needing external storage.&lt;/p&gt;
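&lt;p&gt;As a toy illustration (this &lt;code&gt;Agent&lt;/code&gt; class is a stand-in, not the real one), here's why the cached module-level instance can carry state across warm invocations:&lt;/p&gt;

```python
class Agent:
    """Stand-in agent that remembers every prompt it has seen."""

    def __init__(self):
        self.history = []

    def ask(self, prompt):
        self.history.append(prompt)
        return f"turn {len(self.history)}: {prompt}"


_agent = None


def get_agent():
    global _agent
    if _agent is None:
        _agent = Agent()  # constructed once per container
    return _agent


# Two separate "invocations" in the same warm container hit the same instance:
print(get_agent().ask("hello"))  # turn 1: hello
print(get_agent().ask("again"))  # turn 2: again
```

&lt;p&gt;The flip side: a cold start or container recycle resets &lt;code&gt;_agent&lt;/code&gt;, so treat this in-memory history as a cache, not durable storage.&lt;/p&gt;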

&lt;h2&gt;
  
  
  Full Code Repository
&lt;/h2&gt;

&lt;p&gt;Complete code: &lt;a href="https://github.com/mikegc-aws/agentic-examples/tree/master/api-gw-sr-runtime" rel="noopener noreferrer"&gt;on GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All CDK stacks&lt;/li&gt;
&lt;li&gt;Agent implementation&lt;/li&gt;
&lt;li&gt;Frontend with OAuth2&lt;/li&gt;
&lt;li&gt;Deployment scripts&lt;/li&gt;
&lt;li&gt;Test utilities&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/building-responsive-apis-with-amazon-api-gateway-response-streaming/" rel="noopener noreferrer"&gt;API Gateway Response Streaming Blog Post&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/11/api-gateway-response-streaming-rest-apis/" rel="noopener noreferrer"&gt;What's New Announcement&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/apigateway/" rel="noopener noreferrer"&gt;API Gateway Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/agentcore-runtime.html" rel="noopener noreferrer"&gt;AgentCore Runtime Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/cognito/" rel="noopener noreferrer"&gt;Cognito Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/cdk/api/v2/" rel="noopener noreferrer"&gt;CDK API Reference&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>serverless</category>
      <category>ai</category>
      <category>aws</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
