<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Morgan Willis</title>
    <description>The latest articles on Forem by Morgan Willis (@morganwilliscloud).</description>
    <link>https://forem.com/morganwilliscloud</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3618325%2F470cf6d0-e54c-4ddf-8d83-e3db9f829f2b.jpg</url>
      <title>Forem: Morgan Willis</title>
      <link>https://forem.com/morganwilliscloud</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/morganwilliscloud"/>
    <language>en</language>
    <item>
      <title>Amazon Bedrock for Beginners: From First Prompt to AI Agent (Full Tutorial)</title>
      <dc:creator>Morgan Willis</dc:creator>
      <pubDate>Tue, 14 Apr 2026 21:14:17 +0000</pubDate>
      <link>https://forem.com/aws/amazon-bedrock-for-beginners-from-first-prompt-to-ai-agent-full-tutorial-12ln</link>
      <guid>https://forem.com/aws/amazon-bedrock-for-beginners-from-first-prompt-to-ai-agent-full-tutorial-12ln</guid>
      <description>&lt;p&gt;So you want to add AI to your application. Maybe you want to build a smart assistant, add a feature that analyzes user input, or you have an AI-powered side project you've been meaning to start.&lt;/p&gt;

&lt;p&gt;On the surface, it sounds simple. Call a model, get a response. But once you actually try to build it, the questions start to stack up fast.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which model do you use?&lt;/li&gt;
&lt;li&gt;How do you call it from your application code?&lt;/li&gt;
&lt;li&gt;What happens when you want the AI to interact with your own data or external systems?&lt;/li&gt;
&lt;li&gt;And how do you control costs?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It can feel like you need to understand everything before you can build anything, but you don't.&lt;/p&gt;

&lt;p&gt;Amazon Bedrock is a great place to start because it's a fully managed service on AWS that gives you API access to AI models from providers like Amazon, Anthropic, Meta, Mistral, and more. You don't need to set up servers or manage infrastructure, and you only pay for what you use.&lt;/p&gt;

&lt;p&gt;On top of model access, Bedrock includes features like Knowledge Bases for connecting your own data, Guardrails for content safety, and tool use for interacting with the real world. &lt;/p&gt;

&lt;p&gt;This post walks through Bedrock's main features with code examples you can run yourself in your own AWS account. Everything comes from the &lt;a href="https://github.com/aws-samples/sample-amazon-bedrock-for-beginners" rel="noopener noreferrer"&gt;companion repo&lt;/a&gt;, which has full working implementations of each example. By the end, we'll combine everything into an AI agent using the Strands Agents SDK to build out a university FAQ chatbot.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc78lztncbohk4o93zkwz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc78lztncbohk4o93zkwz.png" alt="University Chatbot Architecture" width="800" height="421"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A heads up before we start: we're going to do things step by step, and this could take a while if you're following along. Give yourself an hour or so if you're a total beginner. We'll work directly with the Bedrock APIs so you understand exactly how the pieces fit together. Then at the end, we'll take an easier approach that handles much of the complexity for you. Learning the fundamentals first will make everything else click later.&lt;/p&gt;

&lt;p&gt;If you prefer a video walkthrough, this post has an accompanying video that covers the same material with live demos:&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/FAgmR9VV0GQ"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before following along, you'll need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Python 3.12+&lt;/strong&gt; installed on your machine&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An AWS account&lt;/strong&gt; with &lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html" rel="noopener noreferrer"&gt;credentials configured locally&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An IAM user or role&lt;/strong&gt; created in your AWS account to follow along with the AWS Console steps; you cannot complete the tutorial as the root user.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You'll also need to install boto3, the Python SDK for interacting with AWS services programmatically. Run the following in your terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;boto3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
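&lt;p&gt;Before moving on, it's worth confirming your credentials work. With credentials configured, the STS &lt;code&gt;GetCallerIdentity&lt;/code&gt; call returns the account and identity you're acting as; the small formatting helper here is just for illustration:&lt;/p&gt;

```python
def describe_identity(identity):
    # Format the dict returned by STS GetCallerIdentity
    return f"Account {identity['Account']} as {identity['Arn']}"

# With credentials configured, you would run:
#   import boto3
#   print(describe_identity(boto3.client("sts").get_caller_identity()))
```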



&lt;h2&gt;
  
  
  Making API Calls to Amazon Bedrock
&lt;/h2&gt;

&lt;p&gt;When you send a prompt to a model and receive a response, that process is called &lt;strong&gt;inference&lt;/strong&gt;. You provide input, the model runs its computation, and it generates output.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6pjqo96zk3wlf7t1cy4w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6pjqo96zk3wlf7t1cy4w.png" alt="Inference" width="800" height="193"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For AI-powered applications, you need to run inference against models programmatically through an API. Bedrock exposes a set of APIs for this. Let's start with the &lt;strong&gt;Converse API&lt;/strong&gt;, which is the standard way to call models on Bedrock.&lt;/p&gt;

&lt;p&gt;The Converse API uses the same request format regardless of which model you're talking to. That means you can switch from Amazon Nova to Meta Llama to Anthropic Claude Haiku and still use the same code.&lt;/p&gt;

&lt;p&gt;Here's a complete first API call to Amazon Bedrock:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;use_converse_api&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;bedrock_runtime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bedrock-runtime&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;model_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us.amazon.nova-lite-v1:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# Define a system prompt to set model behavior
&lt;/span&gt;    &lt;span class="n"&gt;system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful technical assistant who explains concepts clearly and concisely.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# User message
&lt;/span&gt;    &lt;span class="n"&gt;user_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is serverless computing?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# Use the Converse API
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock_runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;converse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;inferenceConfig&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxTokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2000&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Extract the response
&lt;/span&gt;    &lt;span class="n"&gt;output_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Display token usage
&lt;/span&gt;    &lt;span class="n"&gt;usage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;usage&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Input tokens: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;inputTokens&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;N/A&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Output tokens: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;outputTokens&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;N/A&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;use_converse_api&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's break down the structure of this API call, because you'll see the same pattern throughout the rest of the examples:&lt;/p&gt;

&lt;p&gt;At the top, we import boto3 and create a &lt;code&gt;bedrock-runtime&lt;/code&gt; client. This client is how your Python code communicates with the Bedrock service over the network.&lt;/p&gt;

&lt;p&gt;Then we define the &lt;code&gt;model_id&lt;/code&gt;. We're using Amazon Nova Lite, a fast and cost-efficient model. Every model in Bedrock has a unique ID. You can find the full list in the &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html" rel="noopener noreferrer"&gt;supported model IDs documentation&lt;/a&gt;.&lt;/p&gt;
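&lt;p&gt;You can also discover model IDs programmatically with the control-plane &lt;code&gt;bedrock&lt;/code&gt; client's &lt;code&gt;list_foundation_models&lt;/code&gt; call. A minimal sketch; the &lt;code&gt;text_model_ids&lt;/code&gt; helper is ours, filtering on the response fields documented for boto3:&lt;/p&gt;

```python
def text_model_ids(response):
    # Keep only models that can generate text output. 'modelSummaries',
    # 'modelId', and 'outputModalities' are fields of the
    # ListFoundationModels response.
    return [
        summary["modelId"]
        for summary in response.get("modelSummaries", [])
        if "TEXT" in summary.get("outputModalities", [])
    ]

# With credentials configured, you would run:
#   import boto3
#   bedrock = boto3.client("bedrock", region_name="us-east-1")
#   print(text_model_ids(bedrock.list_foundation_models()))
```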

&lt;p&gt;The call to &lt;code&gt;converse()&lt;/code&gt; has three main parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;system&lt;/code&gt;&lt;/strong&gt;: The system prompt defines the model's role and behavior. Think of it as instructions for how the model should respond. The system prompt is sent with every inference request.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;messages&lt;/code&gt;&lt;/strong&gt;: The conversation between the user and the model. Each message has a &lt;code&gt;role&lt;/code&gt; (either &lt;code&gt;"user"&lt;/code&gt; or &lt;code&gt;"assistant"&lt;/code&gt;) and &lt;code&gt;content&lt;/code&gt;. This structure lets the model understand who said what. In a real application, the user message would come from a frontend, a mobile app, or command line input. We're hardcoding it here to keep things simple.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;inferenceConfig&lt;/code&gt;&lt;/strong&gt;: Parameters that control how the model generates its response. &lt;code&gt;temperature&lt;/code&gt; controls how random or creative the output is. Set it to 0.0 and you get the most predictable response every time, which is useful for tasks like classification or data extraction. Push it higher toward 1.0 and the output gets more varied, which works better for creative writing or brainstorming. &lt;code&gt;maxTokens&lt;/code&gt; caps how long the response can be. Different models support different inference parameters, so check the documentation for the specific model you're using.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Converse API is the recommended approach because it works the same across all models. Change the &lt;code&gt;modelId&lt;/code&gt; from Nova to Llama to Mistral, and your code still works.&lt;/p&gt;
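&lt;p&gt;You can see that model-agnosticism by factoring the call into a helper that takes the model ID as a parameter. This is our own sketch, not from the companion repo, and the model IDs in the comment are examples:&lt;/p&gt;

```python
def ask(client, model_id, text):
    # Same Converse request shape for every model; only modelId changes
    response = client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": text}]}],
        inferenceConfig={"temperature": 0.7, "maxTokens": 500},
    )
    return response["output"]["message"]["content"][0]["text"]

# With a bedrock-runtime client, the same helper works for any model:
#   for mid in ["us.amazon.nova-lite-v1:0", "us.meta.llama3-2-3b-instruct-v1:0"]:
#       print(ask(bedrock_runtime, mid, "What is serverless computing?"))
```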

&lt;h2&gt;
  
  
  Understanding Tokens
&lt;/h2&gt;

&lt;p&gt;Notice in the code above we printed token usage at the end of the script. Before we go further, you need to understand what tokens are, because they directly affect how much you pay.&lt;/p&gt;

&lt;p&gt;A token is a small chunk of text. It might be a whole word, part of a word, or even punctuation. Different models break text into tokens in slightly different ways, and there is no universal standard. When you send a prompt to a model, your text gets broken into tokens. The model processes those tokens and generates new tokens as its response.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvjeqs882nxscswmmvsua.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvjeqs882nxscswmmvsua.png" alt="Tokens" width="800" height="291"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A short sentence like "What is serverless computing?" gets broken into several tokens. Longer prompts mean more input tokens. Longer responses mean more output tokens. You're billed for both, so the size of your prompt and the length of the model's response directly affect cost. Always set &lt;code&gt;maxTokens&lt;/code&gt; to prevent runaway responses from driving up your bill.&lt;/p&gt;

&lt;p&gt;Every model also has a &lt;strong&gt;context window&lt;/strong&gt;, which is the maximum number of tokens it can handle in a single request. This is the model's working memory. Your input tokens and output tokens all need to fit inside the context window. If you exceed the window, the API returns an error because it cannot process that many tokens in one call. This becomes important for long conversations and applications where you inject large amounts of data into the prompt for the model to reason over.&lt;/p&gt;

&lt;p&gt;You can use the &lt;a href="https://aws.amazon.com/bedrock/pricing/" rel="noopener noreferrer"&gt;Bedrock pricing page&lt;/a&gt; to understand token costs for different models.&lt;/p&gt;
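&lt;p&gt;There's no universal tokenizer, but a common rule of thumb for English text is roughly four characters per token, which is enough for ballpark estimates before you check the real numbers in the response's &lt;code&gt;usage&lt;/code&gt; field. A rough sketch under that assumption (the helpers and prices are illustrative, not actual Bedrock rates):&lt;/p&gt;

```python
def estimate_tokens(text):
    # Heuristic: ~4 characters per token for English text.
    # Real tokenizers vary by model; trust the API's usage field for billing.
    return max(1, len(text) // 4)

def estimate_cost(input_text, output_tokens, price_in, price_out):
    # Prices are per 1,000 tokens; placeholder values, see the pricing page
    return (estimate_tokens(input_text) / 1000) * price_in \
        + (output_tokens / 1000) * price_out

print(estimate_tokens("What is serverless computing?"))  # 7
```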

&lt;h2&gt;
  
  
  Multi-Turn Conversations
&lt;/h2&gt;

&lt;p&gt;Up to this point, we've done single-turn interactions: one prompt, one response. But real applications usually need ongoing conversations where the model remembers what was said earlier.&lt;/p&gt;

&lt;p&gt;Here's the thing though: models are &lt;strong&gt;stateless&lt;/strong&gt; by design. Each API call is completely independent, and the model doesn't remember anything from previous requests. You need to explicitly send the full conversation history with every call.&lt;/p&gt;

&lt;p&gt;This is how all AI-powered chat applications work. They seem to remember everything you said between prompts, but only because the application resends the conversation history as context with every request.&lt;/p&gt;
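&lt;p&gt;The bookkeeping is always the same three steps: append the user message to the history, call the model with the whole history, then append the assistant's reply. That pattern can be wrapped in a small helper like this sketch (&lt;code&gt;send_message&lt;/code&gt; is our name, not a Bedrock API):&lt;/p&gt;

```python
def send_message(client, model_id, system_prompt, history, text):
    # Append the user turn, send the full history, record the reply
    history.append({"role": "user", "content": [{"text": text}]})
    response = client.converse(
        modelId=model_id,
        system=system_prompt,
        messages=history,
        inferenceConfig={"temperature": 0.7, "maxTokens": 200},
    )
    reply = response["output"]["message"]["content"][0]["text"]
    history.append({"role": "assistant", "content": [{"text": reply}]})
    return reply
```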

&lt;p&gt;That means when you are writing apps that need multi-turn conversations, your code is responsible for managing and sending the full context. Let's build a cooking assistant that demonstrates three conversation turns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;multi_turn_conversation&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;bedrock_runtime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bedrock-runtime&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;model_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us.amazon.nova-lite-v1:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# System prompt sets the assistant's behavior
&lt;/span&gt;    &lt;span class="n"&gt;system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful cooking assistant. Provide concise recipe suggestions.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Conversation history - we'll build this up with each turn
&lt;/span&gt;    &lt;span class="n"&gt;conversation_history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="c1"&gt;# Turn 1: Ask for recipe suggestions
&lt;/span&gt;    &lt;span class="n"&gt;user_message_1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Suggest a quick dinner recipe with chicken.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_message_1&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="n"&gt;response_1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock_runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;converse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;inferenceConfig&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxTokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;assistant_message_1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response_1&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Add assistant's response to history
&lt;/span&gt;    &lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;assistant_message_1&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="c1"&gt;# Turn 2: Ask for modifications
&lt;/span&gt;    &lt;span class="n"&gt;user_message_2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Can you make it vegetarian instead?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_message_2&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="n"&gt;response_2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock_runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;converse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;inferenceConfig&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxTokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;assistant_message_2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response_2&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;assistant_message_2&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="c1"&gt;# Turn 3: Ask for cooking time
&lt;/span&gt;    &lt;span class="n"&gt;user_message_3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How long will this take to prepare?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_message_3&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="n"&gt;response_3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock_runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;converse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;inferenceConfig&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxTokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;assistant_message_3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response_3&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;assistant_message_3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;multi_turn_conversation&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pattern is the same for every turn: append the user message to the conversation history, call the Converse API with the full history, then append the assistant's response back to the history.&lt;/p&gt;

&lt;p&gt;The model can reference what was said in turn 1 when responding to turn 2, but only because you're resending everything. You're paying for those tokens each time too, which is why conversation history management matters for cost.&lt;/p&gt;

&lt;p&gt;In production, you'd store conversation history somewhere persistent, like a database. When a user returns, you load their history and continue where they left off.&lt;/p&gt;
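&lt;p&gt;As a rough sketch of that idea (file-based here just to keep it visible; a real app would more likely use a database keyed by user or session ID, and these helper names are made up for illustration):&lt;/p&gt;

```python
import json
from pathlib import Path

# Hypothetical persistence helpers. In production you'd likely store
# conversation history in a database keyed by a session ID; a JSON file
# keeps the load/save idea simple and visible.

def save_history(path, history):
    """Persist the full messages list (the same structure Converse expects)."""
    Path(path).write_text(json.dumps(history))

def load_history(path):
    """Load a returning user's history, or start fresh if none exists."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else []
```

When the user returns, you'd call `load_history`, pass the result straight into the Converse API as `messages`, and save it again after each turn.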

&lt;p&gt;Using the Converse API like this is deliberately doing it the hard way, for learning purposes. In a real application, you also wouldn't keep redundant code like this. You'd refactor the common logic into functions and collect user input dynamically.&lt;/p&gt;
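&lt;p&gt;For example, the repeated append-call-append logic from each turn could collapse into one helper. This is a hypothetical sketch, not part of the tutorial code: the &lt;code&gt;send_turn&lt;/code&gt; name and parameters are made up, and the client is passed in so it could be any boto3 bedrock-runtime client.&lt;/p&gt;

```python
# Hypothetical refactor of the per-turn pattern: append the user message,
# call Converse with the full history, append the assistant's reply.
# `client` is assumed to be a boto3 bedrock-runtime client (or anything
# with a compatible converse() method).

def send_turn(client, model_id, system_prompt, history, user_text,
              temperature=0.7, max_tokens=200):
    history.append({"role": "user", "content": [{"text": user_text}]})
    response = client.converse(
        modelId=model_id,
        system=system_prompt,
        messages=history,
        inferenceConfig={"temperature": temperature, "maxTokens": max_tokens},
    )
    reply = response["output"]["message"]["content"][0]["text"]
    history.append({"role": "assistant", "content": [{"text": reply}]})
    return reply
```

Each of the three turns in the script above would then be a single `send_turn(...)` call.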

&lt;p&gt;There are higher-level libraries and frameworks that can handle a lot of that complexity for you, including managing the message history and formatting the request body. But we're working with the Bedrock APIs directly for now so you understand exactly how Bedrock and AI models actually work. Later, when I show you the simpler way using the Strands Agents SDK, you'll fully understand what's happening under the hood.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tool Use (Function Calling)
&lt;/h2&gt;

&lt;p&gt;Everything we've done so far has been purely text in, text out. The model generates a response based on its training data and whatever you include in the prompt. But this is a problem for real-world usage.&lt;/p&gt;

&lt;p&gt;You can’t rely on training data alone. Models have a knowledge cutoff based on when they were trained, and they don’t have access to real-time or external data like today’s weather, live content from the internet, or data stored in databases.&lt;/p&gt;

&lt;p&gt;They don't know what's happening right now, and they can't take actions in the real world on their own.&lt;/p&gt;

&lt;p&gt;That's where &lt;strong&gt;tool use&lt;/strong&gt; comes in. Tools are functions that a model can request your application to run in order to interact with external systems. The model doesn't execute tools itself. It sends a structured request saying "I want to call this function with these arguments," and your code handles the actual execution.&lt;/p&gt;

&lt;p&gt;This is how most modern AI applications work. A chatbot that does research for you using the internet? That's tool use. A coding assistant that reads files from your local disk? Tool use. A personal assistant bot that checks your calendar? Also tool use.&lt;/p&gt;

&lt;p&gt;Now, this does get a bit involved when you're doing everything the hard way, but stick with me. Understanding this flow is foundational to understanding how AI applications work.&lt;/p&gt;

&lt;p&gt;Think of it like this: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;1.&lt;/strong&gt; You create tools in your code.&lt;br&gt;
&lt;strong&gt;2.&lt;/strong&gt; You write code to inform the model what tools exist and how to use them; this description is often called a tool schema.&lt;br&gt;
&lt;strong&gt;3.&lt;/strong&gt; You send the model a prompt along with the tool schema.&lt;br&gt;
&lt;strong&gt;4.&lt;/strong&gt; The model reasons over the prompt and decides if it needs a tool to answer.&lt;br&gt;
&lt;strong&gt;5.&lt;/strong&gt; If it does need a tool, the model returns a response to your application code including information on which tool to call and with what arguments.&lt;br&gt;
&lt;strong&gt;6.&lt;/strong&gt; Your code runs the tool.&lt;br&gt;
&lt;strong&gt;7.&lt;/strong&gt; Your code sends the result of the tool call back to the model.&lt;br&gt;
&lt;strong&gt;8.&lt;/strong&gt; The model reasons over the tool result and works that information into its final response.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7njmkzw88o78i3n6hjkx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7njmkzw88o78i3n6hjkx.png" alt="Tool Use" width="800" height="758"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's the full tool use example following this flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;

&lt;span class="c1"&gt;# ---------------------------------------------------------------------------
# Step 1: Define your local Python functions
# ---------------------------------------------------------------------------
# These are regular Python functions. The model will never call them directly.
# Instead, the model will ASK us to call them by returning a tool_use block.
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;unit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fahrenheit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Simulate fetching weather data for a location.
    In a real app, this would call a weather API like OpenWeatherMap.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;weather_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;58&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;unit&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fahrenheit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;unit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;condition&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Partly cloudy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;humidity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;72%&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wind&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8 mph NW&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;weather_data&lt;/span&gt;


&lt;span class="c1"&gt;# ---------------------------------------------------------------------------
# Step 2: Describe your functions as "tools" for the model
# ---------------------------------------------------------------------------
# The model needs a description of each tool so it knows:
#   - What the tool does (description)
#   - What inputs it expects (inputSchema)
#
# This is like writing documentation so someone else can use your function.
&lt;/span&gt;
&lt;span class="n"&gt;TOOL_CONFIG&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;toolSpec&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_weather&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Get the current weather for a given location.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inputSchema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The city and state, e.g. &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;San Francisco, CA&lt;/span&gt;&lt;span class="sh"&gt;'"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="p"&gt;},&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enum&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fahrenheit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;celsius&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Temperature unit (default: fahrenheit)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="p"&gt;},&lt;/span&gt;
                        &lt;span class="p"&gt;},&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;


&lt;span class="c1"&gt;# ---------------------------------------------------------------------------
# Step 3: Map tool names to actual Python functions
# ---------------------------------------------------------------------------
# When the model asks to use a tool, we look up the function by name here.
&lt;/span&gt;
&lt;span class="n"&gt;TOOL_FUNCTIONS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_weather&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_input&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Look up a tool by name and call it with the provided input.
    Returns the result as a dictionary.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TOOL_FUNCTIONS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unknown tool: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# ** unpacks the dict into keyword arguments:
&lt;/span&gt;    &lt;span class="c1"&gt;#   get_weather(**{"location": "Seattle"})  →  get_weather(location="Seattle")
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;tool_input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="c1"&gt;# ---------------------------------------------------------------------------
# Step 4: The main tool use loop
# ---------------------------------------------------------------------------
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tool_use_demo&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;bedrock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock-runtime&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;model_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us.amazon.nova-lite-v1:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;user_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s the weather like in Seattle right now?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# First API call: send the message AND the tool definitions to the model
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;converse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;toolConfig&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TOOL_CONFIG&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;inferenceConfig&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxTokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;stop_reason&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stopReason&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;assistant_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Check: did the model ask to use a tool?
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;stop_reason&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_use&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Find the toolUse block in the response
&lt;/span&gt;        &lt;span class="n"&gt;tool_use_block&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;assistant_message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;toolUse&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;tool_use_block&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;toolUse&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;

        &lt;span class="n"&gt;tool_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_use_block&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;tool_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_use_block&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;tool_use_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_use_block&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;toolUseId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="c1"&gt;# Run the actual function
&lt;/span&gt;        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Send the result back to the model
&lt;/span&gt;        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;assistant_message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;toolResult&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;toolUseId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_use_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="c1"&gt;# Second API call: model generates its final answer using the tool result
&lt;/span&gt;        &lt;span class="n"&gt;final_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;converse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;toolConfig&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TOOL_CONFIG&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;inferenceConfig&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxTokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;final_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;final_response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;final_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;tool_use_demo&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's walk through what's happening:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1&lt;/strong&gt; is defining the actual Python function. The tool in this case is a local function that simulates fetching weather data. In the real world, you'd swap it out for a call to a real weather API. The model never calls this function directly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2&lt;/strong&gt; is creating a tool schema that describes the function to the model. Think of this like writing documentation so the model knows how to use it. We give the tool a name, a description in natural language, and an input schema that lays out what parameters the tool accepts, their types, and whether they're required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3&lt;/strong&gt; is a dictionary that maps tool names to actual functions. When the model decides it needs a tool, it returns the name of the tool it wants to call. We need to be able to look that up and figure out which function to run. The &lt;code&gt;run_tool&lt;/code&gt; function handles this dispatch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4&lt;/strong&gt; is the main loop. We call the Converse API with the user message and the tool config. The model sees the question, sees the available tools, and decides it needs the weather tool. It returns a &lt;code&gt;tool_use&lt;/code&gt; block with the function name and arguments. Our code runs the actual function, then sends the result back to the model in a &lt;code&gt;toolResult&lt;/code&gt; message. The model uses that real data to generate its final response.&lt;/p&gt;

&lt;p&gt;The tool itself can be anything: a local function, an API call to another service, a database query, or a function running in the cloud. The pattern stays the same.&lt;/p&gt;

&lt;p&gt;For more details, see the &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/tool-use.html" rel="noopener noreferrer"&gt;Bedrock tool use documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  RAG and Knowledge Bases
&lt;/h2&gt;

&lt;p&gt;Tools are great, but one of the most common use cases for integrating AI into an application is having it reason over your own private data.&lt;/p&gt;

&lt;p&gt;Models don't have access to your company's internal documentation, your product specs, or any of your proprietary data. If you ask a model about a company's internal processes, it's going to hallucinate something that seems plausible but is actually completely made up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Retrieval Augmented Generation (RAG)&lt;/strong&gt; is the common fix for this. The concept is simple: before you ask the model to generate an answer, you first search your own documents for relevant information. Then you include that data in the prompt. The model generates its response grounded in your actual data instead of relying only on what it learned during training.&lt;/p&gt;

&lt;p&gt;Retrieve the data, augment the prompt, generate the response. That's where the abbreviation RAG comes from.&lt;/p&gt;
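&lt;p&gt;Those three steps can be sketched in a few lines of plain Python. Everything here is a stand-in for illustration: the toy retriever just scores word overlap (real RAG uses semantic search, covered next), and &lt;code&gt;generate&lt;/code&gt; is a placeholder where a real application would call a model.&lt;/p&gt;

```python
def retrieve(question, documents, top_k=2):
    """Toy retriever: rank documents by word overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

def augment(question, passages):
    """Build a prompt that grounds the model in the retrieved passages."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

def generate(prompt):
    """Placeholder: in a real app, this is where you'd call a model."""
    return f"[model response grounded in a prompt of {len(prompt)} chars]"

docs = [
    "Customers can return items within 30 days.",
    "Shipping is free on orders over $50.",
]
question = "What is the return window?"
prompt = augment(question, retrieve(question, docs))
print(generate(prompt))
```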

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdcr582swomxf5t35twu2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdcr582swomxf5t35twu2.png" alt="Retrieval Augmented Generation" width="800" height="228"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How the Retrieval Part Works
&lt;/h3&gt;

&lt;p&gt;The retrieval step uses &lt;strong&gt;semantic search&lt;/strong&gt;, which is different from traditional keyword search. Keyword search looks for exact word matches, while semantic search understands the meaning of the text and searches on that instead.&lt;/p&gt;

&lt;p&gt;If your document says "customers can return items within 30 days," semantic search will find it when someone asks about "refund window" or "return period," even though those exact words don't appear. The words "queen" and "king" aren't a direct match either, but they're semantically similar because they both represent royalty. Semantic search finds that relationship but traditional search would not.&lt;/p&gt;
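&lt;p&gt;One common way to measure this kind of closeness is cosine similarity between numerical representations of the text. Here's the math on tiny hand-made three-dimensional vectors (the values are invented for illustration; real embeddings have hundreds or thousands of dimensions):&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 = similar meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up toy "embeddings" just to show the comparison
queen = [0.9, 0.8, 0.1]
king = [0.85, 0.82, 0.15]
banana = [0.1, 0.2, 0.95]

print(cosine_similarity(queen, king))    # high: semantically similar
print(cosine_similarity(queen, banana))  # low: unrelated
```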

&lt;p&gt;To make semantic search work, your data needs to be converted into numbers, or vectors, so the computer can compare meaning mathematically. Here's how the pipeline works:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffxp7on57a8jq10hd4uwu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffxp7on57a8jq10hd4uwu.png" alt="RAG Process" width="800" height="331"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Upload and Chunking&lt;/strong&gt;: Upload your documents and then break them into smaller passages called chunks. A 50-page PDF would become many chunks. There are different chunking methods depending on your use case and data structure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedding&lt;/strong&gt;: Each chunk gets run through an embedding model, which converts the text into a &lt;strong&gt;vector&lt;/strong&gt;, or a list of numbers that represents the meaning of that text. Think of it as a numerical fingerprint of what the text is about.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage&lt;/strong&gt;: Those vectors get stored in a &lt;strong&gt;vector database&lt;/strong&gt;, optimized for searching across vectors quickly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval&lt;/strong&gt;: When a user asks a question, the question also gets converted into a vector. The vector database then finds the chunks whose vectors are closest to the question's vector. Those are your most relevant passages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generation&lt;/strong&gt;: The relevant passages get included in the prompt passed to the model, and the model generates an answer grounded in that data.&lt;/li&gt;
&lt;/ol&gt;
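&lt;p&gt;Step 1 is easy to picture with the simplest possible strategy, a fixed-size sliding window. (This is just one option; Knowledge Bases support fancier strategies such as semantic and hierarchical chunking.) A minimal sketch:&lt;/p&gt;

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size chunks with overlap, so a sentence cut at
    a boundary still appears intact in at least one chunk."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# A stand-in "document" built by repeating one sentence
document = "Students may enroll in up to 18 credit hours per semester. " * 20
chunks = chunk_text(document)
print(f"{len(document)} chars -> {len(chunks)} chunks")
```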

&lt;p&gt;RAG is powerful, but there's a lot of plumbing involved to make it all work. You have to manage the chunking strategy, run embeddings, pick and maintain a vector database, write retrieval logic, and keep everything in sync when documents change. Luckily, Bedrock does this for you.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bedrock Knowledge Bases
&lt;/h3&gt;

&lt;p&gt;Bedrock Knowledge Bases automate the entire RAG pipeline. You point it at your documents in S3 (or other sources like Confluence, SharePoint, or Salesforce), and it handles ingestion, chunking, embedding, and vector storage. Then you query it with a single API call.&lt;/p&gt;

&lt;p&gt;For example, suppose a university wants to build a chatbot that helps students find answers to frequently asked questions about course enrollment deadlines, financial aid policies, and general campus information. That is a perfect use case for RAG.&lt;/p&gt;

&lt;p&gt;We'll incrementally build this university chatbot example throughout the rest of this post, starting with knowledge base creation. If you want to follow along, use the &lt;a href="https://github.com/aws-samples/sample-amazon-bedrock-for-beginners" rel="noopener noreferrer"&gt;companion repo&lt;/a&gt;, which contains the full instructions. It includes sample FAQ documents about enrollment, financial aid, housing, and campus services that we will use as the private data.&lt;/p&gt;

&lt;p&gt;To create a Knowledge Base, you first need to upload your documents to Amazon S3, a data storage service. You can go to the &lt;a href="https://console.aws.amazon.com/s3/" rel="noopener noreferrer"&gt;Amazon S3 Console&lt;/a&gt; and &lt;strong&gt;create a new bucket&lt;/strong&gt;. Then, upload the knowledge base documents to the bucket.&lt;/p&gt;

&lt;p&gt;After that, go to the &lt;a href="https://console.aws.amazon.com/bedrock/" rel="noopener noreferrer"&gt;Bedrock console&lt;/a&gt; to &lt;strong&gt;Create a knowledge base&lt;/strong&gt;. You'll connect the S3 bucket containing the FAQ documents, choose an embedding model (Amazon Titan Text Embeddings is a good default), and select a vector store. If you're just getting started, &lt;strong&gt;Amazon S3 Vectors&lt;/strong&gt; is the simplest and most cost-effective option since it doesn't require you to provision a separate vector database. Then sync the data.&lt;/p&gt;

&lt;p&gt;Once your Knowledge Base is created and synced, querying it is a single API call to Bedrock using &lt;code&gt;retrieve_and_generate&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;

&lt;span class="c1"&gt;# REPLACE THIS with your Knowledge Base ID
&lt;/span&gt;&lt;span class="n"&gt;KNOWLEDGE_BASE_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_KB_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# From the Bedrock console
&lt;/span&gt;&lt;span class="n"&gt;MODEL_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us.amazon.nova-lite-v1:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query_knowledge_base&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;bedrock_agent_runtime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bedrock-agent-runtime&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Use retrieve_and_generate to query the Knowledge Base
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock_agent_runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve_and_generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;retrieveAndGenerateConfiguration&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;KNOWLEDGE_BASE&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;knowledgeBaseConfiguration&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;knowledgeBaseId&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;KNOWLEDGE_BASE_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;modelArn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MODEL_ID&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Extract the generated response
&lt;/span&gt;    &lt;span class="n"&gt;output_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Display source citations
&lt;/span&gt;    &lt;span class="n"&gt;citations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;citations&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;citations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Sources:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;citation&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;citations&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;reference&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;citation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;retrievedReferences&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
                &lt;span class="n"&gt;location&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;reference&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
                &lt;span class="n"&gt;s3_location&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3Location&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
                &lt;span class="n"&gt;uri&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s3_location&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;uri&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Unknown&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  [&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;When is spring break this year?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="nf"&gt;query_knowledge_base&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the client is &lt;code&gt;bedrock-agent-runtime&lt;/code&gt;, not &lt;code&gt;bedrock-runtime&lt;/code&gt;. Knowledge Bases use a different API from the Converse API we've been working with.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;retrieve_and_generate&lt;/code&gt; call handles both RAG steps in a single request: it retrieves relevant chunks using semantic search, then passes them to the model to generate a response. You get both the answer and citations pointing back to the source documents, so your users can verify where the information came from.&lt;/p&gt;
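&lt;p&gt;If you only want the retrieval half (for example, to build the prompt yourself), the same client also exposes a &lt;code&gt;retrieve&lt;/code&gt; operation. A sketch, with the Knowledge Base ID left as a placeholder:&lt;/p&gt;

```python
KNOWLEDGE_BASE_ID = "YOUR_KB_ID"  # placeholder: copy yours from the Bedrock console

def retrieve_chunks(question, max_results=3):
    """Return just the top-matching chunks, without generating an answer."""
    import boto3  # imported lazily so the function can be defined without AWS set up
    client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
    response = client.retrieve(
        knowledgeBaseId=KNOWLEDGE_BASE_ID,
        retrievalQuery={"text": question},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": max_results}
        },
    )
    # Each result carries the chunk text plus its source location and relevance score
    return [r["content"]["text"] for r in response["retrievalResults"]]
```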

&lt;p&gt;For more on creating and configuring Knowledge Bases, see the &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html" rel="noopener noreferrer"&gt;Bedrock Knowledge Bases documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Guardrails (Content Safety)
&lt;/h2&gt;

&lt;p&gt;You now know how to build an AI app that can access your data using RAG and interact with external systems through tools. But before you put something like this in front of real users, you need to think about what happens when people try to misuse it or when the model generates something it shouldn't.&lt;/p&gt;

&lt;p&gt;When you put an AI application on the internet, you have to assume it will be abused. You can't fully trust user input, and you can't blindly trust model output either.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Guardrails&lt;/strong&gt; are content filters that get enforced before and after the model is called. They sit outside the prompt as structural policies. You configure your guardrails once, reference them in your API calls, and they work across any model.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7gpxdyt204nzn0tlsuee.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7gpxdyt204nzn0tlsuee.png" alt="Guardrail Types" width="800" height="290"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Available filters include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Content filters&lt;/strong&gt;: Detect harmful content like hate speech, violence, sexual content, and even prompt attacks like jailbreaks or prompt injection attempts, with adjustable severity thresholds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Denied topics&lt;/strong&gt;: Block entire categories like "investment advice" or "medical diagnosis"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Word filters&lt;/strong&gt;: Block specific words or phrases, including profanity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sensitive information filters&lt;/strong&gt;: Find and mask sensitive data like PII, social security numbers, credit cards, and email addresses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contextual grounding checks&lt;/strong&gt;: Check model responses against a reference source to reduce hallucinations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To create a Guardrail, you can use the &lt;a href="https://console.aws.amazon.com/bedrock/" rel="noopener noreferrer"&gt;Bedrock console&lt;/a&gt;. You give the guardrail a name, configure the content filters with severity thresholds that make sense for your use case, and optionally add denied topics or PII filters. Once configured, create a version to get a guardrail ID and version number.&lt;/p&gt;

&lt;p&gt;For the university chatbot, imagine a student tries to ask the assistant something inappropriate, like how to cheat on an exam or how to hack the university network. A guardrail can detect that type of request and block it before the model ever generates a response.&lt;/p&gt;

&lt;p&gt;Here's how you add it to a Knowledge Base query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;

&lt;span class="c1"&gt;# REPLACE THESE with your actual IDs
&lt;/span&gt;&lt;span class="n"&gt;KNOWLEDGE_BASE_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_KB_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;GUARDRAIL_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_GUARDRAIL_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;GUARDRAIL_VERSION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;MODEL_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us.amazon.nova-lite-v1:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query_kb_with_guardrail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;bedrock_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock-agent-runtime&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve_and_generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;retrieveAndGenerateConfiguration&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;KNOWLEDGE_BASE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;knowledgeBaseConfiguration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;knowledgeBaseId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;KNOWLEDGE_BASE_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;modelArn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MODEL_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;generationConfiguration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;guardrailConfiguration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;guardrailId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;GUARDRAIL_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;guardrailVersion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;GUARDRAIL_VERSION&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How can I cheat on my finals this year?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="nf"&gt;query_kb_with_guardrail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To use a guardrail, all you have to do is add a &lt;code&gt;generationConfiguration&lt;/code&gt; with the guardrail identifier and version number inside the knowledge base configuration. Everything else in your code stays exactly the same.&lt;/p&gt;

&lt;p&gt;A normal question like "When is spring break?" passes through and gets answered normally. But "How can I cheat on my finals?" gets blocked by the guardrail before the model ever generates a response.&lt;/p&gt;
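&lt;p&gt;Your application can detect when a guardrail fired and respond gracefully. With the Converse API, a blocked request comes back with &lt;code&gt;stopReason&lt;/code&gt; set to &lt;code&gt;guardrail_intervened&lt;/code&gt;, so you can branch on it. A sketch, using hand-built dictionaries standing in for real API responses:&lt;/p&gt;

```python
def handle_response(response):
    """Return a friendly message if the guardrail intervened, else the model's text."""
    if response.get("stopReason") == "guardrail_intervened":
        return "Sorry, I can't help with that request."
    return response["output"]["message"]["content"][0]["text"]

# Hand-built examples standing in for real Converse API results
blocked = {"stopReason": "guardrail_intervened",
           "output": {"message": {"content": [{"text": "Blocked by policy."}]}}}
allowed = {"stopReason": "end_turn",
           "output": {"message": {"content": [{"text": "Spring break starts March 9."}]}}}

print(handle_response(blocked))
print(handle_response(allowed))
```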

&lt;p&gt;You can also add guardrails directly to Converse API calls using the &lt;code&gt;guardrailConfig&lt;/code&gt; parameter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock_runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;converse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;guardrailConfig&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;guardrailIdentifier&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;your-guardrail-id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;guardrailVersion&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For more details on guardrail configuration options, see the &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails.html" rel="noopener noreferrer"&gt;Bedrock Guardrails documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Putting It All Together with an Agent
&lt;/h2&gt;

&lt;p&gt;We've been doing everything the hard way on purpose so you could see how the pieces actually work. You might be thinking "that tool use code was a lot of work for one function call," and you'd be right.&lt;/p&gt;

&lt;p&gt;Managing the message history, parsing tool requests, executing functions, and sending results back to the model manually is tedious. And it gets more complicated when the model needs multiple tools or several steps to complete a task. &lt;/p&gt;

&lt;p&gt;Additionally, real-world applications need more than a single prompt and response with hardcoded user queries. You need to take dynamic input from the user and pass it to the model. Then the model might need to look up information, call tools, and take several steps before it can answer a question.&lt;/p&gt;

&lt;p&gt;An &lt;strong&gt;agent&lt;/strong&gt; is a system that lets the model do this. Instead of taking in one prompt and responding one time, it can think through the problem, decide what action to take next, use tools if needed, and repeat that process until it reaches a final answer. Under the hood, the model may be called multiple times as part of a loop until the task is complete.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffhl7r3vdx64h06u85klq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffhl7r3vdx64h06u85klq.png" alt="Agent Loop" width="800" height="521"&gt;&lt;/a&gt;&lt;/p&gt;
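&lt;p&gt;To make that loop concrete, here's a tiny self-contained sketch of the cycle in plain Python. The model and tool here are stubs invented for illustration — this is not how Strands implements its loop, just the shape of it:&lt;/p&gt;

```python
# Simplified illustration of an agent loop (stubbed model and tool,
# NOT the real Strands internals).

def fake_model(messages):
    # Pretend the model asks for a tool on the first turn, then answers.
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_use", "tool": "lookup_course", "input": "CS-101"}
    return {"type": "final", "text": "CS 101 meets Mon/Wed/Fri at 10 AM."}

def run_tool(name, arg):
    # Stand-in for a real tool implementation.
    return f"{name}({arg}) returned a schedule"

def agent_loop(user_input, max_steps=5):
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_steps):           # repeat until the task is done
        reply = fake_model(messages)
        if reply["type"] == "final":     # model produced its answer
            return reply["text"]
        # Otherwise the model asked for a tool: run it, feed the result back
        result = run_tool(reply["tool"], reply["input"])
        messages.append({"role": "tool", "content": result})
    return "Stopped: too many steps."

print(agent_loop("When does CS 101 meet?"))
```

The important part is the loop itself: the model may be called several times, with tool results appended to the conversation between calls, before it produces a final answer.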

&lt;p&gt;Now we are going to switch to using a higher-level framework that handles much of the complexity of building AI applications for you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strands Agents SDK&lt;/strong&gt; is an open-source framework from AWS that makes building AI agents straightforward. It integrates directly with Bedrock, though it supports any model provider, and handles the orchestration we've been doing manually. &lt;/p&gt;

&lt;p&gt;Under the hood, when you use Amazon Bedrock as the model provider, it's calling the same Converse API we've been using throughout this post. That's why it was worth learning the fundamentals first. This should all make sense now rather than feeling like magic.&lt;/p&gt;

&lt;p&gt;To get started with Strands, install the packages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;strands-agents strands-agents-tools
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's the university chatbot as a Strands agent that combines everything we covered: a Bedrock model, a Knowledge Base for university data, a custom tool for course lookups, and a guardrail for content safety:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BedrockModel&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands_tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;retrieve&lt;/span&gt;

&lt;span class="c1"&gt;# ============================================================
# Configuration — Replace these with your resource IDs
# ============================================================
&lt;/span&gt;
&lt;span class="n"&gt;KNOWLEDGE_BASE_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_KB_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;GUARDRAIL_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_GUARDRAIL_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;GUARDRAIL_VERSION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;MODEL_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us.amazon.nova-lite-v1:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;REGION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;


&lt;span class="c1"&gt;# ============================================================
# Custom Tool: Look Up Course Schedule
# ============================================================
&lt;/span&gt;
&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lookup_course&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;department&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;course_number&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Look up schedule and details for a specific course.

    Use this when a student asks about a particular class,
    like &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;When does CS 201 meet?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; or &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Who teaches BIO 101?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;

    Args:
        department: The department code (e.g., &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BIO&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ENG&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;).
        course_number: The course number (e.g., &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;101&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;201&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;).

    Returns:
        Course details including schedule, instructor, and location.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# In a real app this would query a course catalog API
&lt;/span&gt;    &lt;span class="n"&gt;courses&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CS-101&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Introduction to Programming&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;instructor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Dr. Maria Chen&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;schedule&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mon/Wed/Fri 10:00 - 10:50 AM&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Turing Engineering Building, Room 210&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;credits&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;seats_available&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CS-201&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Data Structures&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;instructor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Prof. James Park&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;schedule&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tue/Thu 1:00 - 2:15 PM&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Turing Engineering Building, Room 215&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;credits&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;seats_available&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BIO-101&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;General Biology I&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;instructor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Dr. Sarah Williams&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;schedule&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mon/Wed 2:00 - 3:15 PM&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Science Hall, Room 105&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;credits&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;seats_available&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ENG-102&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;College Writing II&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;instructor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Prof. David Nguyen&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;schedule&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tue/Thu 9:30 - 10:45 AM&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Humanities Building, Room 302&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;credits&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;seats_available&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MATH-151&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Calculus I&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;instructor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Dr. Lisa Patel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;schedule&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mon/Wed/Fri 11:00 - 11:50 AM&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Math &amp;amp; Science Center, Room 120&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;credits&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;seats_available&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;department&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;course_number&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;courses&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;courses&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Course: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; — &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Instructor: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;instructor&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Schedule: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;schedule&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Location: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Credits: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;credits&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Seats available: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;seats_available&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No course found for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Check the department code and course number.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;


&lt;span class="c1"&gt;# ============================================================
# Build the Agent
# ============================================================
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_university_agent&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Create the University chatbot agent.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# The built-in retrieve tool reads this env var to find the KB
&lt;/span&gt;    &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;KNOWLEDGE_BASE_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;KNOWLEDGE_BASE_ID&lt;/span&gt;
    &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AWS_REGION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;REGION&lt;/span&gt;

    &lt;span class="n"&gt;bedrock_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BedrockModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;REGION&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;guardrail_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;GUARDRAIL_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;guardrail_version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;GUARDRAIL_VERSION&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are the University virtual assistant.
You help students, prospective students, and parents find information about the university.

Your responsibilities:
- Answer questions about academics, admissions, financial aid, housing, dining, parking, the library, career services, and the academic calendar.
- Use the retrieve tool to search the knowledge base for university policies and FAQ answers before responding.
- Use the lookup_course tool when someone asks about a specific course schedule, instructor, or availability.
- Cite your sources when referencing specific policies or dates.

Guidelines:
- Be friendly and welcoming — remember, students may be stressed about deadlines.
- If you don&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t know the answer, say so and suggest they contact the relevant office.
- Keep answers concise and helpful.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bedrock_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lookup_course&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;


&lt;span class="c1"&gt;# ============================================================
# Run the Agent
# ============================================================
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;University Chatbot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ask me about admissions, financial aid, housing, dining,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;course schedules, the academic calendar, and more.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Type &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;quit&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; to exit.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_university_agent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;user_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;q&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Goodbye!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Assistant: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flush&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's walk through what's different here compared to the manual approach.&lt;/p&gt;

&lt;p&gt;At the top, we import a few things from the Strands framework: &lt;code&gt;Agent&lt;/code&gt;, the &lt;code&gt;@tool&lt;/code&gt; decorator for creating custom tools, &lt;code&gt;BedrockModel&lt;/code&gt; for the model provider, and the built-in &lt;code&gt;retrieve&lt;/code&gt; tool from &lt;code&gt;strands_tools&lt;/code&gt; which queries the Knowledge Base we created earlier.&lt;/p&gt;

&lt;p&gt;Then we define our configuration: a Knowledge Base ID for RAG, a Guardrail ID for content filtering, and Amazon Nova as our model, all things we've already set up and used individually.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;@tool&lt;/code&gt; decorator is how you define custom tools in Strands. We've got a &lt;code&gt;lookup_course&lt;/code&gt; tool that simulates looking up course information. In a real application, this would query a database. Compare this to the manual tool use code from earlier and you'll notice there are no lengthy tool schemas, no message parsing, and no dispatch logic. You just write a function with a docstring and Strands handles the rest.&lt;/p&gt;
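&lt;p&gt;For contrast, here's roughly what a single hand-written tool definition looks like with the Converse API. This is an illustrative sketch, not the exact schema from earlier; the field names follow the Bedrock Converse API:&lt;/p&gt;

```python
# Manual Converse-API tool definition: you maintain this JSON schema by
# hand, plus the message parsing and dispatch code that goes with it.
# (Illustrative sketch; field names follow the Bedrock Converse API.)
lookup_course_spec = {
    "toolSpec": {
        "name": "lookup_course",
        "description": "Look up course information by course code.",
        "inputSchema": {
            "json": {
                "type": "object",
                "properties": {
                    "course_code": {
                        "type": "string",
                        "description": "The course code, e.g. CS101",
                    }
                },
                "required": ["course_code"],
            }
        },
    }
}
```

&lt;p&gt;With &lt;code&gt;@tool&lt;/code&gt;, Strands generates an equivalent schema from your function signature and docstring.&lt;/p&gt;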

&lt;p&gt;Strands also includes a built-in &lt;code&gt;retrieve&lt;/code&gt; tool that works directly with Bedrock Knowledge Bases. You set the Knowledge Base ID as an environment variable, and the agent decides when to use it.&lt;/p&gt;
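&lt;p&gt;A minimal sketch of that setup. The ID value is a placeholder, and &lt;code&gt;KNOWLEDGE_BASE_ID&lt;/code&gt; is the variable name used by current &lt;code&gt;strands_tools&lt;/code&gt; releases; check the package docs for your version:&lt;/p&gt;

```python
import os

# Placeholder Knowledge Base ID; the built-in retrieve tool reads it from
# the environment (KNOWLEDGE_BASE_ID in current strands_tools releases).
os.environ["KNOWLEDGE_BASE_ID"] = "YOUR_KB_ID"

# Equivalent shell setup:
#   export KNOWLEDGE_BASE_ID=YOUR_KB_ID
```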

&lt;p&gt;We create a &lt;code&gt;BedrockModel&lt;/code&gt; instance with the model ID, region, inference parameters, and guardrail information. Then we define the system prompt telling the agent it's a university chatbot and how it should handle requests. Finally, we create the agent with the model, the tools list (both custom and built-in), and the system prompt.&lt;/p&gt;

&lt;p&gt;The last piece is the interactive loop. We read input from the command line and pass it to the agent. To call the agent, all you need is &lt;code&gt;agent(user_input)&lt;/code&gt;. The framework handles the entire agent loop: when the model needs a tool, Strands executes it and sends the result back to the model.&lt;/p&gt;

&lt;p&gt;Multi-turn conversation management is handled too because each call maintains context from previous turns as long as the program is running.&lt;/p&gt;

&lt;p&gt;Under the hood, Strands is calling the Converse API and using the Bedrock features we covered throughout this post. This should all make a lot more sense now than if you had jumped straight into the agent framework.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://strandsagents.com/" rel="noopener noreferrer"&gt;Strands documentation&lt;/a&gt; has more examples and configuration options.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;We covered a lot of ground. You now have the knowledge you need to start building real applications with AI on AWS using Amazon Bedrock.&lt;/p&gt;

&lt;p&gt;Here are some areas to explore as your application grows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prompt caching&lt;/strong&gt; can significantly reduce costs on repeated context. If you have a large system prompt or tool definitions that don't change between requests, caching avoids reprocessing those tokens every time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-region inference&lt;/strong&gt; distributes requests across AWS regions to balance inference load. Instead of hitting limits in one region and failing, Bedrock can route your requests globally.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CloudWatch monitoring&lt;/strong&gt; tracks token usage, latency, throttling, and error rates. Setting up monitoring early helps you catch cost spikes and performance issues before they become problems.&lt;/li&gt;
&lt;/ul&gt;
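&lt;p&gt;As a taste of the first item, enabling prompt caching with the Converse API comes down to inserting a cache checkpoint after the stable part of your prompt. This is a hedged sketch: the model ID and prompt text are placeholders, and caching support varies by model:&lt;/p&gt;

```python
# Sketch: a Converse request with a cache checkpoint after the large,
# stable system prompt. The cachePoint marker tells Bedrock it may cache
# everything before it, so repeat requests skip reprocessing those tokens.
# Model ID and prompt text are placeholders; caching support varies by model.
request = {
    "modelId": "amazon.nova-lite-v1:0",
    "system": [
        {"text": "You are a support assistant. ...long, unchanging instructions..."},
        {"cachePoint": {"type": "default"}},
    ],
    "messages": [
        {"role": "user", "content": [{"text": "How do I reset my password?"}]}
    ],
}
# Then send it, e.g.:
# client = boto3.client("bedrock-runtime")
# response = client.converse(**request)
```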

&lt;p&gt;The &lt;a href="https://github.com/aws-samples/sample-amazon-bedrock-for-beginners" rel="noopener noreferrer"&gt;companion repo&lt;/a&gt; has the complete code for every example in this post. Clone it, run the examples, and try adapting them to your own use case. Take one of these examples and adapt it to a new use case to push your learning even further and remember to always learn the fundamentals first.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aws</category>
      <category>agents</category>
      <category>bedrock</category>
    </item>
    <item>
      <title>AI Agents Don’t Need Complex Workflows. Build One in Python in 10 Minutes</title>
      <dc:creator>Morgan Willis</dc:creator>
      <pubDate>Thu, 26 Mar 2026 22:16:10 +0000</pubDate>
      <link>https://forem.com/aws/ai-agents-dont-need-complex-workflows-build-one-in-python-in-10-minutes-2m5d</link>
      <guid>https://forem.com/aws/ai-agents-dont-need-complex-workflows-build-one-in-python-in-10-minutes-2m5d</guid>
      <description>&lt;p&gt;Building an AI agent in Python can be as easy as giving a model some tools and letting it figure out the rest.&lt;/p&gt;

&lt;p&gt;Most agent setups start the same way: you wire up tool calls, manage retries, track state, and write the routing logic that decides what happens when. It works, but it's brittle. Every time the workflow changes, you're back in the code rewiring the sequence.&lt;/p&gt;

&lt;p&gt;Strands is an open-source Python SDK built around a different idea.&lt;/p&gt;

&lt;p&gt;Instead of you hardcoding the orchestration, you let the model handle it. You give it tools and a goal, and the SDK takes care of the agent loop, tool execution, and conversation state. You can go from zero to a working agent in about 10 minutes, and the same primitives that make a simple agent easy to build can be combined to give you more complex setups when you need them.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Model-Driven Approach to AI Agents
&lt;/h2&gt;

&lt;p&gt;The Strands team calls this a model-driven approach: the LLM is the orchestrator, and you define the capabilities it can use.&lt;/p&gt;

&lt;p&gt;In practice, your agent code is mostly a matter of plugging the desired components together. Here's what a basic agent looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s the capital of France?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's a working agent. It uses Amazon Bedrock as the model provider by default, but you can swap in any supported provider. We'll use OpenAI for the rest of this post.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up a Python AI Agent with OpenAI
&lt;/h2&gt;

&lt;p&gt;Install the SDK with the OpenAI extension:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s1"&gt;'strands-agents[openai]'&lt;/span&gt; strands-agents-tools
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set your API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_api_key_here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now create an agent that uses OpenAI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.models.openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAIModel&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAIModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;client_args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
    &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s the capital of France?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it with &lt;code&gt;python agent.py&lt;/code&gt; and you should get a response. The agent handles the API call to the OpenAI model and response parsing for you. &lt;/p&gt;

&lt;p&gt;So far, this agent doesn't have any tools it can use to interact with the real world. It does, however, already handle the main agent loop, and it's the starting point for building a more capable agent.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Building Blocks of an AI Agent
&lt;/h2&gt;

&lt;p&gt;Strands has a few core building blocks you should be aware of: agents, tools, models, and hooks. Understanding how they fit together is most of what you need to know.&lt;/p&gt;

&lt;h3&gt;
  
  
  Models
&lt;/h3&gt;

&lt;p&gt;A model is the LLM provider. Strands supports Bedrock (the default), OpenAI, Anthropic, Google Gemini, Meta Llama, Ollama for local models, and several others. You configure the model once and pass it to your agent.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAIModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;client_args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
    &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can set inference parameters using &lt;code&gt;params&lt;/code&gt;. One worth noting is temperature: use a lower temperature for factual tasks and a higher one for creative tasks. Which other inference parameters are supported depends on the model.&lt;/p&gt;

&lt;h3&gt;
  
  
  Giving Your Agent Tools
&lt;/h3&gt;

&lt;p&gt;Tools are Python functions that extend what the agent can do beyond generating text. &lt;/p&gt;

&lt;p&gt;Here's a custom tool to return weather data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get the current weather for a location.

    Args:
        location: City name, e.g. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Seattle&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# In a real app, this would call a weather API
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Weather in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: Sunny, 72°F&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;@tool&lt;/code&gt; decorator is all you need. The docstring matters because the model uses it along with the function's type hints to decide when to call this function and what arguments to pass. Clear docstrings lead to better tool usage. &lt;/p&gt;

&lt;p&gt;Strands also ships with a community tools package that includes common utilities:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands_tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;calculator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;python_repl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;http_request&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These give your agent the ability to do math, run Python code, and make HTTP requests out of the box.&lt;/p&gt;

&lt;h3&gt;
  
  
  Wiring It All Together
&lt;/h3&gt;

&lt;p&gt;The agent brings everything together. You give it a model, tools, and a system prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;calculator&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant that can check weather and do math.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you call the agent with a message, it enters an agent loop. The model reads the message, decides if it needs to use any tools, calls them if it does, reads the results, and either calls more tools or generates a final response. This loop continues until the model decides it has enough information to answer.&lt;/p&gt;

&lt;p&gt;You don't write any of that loop logic, the SDK handles it for you.&lt;/p&gt;
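&lt;p&gt;Conceptually, the loop the SDK runs for you looks something like this simplified sketch. A stubbed function stands in for the LLM; this is not Strands internals, just the control flow:&lt;/p&gt;

```python
# Simplified sketch of an agent loop with a stubbed model.
# Real SDKs also handle streaming, retries, and errors; this shows only
# the control flow: call the model, run requested tools, feed results back.
def fake_model(messages):
    # Stand-in for an LLM: asks for the weather tool once, then answers.
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_use", "name": "get_weather",
                "input": {"location": "Seattle"}}
    return {"type": "text", "text": "It's sunny in Seattle."}

def get_weather(location):
    return f"Weather in {location}: Sunny, 72°F"

def run_agent(user_message, tools):
    messages = [{"role": "user", "content": user_message}]
    while True:
        reply = fake_model(messages)
        if reply["type"] == "text":    # no more tools needed: final answer
            return reply["text"]
        result = tools[reply["name"]](**reply["input"])    # execute the tool
        messages.append({"role": "tool", "content": result})

answer = run_agent("What's the weather in Seattle?", {"get_weather": get_weather})
print(answer)  # It's sunny in Seattle.
```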

&lt;h3&gt;
  
  
  Hooking into the Agent Lifecycle
&lt;/h3&gt;

&lt;p&gt;Hooks let you subscribe to lifecycle events in the agent loop without modifying the agent's core logic. The agent emits events at specific points during execution: before and after model calls, before and after tool calls, when messages are added, and at the start and end of each invocation. You register callbacks for the events you care about.&lt;/p&gt;

&lt;p&gt;Here's a hook that logs every tool call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.hooks&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BeforeToolCallEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AfterToolCallEvent&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;log_tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BeforeToolCallEvent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Calling tool: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_use&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;With input: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_use&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;log_tool_result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AfterToolCallEvent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tool &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_use&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;calculator&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_hook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;log_tool_call&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BeforeToolCallEvent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_hook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;log_tool_result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AfterToolCallEvent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The available events cover the full lifecycle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;BeforeInvocationEvent&lt;/code&gt; / &lt;code&gt;AfterInvocationEvent&lt;/code&gt; for the overall request&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;BeforeModelCallEvent&lt;/code&gt; / &lt;code&gt;AfterModelCallEvent&lt;/code&gt; for LLM calls&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;BeforeToolCallEvent&lt;/code&gt; / &lt;code&gt;AfterToolCallEvent&lt;/code&gt; for tool execution&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;MessageAddedEvent&lt;/code&gt; when messages are added to conversation history&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Hooks are useful for logging, metrics, basic guardrails, and adding logic to your agent's lifecycle. You can also cancel a tool call from a &lt;code&gt;BeforeToolCallEvent&lt;/code&gt; hook by setting &lt;code&gt;event.cancel_tool&lt;/code&gt; to a message, which stops the tool from executing and sends that message back to the model as an error. For example, you can check the tool name and arguments, and block the call if something looks wrong.&lt;/p&gt;
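&lt;p&gt;Here's a self-contained sketch of that cancellation pattern using a stand-in event object. The &lt;code&gt;tool_use&lt;/code&gt; and &lt;code&gt;cancel_tool&lt;/code&gt; fields mirror the behavior described above; in a real agent you'd register the same function with &lt;code&gt;add_hook&lt;/code&gt;:&lt;/p&gt;

```python
# Self-contained sketch of the cancellation pattern. FakeToolCallEvent is a
# stand-in for the real BeforeToolCallEvent; the tool_use and cancel_tool
# fields mirror the behavior described in the text.
class FakeToolCallEvent:
    def __init__(self, name, tool_input):
        self.tool_use = {"name": name, "input": tool_input}
        self.cancel_tool = None   # None means "let the tool run"

BLOCKED_TOOLS = {"python_repl"}   # example policy: no arbitrary code execution

def block_risky_tools(event):
    # Setting cancel_tool stops execution and returns this message
    # to the model as a tool error.
    if event.tool_use["name"] in BLOCKED_TOOLS:
        event.cancel_tool = "This tool is disabled by policy."

event = FakeToolCallEvent("python_repl", {"code": "print('hi')"})
block_risky_tools(event)
print(event.cancel_tool)   # This tool is disabled by policy.

# In a real agent, you'd register the same function with:
# agent.add_hook(block_risky_tools, BeforeToolCallEvent)
```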

&lt;p&gt;Once you start building more complex agents, you'll find yourself wanting to bundle related hooks and tools into reusable packages. We'll get to that later in this post.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building a Multi-Tool Agent
&lt;/h2&gt;

&lt;p&gt;Here's a more complete example. This agent has a few example tools that look up weather, do calculations, and count letters in words. The tools themselves don't matter much for learning how to build an agent, since yours will be unique to your use case. For now, we're just exploring how to wire all of the pieces together into an agent that does things:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.models.openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAIModel&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands_tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;calculator&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get the current weather for a location.

    Args:
        location: City name, e.g. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Seattle&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Weather in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: Sunny, 72°F&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;letter_counter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;letter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Count occurrences of a specific letter in a word.

    Args:
        word: The word to search in
        letter: The single letter to count
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;letter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAIModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;client_args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
    &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;calculator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;letter_counter&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;I have a few questions:
1. What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s the weather in Seattle?
2. What is 1547 * 382?
3. How many r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s are in &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;strawberry&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;?
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent will call each tool as needed, collect the results, and give you a single coherent response. You didn't have to write any routing logic or decide which tool to call for which question. The model handles that.&lt;/p&gt;

&lt;h2&gt;
  
  
  How AI Agents Remember Conversations
&lt;/h2&gt;

&lt;p&gt;Agents maintain conversation context automatically within a running process. Each call to the agent adds to the conversation history, so the model remembers what was said earlier:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;My name is Morgan.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s my name?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Will remember "Morgan"
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works across multiple turns without any extra code because the SDK manages the message history internally.&lt;/p&gt;

&lt;p&gt;That said, this memory only lives as long as the agent's process lives. If you're running a script locally and it exits, the history is gone. If you're hosting an agent behind an API, each request may be served by a fresh process, so message history is not maintained across requests.&lt;/p&gt;

&lt;p&gt;This is because LLMs are stateless by default, and the conversation history that makes them feel stateful is just a list of messages that gets sent with every request.&lt;/p&gt;

&lt;p&gt;For anything beyond a local script, you need to persist that history somewhere.&lt;/p&gt;
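<p>You can see why in miniature. The sketch below is plain Python with a fake <code>send_to_model</code> standing in for a real model call; it shows that the "memory" is nothing more than a message list the client resends on every request:</p>

```python
# Illustration only: how "memory" works for a stateless LLM. The model never
# remembers anything; the client resends the full message list every request.
# send_to_model is a stand-in for a real model call.

def send_to_model(messages):
    # Pretend model: it can only answer from what's in the history it was sent.
    history = " ".join(m["content"] for m in messages if m["role"] == "user")
    if "What's my name?" in messages[-1]["content"]:
        return "Your name is Morgan." if "Morgan" in history else "I don't know."
    return "Okay."

messages = []

def chat(user_text):
    messages.append({"role": "user", "content": user_text})
    reply = send_to_model(messages)  # the full history goes out every time
    messages.append({"role": "assistant", "content": reply})
    return reply

chat("My name is Morgan.")
print(chat("What's my name?"))  # the history, not the model, carries the memory
```

<p>Drop the <code>messages</code> list and the "memory" disappears, which is exactly what happens when a process exits.</p>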

&lt;h3&gt;
  
  
  Session Managers
&lt;/h3&gt;

&lt;p&gt;Strands provides session managers that save and restore conversation state across invocations. The simplest option is &lt;code&gt;FileSessionManager&lt;/code&gt;, which writes session data to the local filesystem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.session.file_session_manager&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FileSessionManager&lt;/span&gt;

&lt;span class="n"&gt;session_manager&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FileSessionManager&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-session&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;storage_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./sessions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_manager&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_manager&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# First run
&lt;/span&gt;&lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;My name is Morgan.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Later, even after a restart, the agent remembers
&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_manager&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;FileSessionManager&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-session&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;storage_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./sessions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s my name?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Still remembers "Morgan"
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;FileSessionManager&lt;/code&gt; stores each message as a JSON file on disk. It works well for local development. For hosted setups, you'd swap in a session manager backed by a database or a managed memory service like Amazon Bedrock AgentCore Memory. The integration pattern is the same, but you'd need to provision the infrastructure for the data store.&lt;/p&gt;
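<p>To make the idea concrete, here's a toy version of what session persistence accomplishes: write the message list somewhere durable so a fresh process can pick up where the last one left off. This shows the concept only, not <code>FileSessionManager</code>'s actual on-disk format:</p>

```python
# Toy session persistence: save and restore a message list across "restarts".
# Concept only; the SDK's real on-disk layout differs.
import json
import tempfile
from pathlib import Path

STORAGE = Path(tempfile.mkdtemp())  # stand-in for "./sessions"

def save_session(session_id, messages, storage_dir=STORAGE):
    Path(storage_dir).mkdir(parents=True, exist_ok=True)
    (Path(storage_dir) / f"{session_id}.json").write_text(json.dumps(messages))

def load_session(session_id, storage_dir=STORAGE):
    path = Path(storage_dir) / f"{session_id}.json"
    return json.loads(path.read_text()) if path.exists() else []

# First run
messages = load_session("my-session")
messages.append({"role": "user", "content": "My name is Morgan."})
save_session("my-session", messages)

# "Restart": a fresh load sees the same history
print(load_session("my-session")[0]["content"])
```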

&lt;h3&gt;
  
  
  Managing the Context Window
&lt;/h3&gt;

&lt;p&gt;There's another problem that shows up in longer conversations. Every LLM has a context window, which is the maximum number of tokens it can process in a single request. Your system prompt, the full conversation history, tool definitions, and the model's response all have to fit inside that window.&lt;/p&gt;

&lt;p&gt;For short conversations this isn't an issue. But if your agent runs for dozens of turns, or if tools return large results, the conversation history can grow past what the model can handle.&lt;/p&gt;

&lt;p&gt;Strands provides a few out-of-the-box conversation managers to deal with this:&lt;/p&gt;

&lt;p&gt;The sliding window manager keeps the most recent messages and drops the oldest ones when the history gets too long:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.agent.conversation_manager.sliding_window_conversation_manager&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SlidingWindowConversationManager&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;conversation_manager&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;SlidingWindowConversationManager&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;window_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# keep the last 40 messages
&lt;/span&gt;    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is simple and predictable: old messages fall off the end. The downside is that the agent loses context from earlier in the conversation. If a user said something important 50 messages ago, it's gone.&lt;/p&gt;
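<p>The trimming behavior is easy to picture. This is an illustrative sketch, not the SDK's implementation:</p>

```python
# Sketch of sliding-window trimming: when the history exceeds the window,
# the oldest messages simply fall off.

def apply_sliding_window(messages, window_size):
    return messages[-window_size:]

history = [f"message {i}" for i in range(50)]
trimmed = apply_sliding_window(history, window_size=40)
print(len(trimmed))  # 40
print(trimmed[0])    # "message 10" -- messages 0 through 9 are gone
```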

&lt;p&gt;The summarizing manager takes a different approach. Instead of dropping old messages, it summarizes them and then keeps the summary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.agent.conversation_manager.summarizing_conversation_manager&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SummarizingConversationManager&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;conversation_manager&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;SummarizingConversationManager&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;summary_ratio&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# summarize the oldest 30% of messages
&lt;/span&gt;        &lt;span class="n"&gt;preserve_recent_messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# always keep the last 10
&lt;/span&gt;    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the context gets too large, the summarizing manager uses the LLM to generate a summary of the oldest messages, then replaces them with that summary. The agent keeps the gist of what happened earlier without the full verbatim history. This costs an extra model call when summarization triggers, but it preserves more context than a simple sliding window.&lt;/p&gt;
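<p>A rough sketch of that split, with a stand-in <code>summarize</code> function in place of the extra LLM call (illustrative only, not the SDK's logic):</p>

```python
# Sketch of a summarizing strategy: condense the oldest slice of the history
# into one summary message, always preserving the most recent messages.
# summarize() is a stand-in for the extra LLM call.

def summarize(old_messages):
    return f"[summary of {len(old_messages)} earlier messages]"

def compress(messages, summary_ratio=0.3, preserve_recent=10):
    # The most recent messages are never eligible for summarization.
    eligible = messages[:-preserve_recent] if preserve_recent else messages
    cut = int(len(eligible) * summary_ratio)
    if cut == 0:
        return messages
    return [summarize(messages[:cut])] + messages[cut:]

history = [f"message {i}" for i in range(30)]
compressed = compress(history)
print(len(compressed))   # 25: one summary message plus the remaining 24
print(compressed[0])
```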

&lt;p&gt;Which one you pick depends on your use case. For short, focused interactions, the sliding window is fine. For longer sessions where earlier context matters, the summarizing manager is worth the extra cost.&lt;/p&gt;

&lt;p&gt;There are also more advanced techniques for managing your context window. The broader practice of deciding what information reaches the model is called context engineering, and it's an entire discipline within AI engineering. For simple agents, a sliding window or summarization is a good place to start.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the Agent Loop Ties It All Together
&lt;/h2&gt;

&lt;p&gt;Stack these pieces together and you get a pretty capable agent without writing much code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A model handles reasoning and tool selection.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tools extend what the agent can do.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Hooks give you control over the lifecycle.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Session managers persist state across restarts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Conversation managers keep the context window under control.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agent loop ties it all together: the model calls tools, reads results, handles errors by trying a different approach, and returns the final response. &lt;/p&gt;

&lt;p&gt;This is the starting point. Once you start building more advanced agents you'll need more capabilities. That's where plugins come in.&lt;/p&gt;

&lt;p&gt;Plugins are classes that bundle hooks and tools together into behavioral modifications you can attach to any agent. The SDK ships with a few built-in plugins that show what this looks like in practice, and you can build your own custom plugins as needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Extending Your Agent with Plugins
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Steering
&lt;/h3&gt;

&lt;p&gt;Steering is a plugin that evaluates the agent's behavior and sends corrective feedback when it drifts from your guidelines. You give it a system prompt that defines the rules, and it uses a separate LLM call to judge each of the agent's actions before they go through.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.vended_plugins.steering&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LLMSteeringHandler&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;calculator&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;plugins&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nc"&gt;LLMSteeringHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ensure all responses are professional and concise. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Reject any response that includes speculation or unverified claims.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Under the hood, the steering plugin hooks into the agent's lifecycle using a &lt;code&gt;BeforeToolCallEvent&lt;/code&gt; hook. It intercepts tool calls, runs them through the evaluator, and returns one of three actions: proceed (let it through), guide (reject with feedback so the agent retries), or interrupt (escalate to a human). You don't write any of that logic. You just describe the rules in the system prompt for the steering handler.&lt;/p&gt;

&lt;p&gt;This is useful for enforcing tone in customer-facing agents, preventing agents from calling tools with dangerous arguments, or evaluating if agents are following directions.&lt;/p&gt;
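<p>To make the three actions concrete, here's a hand-rolled stand-in for what the plugin automates. The rules and tool names are hypothetical, and the real plugin judges with an LLM call rather than keyword matching:</p>

```python
# A hand-rolled stand-in for steering: inspect a pending tool call and return
# one of the three actions. Rules are hypothetical; the real plugin uses an
# LLM evaluation instead of hard-coded checks.

def evaluate_tool_call(tool_name, tool_input):
    if tool_name == "shell" and "rm -rf" in tool_input.get("command", ""):
        return "interrupt"  # escalate to a human
    if tool_name == "get_weather" and not tool_input.get("city"):
        return "guide"      # reject with feedback so the agent retries
    return "proceed"        # let it through

print(evaluate_tool_call("get_weather", {"city": "Seattle"}))  # proceed
print(evaluate_tool_call("get_weather", {}))                   # guide
print(evaluate_tool_call("shell", {"command": "rm -rf /"}))    # interrupt
```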

&lt;h3&gt;
  
  
  Skills
&lt;/h3&gt;

&lt;p&gt;Skills are modular instructions that agents discover and activate at runtime. They follow the &lt;a href="https://skills.sh" rel="noopener noreferrer"&gt;Agent Skills specification&lt;/a&gt;, an open standard for packaging agent capabilities as folders containing instructions, scripts, and resources.&lt;/p&gt;

&lt;p&gt;A skill might teach an agent how to perform a code review following your team's conventions, how to deploy to a specific environment, or how to write content in a particular style.&lt;/p&gt;

&lt;p&gt;The agent only loads a skill's metadata (name and description) initially. When the agent decides a skill is relevant to the current task, it activates it and pulls in the full instructions. This keeps the context window clean since the agent only loads what it needs.&lt;/p&gt;
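<p>A sketch of that progressive loading, using a hypothetical skill whose <code>SKILL.md</code> starts with frontmatter metadata followed by the full instructions:</p>

```python
# Sketch of "load metadata first, activate later". The SKILL.md content here
# is hypothetical: frontmatter carries the name and description, and the body
# (the full instructions) is pulled in only when the skill activates.

SKILL_MD = """---
name: code-review
description: Review pull requests following the team's conventions.
---
Full instructions: check naming, test coverage, and error handling...
"""

def split_skill(text):
    _, frontmatter, body = text.split("---", 2)
    metadata = dict(
        line.split(": ", 1) for line in frontmatter.strip().splitlines()
    )
    return metadata, body.strip()

metadata, instructions = split_skill(SKILL_MD)
print(metadata["name"])       # advertised to the agent up front
print(len(instructions) > 0)  # loaded only on activation
```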

&lt;h3&gt;
  
  
  Building Your Own Plugins
&lt;/h3&gt;

&lt;p&gt;You can also build custom plugins. A plugin is a class that extends &lt;code&gt;Plugin&lt;/code&gt; and uses &lt;code&gt;@hook&lt;/code&gt; and &lt;code&gt;@tool&lt;/code&gt; decorators:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.plugins&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Plugin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hook&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.hooks&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BeforeToolCallEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AfterToolCallEvent&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LoggingPlugin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Plugin&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;A plugin that logs all tool calls and provides a utility tool.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;logging-plugin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="nd"&gt;@hook&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;log_before_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BeforeToolCallEvent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Called before each tool execution.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[LOG] Calling tool: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_use&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[LOG] Input: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_use&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nd"&gt;@hook&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;log_after_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AfterToolCallEvent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Called after each tool execution.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[LOG] Tool completed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_use&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nd"&gt;@tool&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;debug_print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Print a debug message.

        Args:
            message: The message to print
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[DEBUG] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Printed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Using the plugin
&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;plugins&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;LoggingPlugin&lt;/span&gt;&lt;span class="p"&gt;()])&lt;/span&gt;
&lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Calculate 2 + 2 and print the result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the agent initializes, it scans the plugin for &lt;code&gt;@hook&lt;/code&gt; and &lt;code&gt;@tool&lt;/code&gt; methods and registers them automatically. You can stack multiple plugins on the same agent, and each one manages its own hooks and state without interfering with the others.&lt;/p&gt;

&lt;p&gt;Beyond plugins, Strands supports multi-agent patterns where agents invoke other agents as tools, MCP (Model Context Protocol) servers for connecting to external tool providers, and structured output for getting typed responses. These are all topics for another post. The point is that the same primitives (agents, tools, models, hooks) compose into more complex setups without requiring you to learn a different API. &lt;/p&gt;

&lt;h2&gt;
  
  
  Install and Build Your First AI Agent
&lt;/h2&gt;

&lt;p&gt;The fastest path from here:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install: &lt;code&gt;pip install 'strands-agents[openai]' strands-agents-tools&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Set your API key: &lt;code&gt;export OPENAI_API_KEY=your_key&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Write a simple agent with one custom tool&lt;/li&gt;
&lt;li&gt;Run it and see what happens&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The &lt;a href="https://strandsagents.com/docs/user-guide/quickstart/overview/" rel="noopener noreferrer"&gt;Strands documentation&lt;/a&gt; has more examples, including multi-agent setups, observability, and production deployment patterns. The &lt;a href="https://github.com/strands-agents/sdk-python" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt; has the source and community tools.&lt;/p&gt;

&lt;p&gt;The SDK is open source and actively developed. If you've been putting off building an agent because the frameworks felt heavy, give Strands a look. The barrier to entry is low, and it provides enough composability to keep up with you as your use case gets more complex.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>python</category>
      <category>aws</category>
    </item>
    <item>
      <title>AI Agents Are Your API's Biggest Consumer. Do They Care About Good Design?</title>
      <dc:creator>Morgan Willis</dc:creator>
      <pubDate>Tue, 24 Mar 2026 13:50:14 +0000</pubDate>
      <link>https://forem.com/aws/ai-agents-are-your-apis-biggest-consumer-do-they-care-about-good-design-31l2</link>
      <guid>https://forem.com/aws/ai-agents-are-your-apis-biggest-consumer-do-they-care-about-good-design-31l2</guid>
      <description>&lt;p&gt;We've always designed APIs for humans. A well built API means obsessing over naming conventions, RESTful patterns, and clear documentation because the goal is simple: make systems easy for developers to understand. But AI is changing who the consumer of software is, and developers are asking whether the rules we've followed for decades still hold up.&lt;/p&gt;

&lt;p&gt;When the primary user of an API is an AI system that reads documentation, adapts to unfamiliar patterns, and experiments when something fails, maybe consistent APIs and clean abstractions don't matter anymore.&lt;/p&gt;

&lt;p&gt;Everything in me wants to reject this concept. My gut instinct is to say "my APIs need to be pretty or I'll die".&lt;/p&gt;

&lt;p&gt;I've been thinking about this a lot lately, and unfortunately there are good arguments on both sides. Let's walk through them.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Case for "Abstractions Don't Matter Anymore"
&lt;/h3&gt;

&lt;p&gt;Some developers believe AI will reduce the importance of traditional abstractions. I've heard this take a lot.&lt;/p&gt;

&lt;p&gt;LLMs are extremely good at pattern recognition. They can read documentation, inspect responses, experiment, and adapt when things fail. From this perspective, messy systems don't seem like a big problem. A human might struggle with inconsistent naming or poor documentation, but an AI can simply figure it out through trial, error, and being smarter than me.&lt;/p&gt;

&lt;p&gt;So maybe if abstractions exist to make code understandable to humans, and the primary consumer is no longer human, then the old rules don't apply.&lt;/p&gt;

&lt;p&gt;In practice, you can build agent systems that work around poorly designed APIs. Say you're integrating with an API where half the endpoints return errors as HTTP status codes and the other half always return 200 with an error field buried in the response body.&lt;/p&gt;

&lt;p&gt;The agent pulls the docs, writes the code, and it looks reasonable. Then it runs the tests and it breaks because the code is checking status codes on an endpoint that never returns them.&lt;/p&gt;

&lt;p&gt;The agent reads the error, adds response body parsing, and tries again. Maybe it over-corrects and starts modifying the way it's handling status codes everywhere, breaking a different call. So, it adjusts again. Then finally, the third try works. These systems exist today and they get there eventually.&lt;/p&gt;
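<p>The workaround the agent eventually converges on usually looks something like this normalizing shim. The endpoints and field names here are hypothetical:</p>

```python
# The kind of shim an agent ends up writing for the API described above:
# fold both error styles into one predictable shape. Field names ("message",
# "error") are hypothetical.

def normalize_response(status_code, body):
    # Style 1: errors signaled via HTTP status codes.
    if status_code >= 400:
        return {"ok": False, "error": body.get("message", f"HTTP {status_code}")}
    # Style 2: always 200, with the error buried in the response body.
    if "error" in body:
        return {"ok": False, "error": body["error"]}
    return {"ok": True, "data": body}

print(normalize_response(404, {"message": "not found"}))
print(normalize_response(200, {"error": "invalid user id"}))
print(normalize_response(200, {"id": 7, "name": "Morgan"}))
```

<p>It works, but every consumer of that API has to rediscover and rebuild this shim, and an agent burns tokens doing it.</p>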

&lt;p&gt;As models continue to improve, the argument goes, API design will matter less and less because models can brute-force their way to working code. &lt;/p&gt;

&lt;h3&gt;
  
  
  The Case for "Abstractions Still Matter"
&lt;/h3&gt;

&lt;p&gt;Now for the other side of the argument.&lt;/p&gt;

&lt;p&gt;A human developer interacts with an API occasionally. An AI system like a coding agent might interact with it hundreds of times in a single session. When something is poorly designed, the problems compound fast in the form of unnecessary retries, token-heavy debugging loops, and ugly workarounds.&lt;/p&gt;

&lt;p&gt;I ran into this with an API that had inconsistent naming across its endpoints. I was debugging an issue with an app I was building, and my coding agent kept thinking it had identified the issue because a parameter name for an API I was using didn't align with historical patterns for this type of API. That wasn't the issue at all, it was completely irrelevant, but the agent kept getting hung up on it.&lt;/p&gt;

&lt;p&gt;Every time I debug something that uses this specific API my coding agent always says "I found it! The parameter name should be X instead of Y!" Then it changes it and deploys again and it doesn't work because that wasn't the issue. It kept making the same wrong assumption across sessions.&lt;/p&gt;

&lt;p&gt;Unlike a human who hits a weird error and remembers it next time, LLMs are stateless by default. Every new session starts fresh, and agents can spin up tons of sessions in a single workflow, each of which will run into the same problem.&lt;/p&gt;

&lt;p&gt;Every ambiguity in an API has a token cost, and poor API design has direct financial consequences in a way that wasn't true when the only cost was developer frustration.&lt;/p&gt;

&lt;p&gt;And another thing I've noticed: if you watch a coding agent work through a problem like this, it often gives up and tries a different approach entirely after a few tries. It'll swap libraries, use a different endpoint, or cobble together a workaround using some other approach. That's the AI doing exactly what it should do, adapting.&lt;/p&gt;

&lt;p&gt;From the developer perspective, I don't always like the workarounds the AI chooses. Sometimes one unclear API makes my AI think it needs to redesign my entire component. Other times it bypasses the API in a way that looks like it's working but actually relies on some weird hardcoded values somewhere.&lt;/p&gt;

&lt;p&gt;And from your perspective as an API owner, the AI just decided not to use your API. Your messy design just cost you a new user.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Human Compatibility Problem
&lt;/h3&gt;

&lt;p&gt;There's another angle to this too. The abstractions we built for humans are now embedded in how AI systems learn.&lt;/p&gt;

&lt;p&gt;Modern software ecosystems contain decades of common coding conventions. These were originally created to help humans understand systems. But those same patterns now appear throughout model training data, and that has consequences.&lt;/p&gt;

&lt;p&gt;When you name your endpoint &lt;code&gt;/api/v2/users/{id}&lt;/code&gt;, the model has seen that pattern millions of times. It knows what to expect. When you name it &lt;code&gt;/backend/person/fetch?identifier={id}&lt;/code&gt;, you're fighting against the weight of its training. The model can learn your pattern, but there's friction.&lt;/p&gt;

&lt;p&gt;Coding assistants are increasingly abstracting away the act of writing syntax from developers, which is great until you need to peek under the abstraction.&lt;/p&gt;

&lt;p&gt;If an agent generates code using unfamiliar patterns or unconventional APIs, a human still has to review it, debug it, and maintain it. We wouldn't want agents writing assembly language even if it ran faster, because most of us can't read it. The same logic applies to API conventions. Familiar patterns keep the code understandable for the humans who still have to live with it.&lt;/p&gt;

&lt;p&gt;The patterns we created to help humans are now baked into how AI understands software, and that path dependence matters in both directions. Breaking conventions costs you in AI effectiveness and in human readability.&lt;/p&gt;

&lt;h3&gt;
  
  
  You Can Engineer Around It (If You Can Afford It)
&lt;/h3&gt;

&lt;p&gt;Writing this blog turned my question from "do API design and thoughtful abstraction matter anymore?" into "how much money do you have?"&lt;/p&gt;

&lt;p&gt;Every time an AI system has to figure out how something works, that's tokens being consumed and a potentially hacky workaround making its way into your code base. The adaptability is real, but anyone who's been ripping Claude Opus 4.6 with the 1 million token context window using agent teams knows that this is not free.&lt;/p&gt;

&lt;p&gt;You can throw tokens at bad abstractions, build sophisticated systems to work around them, add layers of verification, validation, and correction. Multiple agents checking each other's work. Memory and caching layers to avoid repeated discovery. But wouldn't it be nice to just get it right the first try? Clean abstractions and good API design can give that to you.&lt;/p&gt;

&lt;p&gt;Ideally you have both things in place. Clean APIs, meaningful abstractions, and clear documentation mean the agent hits minimal friction and looping. Then add enough scaffolding around the agent to recover when an API it's using doesn't have all of those things. That combination reduces the cost of code generation, keeps code human readable and debuggable, and still has the adaptability built in.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>programming</category>
    </item>
    <item>
      <title>The Agent Buddy System: When Prompt Engineering Isn't Enough</title>
      <dc:creator>Morgan Willis</dc:creator>
      <pubDate>Wed, 18 Mar 2026 18:01:01 +0000</pubDate>
      <link>https://forem.com/aws/the-agent-buddy-system-when-prompt-engineering-isnt-enough-5dni</link>
      <guid>https://forem.com/aws/the-agent-buddy-system-when-prompt-engineering-isnt-enough-5dni</guid>
      <description>&lt;p&gt;Most AI agents don’t reliably follow directions, and that’s one of the biggest reasons they never make it from POC to production.&lt;/p&gt;

&lt;p&gt;This is how deploying agents usually plays out: you write clear instructions in your prompt, test against every scenario you can think of, and ship it. Then the agent skips steps, drifts from your guidelines, or invents behavior you didn't anticipate. So you add more detail, more constraints, more explicit directions.&lt;/p&gt;

&lt;p&gt;The prompt is getting huge now, but you’re sure you’ve captured all the rules. You deploy again. Same problem. Eventually, you hit a wall and give up. &lt;/p&gt;

&lt;p&gt;I ran into this firsthand trying to create a simple AI assistant to help me write. I gave it samples of my writing style, told it to write like me, and it did start off okay. But after a few turns it drifted back into generic AI-speak. I'm talking em dashes everywhere, staccato sentences for dramatic effect, and that weird "It's not about X, it's about Y" framing that sounds profound but actually says nothing. By the end of a long session, the output usually sounds nothing like me. &lt;/p&gt;

&lt;p&gt;This example makes the problem obvious because you can read the output and immediately tell something’s off. But the same thing happens in more serious scenarios, like compliance checks, customer support flows, or multi-step workflows where the stakes are higher.&lt;/p&gt;

&lt;p&gt;What's actually happening is that as conversations get longer, the model pays less attention to earlier instructions.&lt;/p&gt;

&lt;p&gt;Prompt engineering helps, but it can only take you so far. What you need is a feedback loop that catches drift and corrects it before the response ever reaches the user.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Agent Buddy System
&lt;/h2&gt;

&lt;p&gt;Instead of trying to make one agent behave perfectly, the solution was to introduce a second agent to the system. One does the work, and the other checks it. That’s what I’ve been calling the agent buddy system.&lt;/p&gt;

&lt;p&gt;The main agent handles the task: writing, reasoning, calling tools, whatever it needs to do. The buddy sits alongside it, watching the output. If the agent skips a step, tries to misuse a tool, or drifts from the defined rules, the buddy steps in and helps get things back on track.&lt;/p&gt;

&lt;p&gt;The idea is simple: don’t rely on the model to always follow instructions. Assume it will drift, and build something that corrects it when it does.&lt;/p&gt;

&lt;p&gt;This is essentially using an LLM as a judge. The evaluator model inspects the output from the worker model and decides whether it meets the criteria. If it does, the response goes through. If not, it sends guidance and the agent can try again.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdkm8axy1rse3zb1qht7f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdkm8axy1rse3zb1qht7f.png" alt="Agent Buddy System" width="800" height="269"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It turns out that having two models that disagree with each other is safer than having one model that just does whatever it wants.&lt;/p&gt;

&lt;p&gt;You can build this pattern yourself, but I used the Strands Agents SDK because it already supports this kind of feedback loop through a feature called steering.&lt;/p&gt;

&lt;p&gt;Steering lets you inject just-in-time guidance into the agent’s execution instead of front-loading everything into a massive prompt and hoping for the best. &lt;/p&gt;

&lt;p&gt;Under the hood, Strands steering works through hooks in the agent’s lifecycle. You can intercept tool calls before they execute to run custom validations, or evaluate the model’s response after it’s generated to check things like tone, format, or adherence to the prompt.&lt;/p&gt;

&lt;p&gt;The steering agent intercepts the call and returns one of three actions: &lt;code&gt;Proceed&lt;/code&gt; (accept), &lt;code&gt;Guide&lt;/code&gt; (reject with feedback for retry), or &lt;code&gt;Interrupt&lt;/code&gt; (escalate to a human). &lt;/p&gt;

&lt;h2&gt;
  
  
  Building a Writing Buddy
&lt;/h2&gt;

&lt;p&gt;To fix my AI writing problem, I built a steering handler that checks every response against a style guide with examples of my actual writing. If the output doesn’t sound like me, the handler catches it and asks for a rewrite before I ever see it.&lt;/p&gt;

&lt;p&gt;In Strands, this means creating a &lt;code&gt;SteeringHandler&lt;/code&gt; and attaching it to your agent as a plugin.&lt;/p&gt;

&lt;p&gt;For my use case, I only needed to evaluate the final output, so I used &lt;code&gt;steer_after_model()&lt;/code&gt; to inspect each response and decide whether to accept it or send it back with feedback.&lt;/p&gt;

&lt;p&gt;Here’s my &lt;code&gt;VoiceSteeringHandler&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;VoiceSteeringHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SteeringHandler&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Evaluates writing output against a style guide using an LLM judge.

    Intercepts model responses via steer_after_model and uses a separate
    steering agent to check for style violations. If a violation is found,
    it guides the agent to rewrite with targeted feedback.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;style_guide&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context_providers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[])&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;style_guide&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;style_guide&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_retries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;max_retries&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;retry_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;steer_after_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stop_reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;StopReason&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Evaluate model output against the style guide.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;[STEERING] Evaluating model output...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;retry_count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;retry_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Proceed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Max retries reached, accepting output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Use a separate steering agent as an LLM judge
&lt;/span&gt;        &lt;span class="n"&gt;steering_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You evaluate writing against a style guide.
            Catch clear violations, not nitpicks.

            STYLE GUIDE:
            &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;style_guide&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

            REJECT for: banned words/phrases from the style guide, em dashes,
            &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;It&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s not X. It&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s Y.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; reframing, obvious marketing tone, or meta-commentary.

            APPROVE if: tone is developer-to-developer with no banned words/phrases/patterns.
            When in doubt, APPROVE.

            Respond with APPROVE or REJECT: [quote the violation].&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;callback_handler&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;steering_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Evaluate this text:&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REJECT:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;retry_count&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
            &lt;span class="n"&gt;feedback&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REJECT:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Guide&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Fix this issue: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;feedback&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Only fix the cited issue. Output only the content, nothing else.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;retry_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Proceed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Output approved by steering agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then to attach it to your main agent, you use a &lt;code&gt;plugin&lt;/code&gt; like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BedrockModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us.anthropic.claude-sonnet-4-20250514-v1:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a writing assistant that writes in a specific voice.
    Follow every rule in the style guide below. Output only the requested writing.
    Never add meta-commentary or questions like &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Would you like me to adjust?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;

    STYLE GUIDE:
    &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;style_guide&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;plugins&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;VoiceSteeringHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;style_guide&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;style_guide&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the steering agent sees the output doesn't match, the handler returns &lt;code&gt;Guide&lt;/code&gt; with specific feedback. The agent discards its response and tries again, knowing exactly what went wrong. After &lt;code&gt;max_retries&lt;/code&gt; attempts, it lets the response through rather than looping forever.&lt;/p&gt;

&lt;p&gt;The evaluator prompt checks for voice match against your examples, but also flags AI vocabulary (words like "crucial," "delve," "tapestry"), structural patterns (em dashes, pseudo-profound reframing), and other tells that make text sound machine-generated. You give it paragraphs from your actual writing, and it asks "does this new text sound like these examples?" It's essentially a style linter powered by an LLM. &lt;/p&gt;

&lt;p&gt;That’s a judgment call, and this is where steering really shines. Instead of trying to build complicated, deterministic evaluation logic, you let a model make that call and provide targeted feedback.&lt;/p&gt;

&lt;h2&gt;
  
  
  Does It Work?
&lt;/h2&gt;

&lt;p&gt;Yes. Here’s what I saw in my own testing before getting into larger-scale results.&lt;/p&gt;

&lt;p&gt;I ran a small evaluation: 5 multi-turn writing sessions where a simulated user iteratively refines a piece, repeated 5 times each using Claude Sonnet 4.5. That's the kind of back-and-forth that happens in real writing workflows, and it's where drift becomes noticeable. The baseline voice adherence averaged 25% by the end of the sessions, but the steered version held at 100%.&lt;/p&gt;

&lt;p&gt;For single-turn prompts with more capable models, both performed about the same for a small evaluation dataset, because larger models are already pretty good at following style guides on their own. The difference shows up in the longer sessions where drift compounds, or when weaker models are used.&lt;/p&gt;

&lt;p&gt;That's a modest eval set, so take the exact numbers directionally rather than as gospel. But the pattern was consistent: unsteered sessions degraded noticeably after a few turns, while steered sessions stayed on voice throughout.&lt;/p&gt;

&lt;p&gt;The more compelling evidence comes from Clare Liguori, Senior Principal Software Engineer at AWS, who ran a similar evaluation at a much larger scale. She &lt;a href="https://strandsagents.com/blog/steering-accuracy-beats-prompts-workflows?trk=a76ecb1b-1eaf-4e12-a22a-c872d8279680&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;tested five approaches to guiding agent behavior&lt;/a&gt; on a library book renewal agent across 3,000 runs.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simple prompt instructions reached 82.5% accuracy, meaning roughly one in five interactions failed&lt;/li&gt;
&lt;li&gt;Agent SOPs hit 99.8%, but at 3x the token cost&lt;/li&gt;
&lt;li&gt;Graph-based workflows reached 80.8%, often failing outside predefined paths&lt;/li&gt;
&lt;li&gt;Steering hit 100% across 600 runs while using 66% fewer input tokens than SOPs and 47% fewer output tokens than workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most common failure without steering was skipping the book status check before renewing (43% of failures), followed by missing the confirmation message (40%). These are exactly the kinds of steps models deprioritize as context grows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Things To Consider
&lt;/h2&gt;

&lt;p&gt;This pattern works well, but there are a few things you should consider. &lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Latency&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Each steering intervention adds another model call. If the handler returns &lt;code&gt;Guide&lt;/code&gt;, the agent has to regenerate with feedback, which can mean two or three round trips for a single response. Once you add in tool calls the latency becomes a real factor.&lt;/p&gt;

&lt;p&gt;That’s fine for background tasks or workflows where accuracy matters more than speed. But it’s the wrong tradeoff for real-time applications where users expect quick responses and the stakes are low.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Token costs&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Tokens do add up, but the picture is more nuanced than you might expect. &lt;/p&gt;

&lt;p&gt;Steering uses more tokens than simple prompt instructions because you’re sending feedback back to the agent when it strays. But compared to approaches that actually achieve high accuracy, like SOPs, steering is often more efficient.&lt;/p&gt;

&lt;p&gt;Try the single-prompt approach first, and reach for steering when it isn't enough.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Steering prompt quality&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The quality of your steering prompts directly impacts performance.&lt;/p&gt;

&lt;p&gt;If your handler gives vague feedback, the agent can get stuck retrying without improving. Set retry limits, make your &lt;code&gt;Guide&lt;/code&gt; feedback specific, and if the same correction keeps firing, fix the prompt instead of increasing retries. &lt;/p&gt;

&lt;p&gt;And remember, you're using a model to judge another model. That means they can share the same blind spots. If both the worker and the evaluator miss the same kind of mistake, steering won't catch it. &lt;/p&gt;

&lt;p&gt;Try using two different models, and for high-stakes use cases, pair this with deterministic checks where you can.&lt;/p&gt;

&lt;h3&gt;
  
  
  When not to use steering
&lt;/h3&gt;

&lt;p&gt;Steering assumes you have a clear definition of "correct." That works for style guides, compliance rules, and structured workflows. It doesn't work as well for creative tasks where you actually want the model to surprise you because steering will pull it back toward whatever your evaluator thinks is right. And if your criteria can be expressed as deterministic checks (regex, schema validation, rule engines), maybe skip steering. It's slower, costs more, and adds uncertainty where you don't need it.&lt;/p&gt;
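&lt;p&gt;For instance, when the rule is as mechanical as a banned-phrase list, a few lines of stdlib regex do the job with zero extra model calls. The patterns below are just illustrations, not a full style guide:&lt;/p&gt;

```python
import re

# Deterministic style check: no LLM judge needed for rules this mechanical.
# The banned patterns are illustrative examples, not a complete style guide.
BANNED = [
    r"\bdelve\b",
    r"\btapestry\b",
    r"\u2014",  # em dash
    r"It's not (about )?\w+[.,] [Ii]t's (about )?\w+",  # pseudo-profound reframing
]

def violations(text: str) -> list[str]:
    """Return the banned patterns that match; empty list means the text is clean."""
    return [pattern for pattern in BANNED if re.search(pattern, text)]

print(violations("Let's delve into the rich tapestry of agents."))
```

It's faster and cheaper than a steering round trip, and its verdicts are reproducible, which is exactly what you want for hard rules.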

&lt;h2&gt;
  
  
  Beyond Writing Assistants
&lt;/h2&gt;

&lt;p&gt;Reliable agents come from the systems you build around them. &lt;/p&gt;

&lt;p&gt;Steering applies anywhere an agent needs consistent behavior over time. Customer service agents maintaining tone across dozens of interactions, code review bots enforcing your team's conventions, or compliance workflows where skipping a step has real consequences. &lt;/p&gt;

&lt;p&gt;The pattern is the same: evaluate the output, provide guidance, retry if needed. You just swap the evaluator criteria.&lt;/p&gt;

&lt;p&gt;Clare Liguori’s post walks through &lt;a href="https://strandsagents.com/blog/steering-accuracy-beats-prompts-workflows?trk=a76ecb1b-1eaf-4e12-a22a-c872d8279680&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;her full evaluation&lt;/a&gt; of the library book renewal agent. The &lt;a href="https://strandsagents.com/docs/user-guide/concepts/plugins/steering/?trk=a76ecb1b-1eaf-4e12-a22a-c872d8279680&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;steering documentation&lt;/a&gt; covers the full API. &lt;/p&gt;

&lt;p&gt;Some agents need a buddy to keep them on track. Steering gives you that.  &lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>programming</category>
      <category>aws</category>
    </item>
    <item>
      <title>Is Software Engineering Cooked? Not Yet. But Maybe.</title>
      <dc:creator>Morgan Willis</dc:creator>
      <pubDate>Tue, 03 Mar 2026 15:09:26 +0000</pubDate>
      <link>https://forem.com/aws/is-software-engineering-cooked-not-yet-but-maybe-5e0j</link>
      <guid>https://forem.com/aws/is-software-engineering-cooked-not-yet-but-maybe-5e0j</guid>
      <description>&lt;p&gt;"Software engineering is solved." This is all I see lately when scrolling LinkedIn, X, or Reddit.&lt;/p&gt;

&lt;p&gt;The message is loud and clear: developers are cooked and we should all pivot immediately and become plumbers or electricians.&lt;/p&gt;

&lt;p&gt;It's not a &lt;em&gt;completely&lt;/em&gt; crazy idea. AI coding tools have improved so quickly that software development has been affected more than almost any other white-collar field. Developers who are deep into these tools can generate large amounts of working code fast, and the days of memorizing syntax and writing everything by hand are already over.&lt;/p&gt;

&lt;p&gt;The obvious response is that code generation is only one slice of the software engineering pie. When I say this, usually everyone nods along.&lt;/p&gt;

&lt;p&gt;But we rarely talk about what the rest of the pie actually is. If code generation is mostly solved, then the question is what remains.&lt;/p&gt;

&lt;p&gt;Right now the results from AI coding tools are wildly uneven. Some developers report massive productivity gains. Others say it makes them slower and produces nothing but AI slop.&lt;/p&gt;

&lt;p&gt;A big part of the difference comes down to how the systems around AI coding tools are built and used.&lt;/p&gt;

&lt;p&gt;Software engineering looks solved because code generation is advancing quickly. Understanding why that is explains what we're seeing right now and where this is headed.&lt;/p&gt;

&lt;p&gt;A good place to start is with a simple question: why is code generation a winning use case for generative AI?&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Code Is Easy for AI
&lt;/h2&gt;

&lt;p&gt;Software development is heavily pattern based. We work with language syntax, framework conventions, data structures, standard patterns, reusable components, and familiar structures.&lt;/p&gt;

&lt;p&gt;Most code follows established patterns. The training data for these models includes massive volumes of code from real-world examples and open source projects. There are millions of implementations of common patterns to learn from.&lt;/p&gt;

&lt;p&gt;Writing code is largely about applying known patterns to new problems. Need a REST endpoint? There are thousands of examples. Need to validate user input or implement authentication? Those patterns are well documented too.&lt;/p&gt;

&lt;p&gt;This is why syntax and implementation for most common use cases are mostly solved. The models have seen these patterns thousands of times. They can reproduce them reliably with minor variations to fit your specific context.&lt;/p&gt;

&lt;p&gt;But pattern matching alone doesn't explain why AI coding tools work as well as they do. There's another factor that makes software particularly suited to AI generation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Software Is Verifiable
&lt;/h2&gt;

&lt;p&gt;Correct software for most non-AI related use cases has a simple property: the same input produces the same output. That's determinism.&lt;/p&gt;

&lt;p&gt;The actual implementation can vary. If you give the same user story to two different developers, they will produce two different solutions.&lt;/p&gt;

&lt;p&gt;The structure, abstractions, and technologies might differ. But from an end-user perspective, the functional requirements must be met.&lt;/p&gt;

&lt;p&gt;That property makes software unusually well suited to AI-assisted development. Nondeterministic systems can generate many possible implementations, but we can still measure whether the result is correct. Either the behavior is correct or it isn't. The tests pass or they don't.&lt;/p&gt;

&lt;p&gt;What matters most is the behavior of the software, and that behavior can be measured and tested automatically.&lt;/p&gt;
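
&lt;p&gt;A toy example makes this concrete. The two implementations below are deliberately different, but one deterministic check judges both (the function names are illustrative):&lt;/p&gt;

```python
# Two deliberately different implementations of the same user story:
# "return the list sorted in ascending order".
def sort_builtin(xs):
    return sorted(xs)

def insertion_sort(xs):
    out = []
    for x in xs:
        i = 0
        while i < len(out) and out[i] < x:
            i += 1
        out.insert(i, x)
    return out

# One deterministic check verifies both: same inputs, same expected outputs,
# no matter how the implementation was produced.
def meets_spec(impl):
    cases = [([3, 1, 2], [1, 2, 3]), ([], []), ([2, 2, 1], [1, 2, 2])]
    return all(impl(xs) == expected for xs, expected in cases)
```

&lt;p&gt;Either implementation passes &lt;code&gt;meets_spec&lt;/code&gt;, which is exactly the property that makes generated code checkable.&lt;/p&gt;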

&lt;p&gt;But lots of different fields have measurable outputs that can be tested and iterated upon. Why are software developers likely to be among the first groups to feel the full impact of AI adoption?&lt;/p&gt;

&lt;h2&gt;
  
  
  Automation Is Our Culture
&lt;/h2&gt;

&lt;p&gt;Part of the reason is that software engineering is already deeply mechanized, and it's in our culture to automate everything that we can.&lt;/p&gt;

&lt;p&gt;We have CI/CD pipelines, automated tests, automated security scanning, and metrics everywhere. We already understand how to build feedback loops and automate processes. Now we're applying those same skills and lessons learned to other parts of our work in ways that weren't possible before generative AI. And the reason it's accelerating so quickly is that we are both the domain experts and the ones building the systems.&lt;/p&gt;

&lt;p&gt;This creates extremely rapid iteration that is less likely to happen organically in other fields.&lt;/p&gt;

&lt;p&gt;If a developer discovers a truly innovative AI engineering technique on Monday, by Wednesday it's in a blog post with thousands of readers. By Friday multiple people have attempted to build tools around it. Within weeks, if something else hasn't replaced it, the idea gets integrated into mainstream workflows. Rinse and repeat.&lt;/p&gt;

&lt;p&gt;But even with all of this progress, implementation is still only one slice of the software engineering pie. It takes a lot of work to get systems working in production reliably.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Rest of the Software Engineering Pie
&lt;/h2&gt;

&lt;p&gt;Working code is only one dimension of correctness. Production-ready software requires many dimensions to align at once, like security, scalability, architecture, maintainability, integration, performance, stability, and dependencies. A system that works correctly in isolation can still fail when any of these dimensions are ignored.&lt;/p&gt;

&lt;p&gt;This is often where the conversation turns to the idea that developers will all become architects. If implementation becomes automated, humans focus on higher-level system design.&lt;/p&gt;

&lt;p&gt;There is truth to that, and I think this is where we are now. Developers who understand system design have an advantage in today's market. But will higher-level design remain purely a human activity? That assumption is already being tested.&lt;/p&gt;

&lt;h2&gt;
  
  
  Encoding Judgment
&lt;/h2&gt;

&lt;p&gt;Many of the harder parts of software engineering come down to judgment. What does "good" look like for a particular system? Which tradeoffs are acceptable? Where should the complexity live? This kind of judgment is what separates working code from production-ready software.&lt;/p&gt;

&lt;p&gt;Developers are already experimenting with ways to encode that judgment into automated systems. These systems look less like chatbots and more like coordinated pipelines. One component generates an implementation. Others verify behavior by running tests, scan for issues with deterministic tools, and evaluate architectural concerns. Higher-level review can be layered in, where models evaluate tradeoffs, consistency, and design decisions.&lt;/p&gt;

&lt;p&gt;In many ways, this looks like a natural extension of CI/CD. Instead of validating code after a human writes it, the system validates code as it's generated. The feedback loop gets tighter.&lt;/p&gt;
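
&lt;p&gt;As a rough sketch of that tighter loop, with a stub standing in for the model call (&lt;code&gt;generate_until_valid&lt;/code&gt; and the other names here are hypothetical, not any particular tool's API):&lt;/p&gt;

```python
import textwrap

def generate_code(prompt, feedback=None):
    # Stub standing in for a model call; a real pipeline would send the
    # prompt (plus any validation feedback) to an LLM here.
    return textwrap.dedent("""\
        def add(a, b):
            return a + b
    """)

def validate(source):
    # Deterministic gates, mirroring CI: a syntax check plus a behavioral test.
    try:
        compile(source, "<generated>", "exec")
    except SyntaxError as e:
        return f"syntax error: {e}"
    ns = {}
    exec(source, ns)
    if ns["add"](2, 3) != 5:
        return "behavioral test failed: add(2, 3) should be 5"
    return None

def generate_until_valid(prompt, max_attempts=3):
    # Validate code as it's generated, feeding errors back into the next attempt.
    feedback = None
    for _ in range(max_attempts):
        source = generate_code(prompt, feedback)
        feedback = validate(source)
        if feedback is None:
            return source
    raise RuntimeError(f"gave up after {max_attempts} attempts: {feedback}")
```

&lt;p&gt;The gates are ordinary CI-style checks; the only new piece is that they run inside the generation loop instead of after a human commits.&lt;/p&gt;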

&lt;p&gt;This is also part of why specification-driven development is gaining traction. If models generate the implementation, developers need to define behavior precisely enough that correctness can be tested automatically. The discipline shifts from writing code to defining what correct code looks like.&lt;/p&gt;

&lt;p&gt;None of this is solved. Tried and true patterns for automating higher-level concerns like architectural evaluation and performance validation don't exist yet, though many teams are actively working on it (with varying degrees of success).&lt;/p&gt;

&lt;p&gt;Adoption will be incremental, and regulated industries will move slower because reliability and accountability matter more than speed. But each dimension of engineering judgment that gets encoded into a validation step is one more thing the system can handle.&lt;/p&gt;

&lt;p&gt;For those using the latest and greatest tooling, the hardest part of being a developer has shifted from learning syntax to developing judgment. That shift may make the early stages of a software career harder, because judgment takes time and experience to build.&lt;/p&gt;

&lt;h2&gt;
  
  
  Are We Cooked?
&lt;/h2&gt;

&lt;p&gt;Is software development cooked? Not yet. &lt;/p&gt;

&lt;p&gt;Maybe in the future. But I think that humans will be in the loop for longer than we inside the AI bubble currently believe. Only time will tell if I'm right.&lt;/p&gt;

&lt;p&gt;However, the job is changing faster than most of us expected. A developer can ship features in hours that used to take days. Small teams can build products that previously required dozens of engineers. &lt;/p&gt;

&lt;p&gt;In the short term, there's a growing need for developers who can build the systems that make AI-generated code production-ready.&lt;/p&gt;

&lt;p&gt;We used to spend most of our time writing implementations. That time is shrinking. What's growing is everything around it: figuring out what correct software looks like and building the validation to prove it.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>developers</category>
      <category>coding</category>
      <category>agents</category>
    </item>
    <item>
      <title>The Python Function That Implements Itself</title>
      <dc:creator>Morgan Willis</dc:creator>
      <pubDate>Tue, 24 Feb 2026 13:11:06 +0000</pubDate>
      <link>https://forem.com/aws/the-python-function-that-implements-itself-3el8</link>
      <guid>https://forem.com/aws/the-python-function-that-implements-itself-3el8</guid>
      <description>&lt;p&gt;What if you could write a Python function where the docstring is the implementation? You define the inputs, the return type, and you write the validation logic that defines what "correct" means. AI handles the rest.&lt;/p&gt;

&lt;p&gt;That's the programming model behind AI Functions, a new experimental library from Strands Labs. &lt;/p&gt;

&lt;p&gt;Strands Labs is a new GitHub organization where experimental features of the Strands Agents SDK are being built in the open. &lt;/p&gt;

&lt;p&gt;With AI functions you still write the validation logic, but instead of implementing the function itself, you let the AI handle generation and self-correct against your checks.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Different Way to Write AI-Powered Code
&lt;/h2&gt;

&lt;p&gt;Most AI-powered code follows the same pattern. You call the model, parse the response, write validation checks, handle errors, and retry when things go wrong. It's tedious boilerplate that everyone writes slightly differently.&lt;/p&gt;

&lt;p&gt;AI Functions inverts this pattern. &lt;/p&gt;

&lt;p&gt;You write a function signature, a docstring that serves as the prompt, a return type that defines the contract, and post-conditions that define what correct looks like. There is no function body. The function executes on an LLM instead of a CPU.&lt;/p&gt;

&lt;p&gt;The key here is that you still write real validation code. Post-conditions are normal Python functions you author. You define the acceptance criteria, and the system enforces them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Receipt Parser
&lt;/h2&gt;

&lt;p&gt;Let's see what this looks like with a receipt parser. &lt;/p&gt;

&lt;p&gt;Receipts are a good fit for this pattern because the extraction itself is fuzzy (vendors format receipts differently, line items vary, tax rules change), but the validation is deterministic. You can write a post-condition to check whether the math adds up with plain arithmetic. &lt;/p&gt;

&lt;p&gt;In practice, most receipts start as images or PDFs. This example assumes you've already extracted the text using OCR or a document processing service, and now you need to turn that raw text into structured, validated data. &lt;/p&gt;

&lt;p&gt;We'll build something that handles that second step: extracting structured data from receipt text and validating that the math actually adds up.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Field&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ai_functions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ai_function&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LineItem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Item or service description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Number of units&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;unit_price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Price per unit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Total for this line item (quantity * unit_price)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ReceiptData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;vendor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Vendor or company name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;invoice_number&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Invoice or receipt number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Invoice date (YYYY-MM-DD format)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;LineItem&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;List of line items&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;subtotal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sum of all line item amounts before tax&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;tax&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tax amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Final total (subtotal + tax)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validate_math&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ReceiptData&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Validate that all math is internally consistent.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;errors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="c1"&gt;# Check line items: amount = quantity × unit_price
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;expected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quantity&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unit_price&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Line item &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;): amount &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; != &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quantity &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; * unit_price &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unit_price&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Verify subtotal = sum of line items
&lt;/span&gt;    &lt;span class="n"&gt;items_sum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;subtotal&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;items_sum&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Subtotal &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;subtotal&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; != sum of line items &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;items_sum&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Confirm total = subtotal + tax
&lt;/span&gt;    &lt;span class="n"&gt;expected_total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;subtotal&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tax&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;expected_total&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Total &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; != subtotal &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;subtotal&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; + tax &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tax&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;expected_total&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="nd"&gt;@ai_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Parse a receipt or invoice text and extract structured expense data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;post_conditions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;validate_math&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;max_attempts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse_receipt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;receipt_text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ReceiptData&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Extract structured data from this receipt/invoice.
    Receipt text: {receipt_text}

    Instructions:
    - Extract all line items with their quantity, unit price, and total amount
    - Calculate subtotal as the sum of all line item amounts
    - Extract tax amount (if no tax is listed, use 0.0)
    - Calculate total as subtotal + tax
    - Use YYYY-MM-DD format for the date
    - Ensure all math is consistent
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Pydantic models define the shape of the output. The &lt;code&gt;@ai_function&lt;/code&gt; decorator marks this as an AI-powered function. The docstring becomes the prompt, with &lt;code&gt;{receipt_text}&lt;/code&gt; as a template variable for the input. The return type tells the system what structure to generate.&lt;/p&gt;

&lt;p&gt;Post-conditions let you define what "correct" means in your specific domain. They're standard Python functions that enforce your business logic. The math has to add up and the vendor name can't be empty. The date has to be in the right format. These aren't things you can guarantee with prompt engineering alone.&lt;/p&gt;

&lt;p&gt;Here's what happens when you call &lt;code&gt;parse_receipt&lt;/code&gt; with some receipt text. &lt;/p&gt;

&lt;p&gt;Under the hood, the library hands off to a Strands agent loop. It takes your docstring (with the receipt text filled in), sends it to the model, and asks it to return a &lt;code&gt;ReceiptData&lt;/code&gt; object. &lt;/p&gt;

&lt;p&gt;Because it's running through a Strands agent, the function gets access to the same tool-use capabilities that Strands agents have, and as the integration matures, potentially other Strands features as well. But from your perspective, as the caller, it's just a function call that returns a Pydantic model.&lt;/p&gt;

&lt;p&gt;Once the model responds, &lt;code&gt;validate_math&lt;/code&gt; runs against the result. It checks whether each line item's amount equals quantity times unit price, whether the subtotal equals the sum of all line items, and whether the total equals the subtotal plus tax.&lt;/p&gt;

&lt;p&gt;If everything checks out, you get your &lt;code&gt;ReceiptData&lt;/code&gt; back. If &lt;code&gt;validate_math&lt;/code&gt; raises a &lt;code&gt;ValueError&lt;/code&gt;, the library takes that error message ("Subtotal 1,492.30 != sum of line items 1,492.80") and sends it back to the model along with the original prompt. The model sees exactly what it got wrong and tries again. This loop repeats up to &lt;code&gt;max_attempts&lt;/code&gt; times, so with &lt;code&gt;max_attempts=3&lt;/code&gt;, the model gets three chances to produce output that passes your checks.&lt;/p&gt;
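
&lt;p&gt;Conceptually, that loop looks something like the following sketch. This is not the library's actual implementation; &lt;code&gt;call_with_retries&lt;/code&gt; and its signature are hypothetical:&lt;/p&gt;

```python
def call_with_retries(generate, post_conditions, max_attempts=3):
    # Hypothetical sketch of the retry loop described above, not the
    # library's actual implementation: generate, check, feed errors back.
    feedback = None
    for attempt in range(max_attempts):
        result = generate(feedback)       # model call; sees prior error text
        try:
            for check in post_conditions:
                check(result)             # your validation code runs here
            return result                 # all post-conditions passed
        except ValueError as e:
            feedback = str(e)             # becomes feedback for the retry
    raise RuntimeError(f"failed after {max_attempts} attempts: {feedback}")
```

&lt;p&gt;The important property is that the model never retries blindly; each attempt carries the specific error text your post-condition raised.&lt;/p&gt;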

&lt;p&gt;Worth noting: &lt;code&gt;validate_math&lt;/code&gt; checks internal consistency, not extraction accuracy. If the model misreads "$8,400" as "$840" from messy OCR output, the math could still check out while being completely wrong. But that's what additional post-conditions are for. You could write one that cross-references extracted values against the raw input text, checking whether the total the model returned actually appears in the receipt. If it doesn't, something went wrong during extraction, not just during math. The pattern scales to whatever "correct" means for your use case.&lt;/p&gt;
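
&lt;p&gt;One way to express such a check, assuming a post-condition can close over the raw input text. The factory below is a hypothetical sketch, not part of the library:&lt;/p&gt;

```python
def make_source_check(receipt_text):
    # Hypothetical sketch: bind the raw input to a post-condition via a
    # closure, so extraction errors (not just math errors) get caught.
    def validate_against_source(result):
        # The extracted total should literally appear in the receipt text,
        # in one of a few common currency formattings.
        candidates = {f"{result.total:.2f}", f"{result.total:,.2f}"}
        if not any(c in receipt_text for c in candidates):
            raise ValueError(f"Total {result.total} does not appear in the receipt text")
    return validate_against_source
```

&lt;p&gt;A misread of "$8,400" as "$840" would slip past &lt;code&gt;validate_math&lt;/code&gt; but fail this check, because "840.00" never appears in the source text.&lt;/p&gt;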

&lt;p&gt;You could add more post-conditions too. Maybe &lt;code&gt;validate_completeness&lt;/code&gt; to check that required fields aren't empty. Maybe &lt;code&gt;validate_date_format&lt;/code&gt; to ensure dates parse correctly. Each one is just a Python function that raises an error when something's wrong.&lt;/p&gt;
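
&lt;p&gt;Sketches of those two post-conditions might look like this (the names come from the paragraph above and are illustrative; &lt;code&gt;result&lt;/code&gt; is the &lt;code&gt;ReceiptData&lt;/code&gt; model from the example):&lt;/p&gt;

```python
from datetime import datetime

def validate_completeness(result):
    # Hypothetical sketch: required string fields must be non-empty.
    required = ("vendor", "invoice_number", "date")
    missing = [name for name in required if not getattr(result, name, "").strip()]
    if missing:
        raise ValueError(f"Missing required fields: {', '.join(missing)}")

def validate_date_format(result):
    # Hypothetical sketch: the date must actually parse as YYYY-MM-DD.
    try:
        datetime.strptime(result.date, "%Y-%m-%d")
    except ValueError:
        raise ValueError(f"Date {result.date!r} is not in YYYY-MM-DD format") from None
```

&lt;p&gt;Because each check raises a descriptive &lt;code&gt;ValueError&lt;/code&gt;, the model gets actionable feedback on retry instead of a generic failure.&lt;/p&gt;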

&lt;h2&gt;
  
  
  The Tradeoffs
&lt;/h2&gt;

&lt;p&gt;This pattern is clean, but there are some tradeoffs.&lt;/p&gt;

&lt;p&gt;Latency is the first one. Each retry is another model call. If you set &lt;code&gt;max_attempts=3&lt;/code&gt;, you're looking at up to three round trips to the model. That's fine for batch processing and background jobs. It's not great for user-facing APIs where you need sub-second responses.&lt;/p&gt;

&lt;p&gt;The second tradeoff is cost. Retries multiply your API spend, and each invocation uses a fresh instance of the agent. If your post-conditions fail frequently, you're paying for multiple attempts per extraction.&lt;/p&gt;

&lt;p&gt;The retry loop is a feature, not a bug, but it shouldn't be doing the heavy lifting. Monitor your validation failure rates. If post-conditions are failing on most first attempts, your prompt needs work, not more retries. Post-conditions are there to catch edge cases, not to fix fundamentally broken prompts. &lt;/p&gt;

&lt;p&gt;You're trading latency and cost for correctness guarantees on logic you never had to implement. &lt;/p&gt;

&lt;p&gt;You didn't have to anticipate every receipt format, handle every edge case for how vendors list line items, or write a parser that accounts for the dozen ways people format currency. The model handles that ambiguity, and the post-conditions catch the errors. &lt;/p&gt;

&lt;p&gt;That's the right trade for document processing pipelines, financial data extraction, and any task where a wrong answer is worse than a slow answer. It's the wrong trade for real-time chat interfaces or high-volume, cost-sensitive operations.&lt;/p&gt;

&lt;p&gt;The library is experimental and lives in a new Strands Labs repo. It's worth exploring, but expect it to change as it matures.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pattern Underneath
&lt;/h2&gt;

&lt;p&gt;What makes this really interesting to me is the programming model. You declare intent through the function signature and docstring. You define correctness through post-conditions, and the AI handles the implementation.&lt;/p&gt;

&lt;p&gt;This separation keeps your validation logic as real Python code that you control, test, and version. It's not buried in a prompt or hoping the model "understands" what you mean by correct. When requirements change, you update the post-conditions. When the model improves, you get better first-attempt success rates without changing your code. &lt;/p&gt;

&lt;p&gt;Post-conditions give you a way to programmatically define "correct" for your domain, which is something prompt engineering alone can't do. A prompt can tell the model to "make sure the math adds up," but a post-condition actually checks it and provides specific feedback when it doesn't.&lt;/p&gt;
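&lt;p&gt;As a rough sketch of the underlying pattern, here it is in plain Python. This is &lt;em&gt;not&lt;/em&gt; the ai-functions API; every name below is illustrative.&lt;/p&gt;

```python
# Illustrative retry-with-post-conditions loop (not the ai-functions API).

class PostConditionError(Exception):
    """Raised when a model output fails a validation check."""

def check_totals(result: dict) -> None:
    # Post-condition: ordinary code you control, test, and version.
    computed = round(sum(item["amount"] for item in result["line_items"]), 2)
    if computed != result["total"]:
        raise PostConditionError(
            f"line items sum to {computed}, but total is {result['total']}"
        )

def run_with_postconditions(call_model, postconditions, max_attempts=3):
    feedback = None
    for _ in range(max_attempts):
        result = call_model(feedback)  # feedback can steer the retry prompt
        try:
            for check in postconditions:
                check(result)
            return result
        except PostConditionError as err:
            feedback = str(err)  # specific feedback, not just "try again"
    raise PostConditionError(f"failed after {max_attempts} attempts: {feedback}")

# Stand-in for a model call: the first attempt is wrong, the second corrected.
attempts = iter([
    {"line_items": [{"amount": 9.99}, {"amount": 5.00}], "total": 15.99},
    {"line_items": [{"amount": 9.99}, {"amount": 5.00}], "total": 14.99},
])

result = run_with_postconditions(lambda feedback: next(attempts), [check_totals])
print(result["total"])  # 14.99
```

&lt;p&gt;The validation stays testable Python, and the error message gives the model something concrete to fix on the next attempt.&lt;/p&gt;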

&lt;p&gt;I had a ton of fun experimenting with this new project. Try the pattern yourself, the library is available at &lt;a href="https://github.com/strands-labs/ai-functions" rel="noopener noreferrer"&gt;https://github.com/strands-labs/ai-functions&lt;/a&gt;. &lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>aws</category>
      <category>programming</category>
    </item>
    <item>
      <title>From POC to Production-Ready: What Changed in My AI Agent Architecture</title>
      <dc:creator>Morgan Willis</dc:creator>
      <pubDate>Thu, 19 Feb 2026 14:33:08 +0000</pubDate>
      <link>https://forem.com/aws/from-poc-to-production-ready-what-changed-in-my-ai-agent-architecture-3dk7</link>
      <guid>https://forem.com/aws/from-poc-to-production-ready-what-changed-in-my-ai-agent-architecture-3dk7</guid>
      <description>&lt;p&gt;Most AI agent tutorials show the same problematic pattern: a front-end client directly invoking an agent backend. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fopiyh41oz4qp73o4x7kt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fopiyh41oz4qp73o4x7kt.png" alt="Direct client to agent architecture" width="800" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I wrote a blog, &lt;a href="https://dev.to/morganwilliscloud/we-need-to-talk-about-ai-agent-architectures-4n49"&gt;We Need to Talk about AI Agent Architectures&lt;/a&gt;, that explored why this pattern is a problem and highlighted a few other patterns you should use instead.&lt;/p&gt;

&lt;p&gt;The core argument was straightforward: agents are a capability inside the system, not the system itself. &lt;/p&gt;

&lt;p&gt;The response to that post told me the topic resonated, so I did the next logical thing. I went and built the patterns I shared and created a repo so you can try them out too.&lt;/p&gt;

&lt;p&gt;The reference repository walks through multiple step-by-step iterations, showing how to evolve an agent architecture from a POC to secure and flexible production-ready patterns. &lt;/p&gt;

&lt;p&gt;The repo is here: &lt;a href="https://github.com/aws-samples/sample-ai-agent-architectures-agentcore" rel="noopener noreferrer"&gt;aws-samples/sample-ai-agent-architectures-agentcore&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I also made a video walk-through of the entire solution end-to-end you can watch here:&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/jI4AYvvA7ck"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;This post covers what I built, what I learned along the way, and what you should watch out for when building your own agent architectures.&lt;/p&gt;

&lt;h2&gt;
  
  
  Starting With the Anti-pattern
&lt;/h2&gt;

&lt;p&gt;I started where most people start. Browser talks to agent, and that's it. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkayt06hdj2tfx6qciinp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkayt06hdj2tfx6qciinp.png" alt="Client to Agent Pattern using OAuth" width="634" height="604"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I wrote a simple LangGraph agent that had a few sample tools, and I created a front-end that I could run locally to be able to interact with it.&lt;/p&gt;

&lt;p&gt;I hosted this agent using Amazon Bedrock AgentCore Runtime and used Amazon Cognito to handle auth. &lt;/p&gt;

&lt;p&gt;AgentCore Runtime validates the token before invoking the agent, and the whole thing worked pretty easily. I had my agent hosted in the cloud and only authenticated users could access it. &lt;/p&gt;

&lt;p&gt;But then I started asking the questions I raised in the original post:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What happens when someone hammers this endpoint? &lt;/li&gt;
&lt;li&gt;Where do I enforce rate limits? &lt;/li&gt;
&lt;li&gt;Where does input validation go? &lt;/li&gt;
&lt;li&gt;Where can I put the business logic needed to expand the functionality of this app? &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The answer to all of those questions was: forget about it or shove it in the agent code, because there is simply &lt;em&gt;nowhere else&lt;/em&gt; for these things to be addressed.&lt;/p&gt;

&lt;p&gt;So I started adding in the necessary components to tackle these issues following the patterns I laid out in the original blog.&lt;/p&gt;

&lt;h2&gt;
  
  
  Adding a Proper Front Door
&lt;/h2&gt;

&lt;p&gt;With the first iteration of this architecture, I added a proper front door: Amazon API Gateway with AWS WAF (Web Application Firewall) in front of the agent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc3u3p2c7dwmtfo5gpx8w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc3u3p2c7dwmtfo5gpx8w.png" alt="Api Gateway and WAF" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That gave me rate limiting, web traffic filtering, and an Amazon Cognito authorizer at the API Gateway level. The user can authenticate at the API level, and the agent still uses OAuth for inbound authentication.&lt;/p&gt;

&lt;p&gt;This is the first step away from the anti-pattern. &lt;/p&gt;

&lt;p&gt;It felt like a solid improvement, but when a colleague of mine was reviewing my solution, they found a security gap that I think a lot of us would miss. &lt;/p&gt;

&lt;h2&gt;
  
  
  The Authentication Bypass Problem
&lt;/h2&gt;

&lt;p&gt;Let's take a step back.&lt;/p&gt;

&lt;p&gt;The API Gateway uses OAuth to authenticate incoming requests. When a user logs in and invokes the agent, API Gateway verifies the JWT passed in from the client. &lt;/p&gt;

&lt;p&gt;Then, API Gateway turns around and forwards that &lt;em&gt;exact same token&lt;/em&gt; to the agent running on AgentCore Runtime to be validated. One token, used all the way through.&lt;/p&gt;

&lt;p&gt;The problem with this is that the same token that satisfies the API Gateway also satisfies the agent directly. If a user has a valid JWT and knows the AgentCore endpoint, they can bypass the API Gateway entirely. &lt;/p&gt;

&lt;p&gt;Your rate limits, WAF rules, and any other protections you put in front of the agent become optional. A savvy user can just go around them.&lt;/p&gt;
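&lt;p&gt;A toy illustration of the gap, with made-up function names: when both layers accept the same credential, nothing forces traffic through the first one.&lt;/p&gt;

```python
# Toy model of the two auth layers (made-up functions, not AWS code).
# Both layers validate the same JWT against the same user pool.

def gateway_authorize(token, valid_tokens):
    """API Gateway's authorizer: checks the JWT, then forwards it unchanged."""
    return token in valid_tokens

def agent_authorize(token, valid_tokens):
    """The agent's inbound OAuth check: validates the very same JWT."""
    return token in valid_tokens

valid = {"user-jwt-abc"}

# Intended path: gateway first, then the forwarded token at the agent.
assert gateway_authorize("user-jwt-abc", valid)
assert agent_authorize("user-jwt-abc", valid)

# Bypass: the same token presented directly to the agent endpoint also passes,
# so the rate limits and WAF rules at the gateway never run.
assert agent_authorize("user-jwt-abc", valid)
```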

&lt;p&gt;This opens you up to a Denial of Wallet attack: someone floods your system with requests, the serverless backend scales up to absorb them, and you're hit with a fat bill down the line.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7go2e2gys336bp58uyd3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7go2e2gys336bp58uyd3.png" alt="Api Gateway and WAF Proxy Pattern" width="800" height="598"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This might not be an obvious gap at first, because you might think "Well, how would anyone know my agent endpoint? You need both the token AND the endpoint to invoke it. As long as someone doesn't know the endpoint, they'll be forced to go through the API Gateway."&lt;/p&gt;

&lt;p&gt;This is called security through obscurity. You're counting on someone not knowing the endpoint, but identifiers like ARNs, account numbers, and agent IDs can leak accidentally through logs, screenshots, client-side code, or shared repositories.&lt;/p&gt;

&lt;p&gt;It's not enough to operate a production system using security by obscurity as your defense.&lt;/p&gt;

&lt;p&gt;I deliberately left this gap in the examples I published in the repo (with disclaimers), because I think it is the kind of thing teams will hit in practice. &lt;/p&gt;

&lt;h2&gt;
  
  
  Closing the Security Gap
&lt;/h2&gt;

&lt;p&gt;To address this issue, I introduced a lightweight AWS Lambda function between the gateway and the agent and switched the agent to use IAM authentication instead of OAuth.&lt;/p&gt;

&lt;p&gt;That way, the token that is used to authenticate with the API is different from what is being used to securely invoke the agent. A malicious actor can no longer invoke my agent directly.&lt;/p&gt;

&lt;p&gt;Only the AWS Lambda function with the correct permissions attached to its IAM execution role can invoke the agent.&lt;/p&gt;

&lt;p&gt;By separating user authentication from backend invocation permissions, we eliminate the possibility of a client bypassing the API protections.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft2h92smvx443c2hw0qsl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft2h92smvx443c2hw0qsl.png" alt="API Gateway + Lambda Pattern" width="800" height="594"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the pattern I recommend as the starting point for most production workloads. &lt;/p&gt;

&lt;p&gt;Cognito handles user identity, API Gateway + WAF handle traffic protection and shaping, Lambda handles request processing, and the agent handles reasoning. &lt;/p&gt;
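&lt;p&gt;As a sketch, the Lambda in that chain might look something like this, using the boto3 &lt;code&gt;bedrock-agentcore&lt;/code&gt; client. The environment variable name, payload shape, and session-id choice are assumptions for illustration, not the repo's exact code.&lt;/p&gt;

```python
import json
import os

def build_agent_payload(event: dict) -> str:
    """Pure helper: shape an API Gateway proxy event into the agent's payload."""
    body = json.loads(event.get("body") or "{}")
    return json.dumps({"query": body.get("query", "")})

def handler(event, context):
    # boto3 is imported lazily so the helper above can be tested without AWS.
    import boto3

    client = boto3.client("bedrock-agentcore")
    # This call is SigV4-signed with the Lambda's execution role, so only a
    # role explicitly allowed to invoke the runtime can reach the agent.
    response = client.invoke_agent_runtime(
        agentRuntimeArn=os.environ["AGENT_RUNTIME_ARN"],  # assumed env var
        # Using the gateway request id as a one-shot session id for illustration;
        # reuse a stable id per conversation if you want session continuity.
        runtimeSessionId=event["requestContext"]["requestId"],
        payload=build_agent_payload(event),
    )
    return {"statusCode": 200, "body": response["response"].read().decode()}
```

&lt;p&gt;The key property: the client's JWT never reaches the agent, and the invoke permission lives only on the function's IAM execution role.&lt;/p&gt;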

&lt;p&gt;This represents an application with a single endpoint. Most real-world applications have more than one endpoint.&lt;/p&gt;

&lt;p&gt;Time for the next iteration. &lt;/p&gt;

&lt;h2&gt;
  
  
  Expanding Application Functionality
&lt;/h2&gt;

&lt;p&gt;What I did next is add conversation history to the application: persistent memory for the agent, conversation history displayed on the front-end, and the ability to pick up where you left off across sessions. &lt;/p&gt;

&lt;p&gt;To achieve this, I introduced a second endpoint for &lt;code&gt;conversations&lt;/code&gt; in API Gateway, a second Lambda function for the conversation retrieval logic, an Amazon DynamoDB table for conversation metadata, and I used Amazon Bedrock AgentCore Memory for storing the full conversation history. &lt;/p&gt;

&lt;p&gt;The second endpoint and Lambda function gave me a place to run logic that does not require the agent, like retrieving past conversations from memory to display. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkzgl828jzi3mcyzo1o2z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkzgl828jzi3mcyzo1o2z.png" alt="Conversation history endpoint added to architecture" width="800" height="596"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This reinforces a key design principle: only invoke the LLM when you actually need reasoning, and handle everything else with traditional application infrastructure.&lt;/p&gt;
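&lt;p&gt;A toy router makes the principle concrete; the routes and handlers below are hypothetical stand-ins for API Gateway, the Lambdas, and the agent.&lt;/p&gt;

```python
# Toy request router: only the chat route pays for an LLM invocation.
# Routes and handlers are hypothetical, not the sample repo's code.

def invoke_agent(body):
    # Stand-in for the agent invocation: the expensive, reasoning path.
    return {"answer": f"(model call for: {body['query']})"}

def list_conversations(user_id):
    # Stand-in for the conversations Lambda: plain data retrieval
    # (DynamoDB / AgentCore Memory in the real system, a dict here).
    # No reasoning required, so no model call is made.
    fake_table = {"user-1": [{"id": "c1", "title": "Shipping delays"}]}
    return fake_table.get(user_id, [])

def route(method, path, body=None, user_id=None):
    if method == "POST" and path == "/chat":
        return invoke_agent(body)           # needs reasoning
    if method == "GET" and path == "/conversations":
        return list_conversations(user_id)  # traditional infrastructure only
    return {"error": "not found"}
```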

&lt;p&gt;This is where you can really start to see how to evolve this pattern to adapt to a more complex use case.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Agent Wasn’t the Real Problem
&lt;/h2&gt;

&lt;p&gt;The agent code barely changed across all iterations. What changed was everything around it. That progression is the whole point of sharing this example. &lt;/p&gt;

&lt;p&gt;As the system needed tighter security, traffic controls, memory, and additional endpoints, the agent stayed focused on what agents do. &lt;/p&gt;

&lt;p&gt;This is why it's so important to design agent architectures applying the same systems design thinking we apply to everything else. It lets you isolate responsibilities, keep reasoning separate from traffic control and business logic, and prevent your agent from becoming an accidental "Big Ball of Mud". &lt;/p&gt;

&lt;p&gt;You want to build an architecture around your agent that can evolve as your requirements evolve.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Comes After the Basics
&lt;/h2&gt;

&lt;p&gt;The patterns we covered here tackle foundational concerns: traffic protection, auth boundaries, separation of responsibilities. These are well-understood problems with well-understood solutions. &lt;/p&gt;

&lt;p&gt;The design challenges that come next for deploying AI agents to production are potentially less straightforward.&lt;/p&gt;

&lt;p&gt;For example: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How do you control which tools an agent can call, and under what conditions? &lt;/li&gt;
&lt;li&gt;How do you audit what data the agent accessed and what actions it took at scale? &lt;/li&gt;
&lt;li&gt;How do you prevent the agent from doing something that is perfectly valid in one context but inappropriate in another, while still allowing it when it makes sense?&lt;/li&gt;
&lt;li&gt;How do you ensure your agent is following the instructions you gave it end-to-end?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are the kinds of problems teams hit as agents move from basic assistants to systems that take actions with real-world consequences on behalf of users and organizations. &lt;/p&gt;

&lt;p&gt;The answers require new patterns and solutions that we have not yet fully worked out or adopted widely as an industry. &lt;/p&gt;

&lt;p&gt;This post tackled the basics. You need to get the foundational architecture right first, because none of the harder problems get easier if you are also fighting your own infrastructure design choices.&lt;/p&gt;

&lt;h2&gt;
  
  
  Go Forth and Build Agents
&lt;/h2&gt;

&lt;p&gt;In my original post, I argued that agents are a capability inside the system, not the system itself. Building these patterns reinforced that. &lt;/p&gt;

&lt;p&gt;Every iteration made the agent more useful, more secure, and more operable, not by changing the agent, but by building the right architecture around it. &lt;/p&gt;

&lt;p&gt;Good architecture makes your agent better without the agent needing to know about it.&lt;/p&gt;

&lt;p&gt;Go fork &lt;a href="https://github.com/aws-samples/sample-ai-agent-architectures-agentcore" rel="noopener noreferrer"&gt;the repo&lt;/a&gt;, deploy the iterations, and adapt the patterns to your own use cases.&lt;/p&gt;

&lt;p&gt;If you found this useful, star the repo so others can find it too. And if you want more context on why these patterns matter, start with the original post: &lt;a href="https://dev.to/morganwilliscloud/we-need-to-talk-about-ai-agent-architectures-4n49"&gt;We Need To Talk About AI Agent Architectures&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>agents</category>
      <category>aws</category>
    </item>
    <item>
      <title>Deploying AI Agents on AWS Without Creating a Security Mess</title>
      <dc:creator>Morgan Willis</dc:creator>
      <pubDate>Mon, 12 Jan 2026 20:26:01 +0000</pubDate>
      <link>https://forem.com/aws/deploying-ai-agents-on-aws-without-creating-a-security-mess-4i</link>
      <guid>https://forem.com/aws/deploying-ai-agents-on-aws-without-creating-a-security-mess-4i</guid>
      <description>&lt;p&gt;Most agents that are useful need access to private data.&lt;/p&gt;

&lt;p&gt;They need to query internal databases, call internal systems, or read data that was never intended to be public. These requirements immediately raise questions about network exposure, credential handling, and compliance.&lt;/p&gt;

&lt;p&gt;How does the agent connect to a private database? Where does it run? How do you handle multiple users without sharing execution state? How do you grant access to private systems without hardcoding credentials or widening network access?&lt;/p&gt;

&lt;p&gt;This post walks through an example of how I answered those questions for an agent I deployed to AWS.&lt;/p&gt;

&lt;p&gt;You can find the full-length video where I build this solution end-to-end  &lt;a href="https://www.youtube.com/watch?v=Q-tYIAuv9WI" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;br&gt;
  &lt;iframe src="https://www.youtube.com/embed/Q-tYIAuv9WI"&gt;
  &lt;/iframe&gt;
 &lt;/p&gt;
&lt;h2&gt;
  
  
  The running example
&lt;/h2&gt;

&lt;p&gt;I built a simple logistics helper agent using &lt;a href="https://strandsagents.com/latest/?trk=a76ecb1b-1eaf-4e12-a22a-c872d8279680&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;Strands Agents SDK&lt;/a&gt; and an OpenAI model. It answers questions about shipments by querying a live PostgreSQL database running on &lt;a href="https://aws.amazon.com/rds/?trk=a76ecb1b-1eaf-4e12-a22a-c872d8279680&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;Amazon Relational Database Service (RDS)&lt;/a&gt; inside an &lt;a href="https://aws.amazon.com/vpc/?trk=a76ecb1b-1eaf-4e12-a22a-c872d8279680&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;Amazon Virtual Private Cloud (VPC)&lt;/a&gt; on AWS.&lt;/p&gt;

&lt;p&gt;The easy part was building the agent logic. I got it running locally using mocked tools for early testing.&lt;/p&gt;

&lt;p&gt;The hard part was deploying the agent in a way that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;does not expose the database publicly&lt;/li&gt;
&lt;li&gt;does not embed credentials in code&lt;/li&gt;
&lt;li&gt;does not punch unnecessary holes in the network&lt;/li&gt;
&lt;li&gt;properly isolates user sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS provides the building blocks to solve these problems, but you still need to make deliberate choices about how they fit together.&lt;/p&gt;

&lt;p&gt;This post uses the logistics agent as a running example. Each snippet is either from the agent code or the infrastructure files that deploy it.&lt;/p&gt;


&lt;h2&gt;
  
  
  Amazon Bedrock AgentCore primer
&lt;/h2&gt;

&lt;p&gt;In this example, AgentCore Runtime is the hosting environment for the logistics agent.&lt;/p&gt;

&lt;p&gt;AgentCore Runtime is a managed, serverless runtime that runs agents in isolated sessions and handles authentication, scaling, and lifecycle management without requiring you to completely rewrite your agent for integration. It is framework and model agnostic, and supports multiple protocols, including HTTP, MCP, and A2A. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftivtrrlwodn8bz5fvtgn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftivtrrlwodn8bz5fvtgn.png" alt="Amazon Bedrock AgentCore Runtime Overview" width="800" height="264"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can read more about Amazon Bedrock AgentCore Runtime &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/agents-tools-runtime.html?trk=a76ecb1b-1eaf-4e12-a22a-c872d8279680&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  The architecture at a glance
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F87qja2ptzkzzf4fkdvnq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F87qja2ptzkzzf4fkdvnq.png" alt="Architecture Diagram for example" width="800" height="602"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The diagram above shows the architecture for the backend of the logistics agent, including how it connects to the private database and external model provider, OpenAI.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The agent runs on Amazon Bedrock AgentCore Runtime.&lt;/li&gt;
&lt;li&gt;The AgentCore Runtime deploys Elastic Network Interfaces (ENIs) into private subnets inside a VPC to allow connectivity with private resources.&lt;/li&gt;
&lt;li&gt;The database runs on a private RDS instance in the same VPC.&lt;/li&gt;
&lt;li&gt;The agent reads database connection information from AWS Systems Manager Parameter Store.&lt;/li&gt;
&lt;li&gt;The agent reads secrets from AWS Secrets Manager (database credentials and the OpenAI key for the model provider).&lt;/li&gt;
&lt;li&gt;VPC endpoints keep calls to AWS services on the AWS network, including calls to AgentCore, AWS Systems Manager, and AWS Secrets Manager.&lt;/li&gt;
&lt;li&gt;A NAT Gateway provides outbound internet access so the agent can call OpenAI for inference.&lt;/li&gt;
&lt;li&gt;IAM controls:

&lt;ul&gt;
&lt;li&gt;who can invoke the agent&lt;/li&gt;
&lt;li&gt;what AWS APIs the agent can call once invoked&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
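&lt;p&gt;For example, the agent's execution role might include statements along these lines. The ARNs are placeholders; scope yours to your actual resources.&lt;/p&gt;

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadConfig",
      "Effect": "Allow",
      "Action": ["ssm:GetParameter"],
      "Resource": "arn:aws:ssm:us-east-1:123456789012:parameter/logistics-agent/*"
    },
    {
      "Sid": "ReadSecrets",
      "Effect": "Allow",
      "Action": ["secretsmanager:GetSecretValue"],
      "Resource": [
        "arn:aws:secretsmanager:us-east-1:123456789012:secret:logistics-db-*",
        "arn:aws:secretsmanager:us-east-1:123456789012:secret:openai-key-*"
      ]
    }
  ]
}
```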

&lt;p&gt;If you want to see the full code or AWS Cloud Development Kit (CDK) stack, the step-by-step guide can be found on GitHub &lt;a href="https://github.com/aws-samples/sample-logistics-agent-agentcore-runtime/tree/main" rel="noopener noreferrer"&gt;here&lt;/a&gt;. &lt;/p&gt;
&lt;h2&gt;
  
  
  A quick map of the security concerns and supporting AWS features
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Security concern&lt;/th&gt;
&lt;th&gt;AWS primitive&lt;/th&gt;
&lt;th&gt;Where it shows up in this example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Inbound authentication for invocations&lt;/td&gt;
&lt;td&gt;AgentCore Runtime support for IAM SigV4 or OAuth (JWT)&lt;/td&gt;
&lt;td&gt;The caller invoking &lt;code&gt;InvokeAgentRuntime&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Session isolation&lt;/td&gt;
&lt;td&gt;AgentCore Runtime sessions&lt;/td&gt;
&lt;td&gt;Runtime behavior (no shared process across users)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Secrets&lt;/td&gt;
&lt;td&gt;AWS Secrets Manager&lt;/td&gt;
&lt;td&gt;Agent loads DB credentials and OpenAI key at runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Non-secret config&lt;/td&gt;
&lt;td&gt;AWS SSM Parameter Store&lt;/td&gt;
&lt;td&gt;Agent loads endpoint, DB name, and secret ARNs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AgentCore Runtime agent permissions&lt;/td&gt;
&lt;td&gt;IAM execution role&lt;/td&gt;
&lt;td&gt;Role associated to the agent in AgentCore Runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Private connectivity to AWS services&lt;/td&gt;
&lt;td&gt;VPC endpoints&lt;/td&gt;
&lt;td&gt;Interface endpoints for AgentCore, Systems Manager, and Secrets Manager&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Private connectivity to RDS&lt;/td&gt;
&lt;td&gt;VPC networking and security groups&lt;/td&gt;
&lt;td&gt;Runtime ENIs in private subnets and security group rules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Egress only internet access from private subnets&lt;/td&gt;
&lt;td&gt;NAT gateway&lt;/td&gt;
&lt;td&gt;NAT Gateway in a public subnet with private subnet route tables for 0.0.0.0/0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Keep this table in mind, as this post will dive deeper into each row.&lt;/p&gt;


&lt;h2&gt;
  
  
  The agent code, trimmed to the parts that matter
&lt;/h2&gt;

&lt;p&gt;This is the logistics helper agent written in Python using &lt;a href="https://strandsagents.com/latest/?trk=a76ecb1b-1eaf-4e12-a22a-c872d8279680&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;Strands Agents SDK&lt;/a&gt;, with some details removed for brevity. The full sample can be found &lt;a href="https://github.com/aws-samples/sample-logistics-agent-agentcore-runtime/tree/main" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pg8000.native&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.models.openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAIModel&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bedrock_agentcore&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BedrockAgentCoreApp&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BedrockAgentCoreApp&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Cached within a single runtime session
&lt;/span&gt;&lt;span class="n"&gt;_db_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="n"&gt;_db_credentials&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="n"&gt;_db_connection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_load_db_config&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_db_connection&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_shipment_status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reference_no&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;find_delayed_shipments&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a logistics tracking assistant with access to a real-time shipment database.
&lt;/span&gt;&lt;span class="gp"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="n"&gt;_openai_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_get_openai_model&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_initialize_agent&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="nd"&gt;@app.entrypoint&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;logistics_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;user_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;user_query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please provide a query in the format: {&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;your question here&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_initialize_agent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  How to make the agent AgentCore Runtime compatible
&lt;/h3&gt;

&lt;p&gt;Before we dive into the specific AWS features used for security in this example, let’s first review how to make an agent AgentCore Runtime compatible. In your agent file, the code needed for integration is minimal.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bedrock_agentcore&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BedrockAgentCoreApp&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BedrockAgentCoreApp&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nd"&gt;@app.entrypoint&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;logistics_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;@app.entrypoint&lt;/code&gt; decorator marks the handler, or entrypoint, for your agent. AgentCore Runtime calls that function with a payload whenever an invocation hits the agent.&lt;/p&gt;

&lt;p&gt;Behind the scenes, this implements the AgentCore Runtime service contract for HTTP, which you can read more about &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/runtime-http-protocol-contract.html?trk=a76ecb1b-1eaf-4e12-a22a-c872d8279680&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The important part is that it implements the &lt;code&gt;/invocations&lt;/code&gt; endpoint on port &lt;code&gt;8080&lt;/code&gt;, which allows us to invoke the agent once it’s deployed to AgentCore Runtime.&lt;/p&gt;
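&lt;p&gt;During local development, running the agent file directly starts that HTTP server via &lt;code&gt;app.run()&lt;/code&gt;, so you can exercise the contract yourself. A minimal sketch using only the standard library; the &lt;code&gt;{"query": ...}&lt;/code&gt; payload shape is this example’s convention, not something the contract requires:&lt;/p&gt;

```python
import json
import urllib.request

def build_invocation_request(query, url="http://localhost:8080/invocations"):
    # Build a POST matching the AgentCore Runtime HTTP contract:
    # a JSON body sent to /invocations on port 8080.
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# With the agent running locally (python agent.py), send a test invocation:
# response = urllib.request.urlopen(build_invocation_request("Where is order 1042?"))
# print(response.read().decode("utf-8"))
```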

&lt;p&gt;This example is for an agent built using the Strands Agents SDK; you can find code snippets supporting other frameworks &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/using-any-agent-framework.html?trk=a76ecb1b-1eaf-4e12-a22a-c872d8279680&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  Deploying the agent to AgentCore Runtime
&lt;/h3&gt;

&lt;p&gt;Once the agent is wired up, you have options for deployment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AgentCore CLI starter toolkit&lt;/strong&gt;: fast iteration, good for development and early testing. You can use the command line to run &lt;code&gt;agentcore configure&lt;/code&gt; to configure your agent, then &lt;code&gt;agentcore deploy&lt;/code&gt; to deploy it to runtime. Read more about this &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/runtime-get-started-toolkit.html?trk=a76ecb1b-1eaf-4e12-a22a-c872d8279680&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure as code&lt;/strong&gt; (AWS CloudFormation or AWS CDK): best for production deployments. You can find the AgentCore Construct Library for AWS CDK &lt;a href="https://docs.aws.amazon.com/cdk/api/v2/python/aws_cdk.aws_bedrock_agentcore_alpha/README.html?trk=a76ecb1b-1eaf-4e12-a22a-c872d8279680&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’ll be using snippets from the AWS CDK template I created to deploy the agent in the following sections.&lt;/p&gt;




&lt;h3&gt;
  
  
  Inbound authentication and authorization
&lt;/h3&gt;

&lt;p&gt;For the logistics agent, the first security boundary is deciding who is allowed to invoke the agent.&lt;/p&gt;

&lt;p&gt;That means you need an inbound authentication mechanism, and I don’t know about you, but I am not rolling my own auth.&lt;/p&gt;

&lt;p&gt;AgentCore Runtime supports two inbound authentication options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;IAM (SigV4)&lt;/strong&gt;: the caller signs the request with AWS credentials. An IAM policy on the caller determines whether they’re allowed to invoke the agent runtime, the same way authorization works for other AWS APIs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OAuth 2.0 (JWT bearer tokens)&lt;/strong&gt;: the caller authenticates with an identity provider and sends a JWT bearer token. The agent runtime validates that token (via your configured IdP).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The logistics helper agent uses IAM for inbound authentication. When the agent is invoked, AgentCore Runtime validates the incoming request. There is no code related to authentication in the actual agent itself. AgentCore Runtime handles that for you.&lt;/p&gt;
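&lt;p&gt;From the caller’s side, an IAM-authenticated invocation is then an ordinary SigV4-signed AWS API call. Here’s a sketch using boto3; the runtime ARN and the &lt;code&gt;{"query": ...}&lt;/code&gt; payload shape are assumptions from this example, so check the parameter names against the &lt;code&gt;bedrock-agentcore&lt;/code&gt; API reference for your SDK version:&lt;/p&gt;

```python
import json

def build_payload(query):
    # AgentCore Runtime delivers this JSON to the @app.entrypoint handler as a dict.
    return json.dumps({"query": query}).encode("utf-8")

def invoke_logistics_agent(runtime_arn, session_id, query, region="us-east-1"):
    import boto3  # imported lazily so build_payload stays dependency-free

    # The caller's IAM identity must allow bedrock-agentcore:InvokeAgentRuntime
    # on runtime_arn; boto3 signs the request with SigV4 automatically.
    client = boto3.client("bedrock-agentcore", region_name=region)
    response = client.invoke_agent_runtime(
        agentRuntimeArn=runtime_arn,
        runtimeSessionId=session_id,
        payload=build_payload(query),
    )
    # The response body arrives as a streaming payload; read it fully.
    return response["response"].read().decode("utf-8")
```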

&lt;h3&gt;
  
  
  What IAM permissions look like for the invoker
&lt;/h3&gt;

&lt;p&gt;When invoking the agent, the invoker needs permission to call the &lt;code&gt;bedrock-agentcore:InvokeAgentRuntime&lt;/code&gt; API on the runtime ARN.&lt;/p&gt;

&lt;p&gt;Example policy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Statement"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Sid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AllowInvokeAgentRuntime"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bedrock-agentcore:InvokeAgentRuntime"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:bedrock-agentcore:us-east-1:123456789012:runtime/logistics_agent"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Important distinction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This is the invoker’s permission (who can call the agent).&lt;/li&gt;
&lt;li&gt;Later we’ll define the agent IAM execution role (what the agent can do once it starts running).&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;NOTE: In this example, I’m invoking the runtime directly using local IAM credentials, the AgentCore CLI, and the AWS SDK as a proof of concept. In a real-world system, I would place an API, hosted with a service like Amazon API Gateway, in front of the agent as a proxy. API Gateway would handle end-user authentication and request validation, then use its own IAM role to call the &lt;code&gt;InvokeAgentRuntime&lt;/code&gt; API for the agent. I wrote another blog about why you should have at least a proxy component sitting in front of your agents &lt;a href="https://dev.to/morganwilliscloud/we-need-to-talk-about-ai-agent-architectures-4n49"&gt;here&lt;/a&gt;. &lt;/p&gt;


&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Isolating users and execution state
&lt;/h3&gt;

&lt;p&gt;If multiple users hit the same agent at the same time, you don't want a process with data and state shared across users. You also don't need to reinvent the wheel by creating a multi-tenant isolation mechanism yourself.&lt;/p&gt;

&lt;p&gt;AgentCore Runtime runs agents in isolated environments, called sessions. &lt;/p&gt;

&lt;p&gt;Each time someone invokes your agent, AgentCore either creates a new session or routes the request to an existing session (if you supply a session ID). Agent sessions run in a dedicated microVM with isolated CPU, memory, and filesystem resources.&lt;/p&gt;
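&lt;p&gt;To keep a conversation inside one session, you pass the same session ID on each invocation. One detail worth knowing: the runtime session ID must be at least 33 characters (an assumption based on the service docs; verify the current constraint for your SDK). A small helper, as an illustration:&lt;/p&gt;

```python
import uuid

def new_session_id():
    # Two UUID4 hex strings give a 64-character ID, comfortably over the
    # assumed 33-character minimum for AgentCore Runtime session IDs.
    return uuid.uuid4().hex + uuid.uuid4().hex

# Reusing the same ID routes follow-up invocations to the same session
# (and its microVM); a new ID starts a fresh, isolated session.
```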

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi97a4f43felx1smu5cjd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi97a4f43felx1smu5cjd.png" alt="AgentCore Runtime Session Overview" width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Storing and accessing credentials and connection details securely
&lt;/h3&gt;

&lt;p&gt;Now that we know how to invoke the agent and how the agent runs in isolated sessions, the next question is: how does the logistics agent gain access to private systems without baking sensitive data into the code or environment variables?&lt;/p&gt;

&lt;p&gt;There are two different kinds of data the agent needs in order to connect to the Amazon RDS database and OpenAI model:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Secrets&lt;/strong&gt;, like database credentials and the OpenAI API keys&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configuration&lt;/strong&gt;, like hostnames, database names, and the ARNs of the secrets themselves so they can be retrieved programmatically &lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  What to store where
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You should use &lt;strong&gt;AWS Secrets Manager&lt;/strong&gt; to store sensitive values like:

&lt;ul&gt;
&lt;li&gt;The RDS username and password&lt;/li&gt;
&lt;li&gt;The OpenAI API key&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;You should use &lt;strong&gt;AWS Systems Manager Parameter Store&lt;/strong&gt; to store non-secret configuration data like:

&lt;ul&gt;
&lt;li&gt;The RDS endpoint&lt;/li&gt;
&lt;li&gt;The database name&lt;/li&gt;
&lt;li&gt;The ARNs of the secrets that contain credentials&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;This split gives you a few practical benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Secrets can be rotated independently&lt;/li&gt;
&lt;li&gt;You can audit secret access&lt;/li&gt;
&lt;li&gt;You avoid the temptation to pass credentials around “just to make it work”&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Configuration lookup using AWS SSM Parameter Store
&lt;/h3&gt;

&lt;p&gt;This code snippet allows the agent to read three parameters from AWS SSM Parameter Store:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The RDS endpoint&lt;/li&gt;
&lt;li&gt;The database name&lt;/li&gt;
&lt;li&gt;The secret ARN for the DB credentials&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example Python code using boto3 to access AWS SSM Parameter Store:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;ssm_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ssm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AWS_REGION&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ssm_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_parameters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Names&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/agentcore/rds/endpoint&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/agentcore/rds/database&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/agentcore/rds/secret-arn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
&lt;span class="n"&gt;_db_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;endpoint&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/agentcore/rds/endpoint&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;database&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/agentcore/rds/database&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;secret_arn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/agentcore/rds/secret-arn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern keeps configuration out of code and out of deployment artifacts, and access can be tightly scoped to only the specific parameter paths the agent needs.&lt;/p&gt;

&lt;p&gt;From a security standpoint, this also creates a clear separation of responsibilities: Parameter Store answers where the database is and which secret to use to connect, while Secrets Manager controls what the credentials actually are. &lt;/p&gt;

&lt;p&gt;If configuration details need to change, you update it centrally without redeploying code, and if access needs to be revoked or audited, it’s handled through IAM rather than application logic.&lt;/p&gt;

&lt;p&gt;This keeps configuration flexible, secrets isolated, and permissions explicit.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fetching credentials from AWS Secrets Manager
&lt;/h3&gt;

&lt;p&gt;Once the logistics agent knows which secret to retrieve, it fetches the credentials from AWS Secrets Manager.&lt;/p&gt;

&lt;p&gt;Example Python code using boto3 to access AWS Secrets Manager:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;secrets_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;secretsmanager&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AWS_REGION&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;secret_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;secrets_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_secret_value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;SecretId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;_db_config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;secret_arn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;_db_credentials&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;secret_response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;SecretString&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same pattern is used to retrieve the OpenAI API key. The agent never reads secrets from disk, environment variables, or configuration files. Everything comes from managed services at runtime. &lt;/p&gt;

&lt;p&gt;When you’re working with secrets in code, be careful not to log the secret payload or connection strings. Treat exceptions as potentially sensitive, and sanitize logs.&lt;/p&gt;
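&lt;p&gt;A simple guard, shown here purely as an illustration, is to scrub every known secret value from a message before it reaches a logger:&lt;/p&gt;

```python
def sanitize(message, secret_values):
    # Mask every known secret value before the message reaches the logs.
    for value in secret_values:
        if value:
            message = message.replace(str(value), "***REDACTED***")
    return message

# Example: scrub a connection error before logging it, using the
# credentials dict fetched from Secrets Manager earlier.
# logger.error(sanitize(str(exc), _db_credentials.values()))
```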




&lt;h2&gt;
  
  
  Granting the agent permission to call AWS APIs
&lt;/h2&gt;

&lt;p&gt;Every AWS API call the logistics agent makes (AWS SSM, AWS Secrets Manager, Amazon CloudWatch) is authorized through the IAM execution role attached to the agent runtime.&lt;/p&gt;

&lt;p&gt;This is distinct from the invoker permissions described earlier. It defines what the runtime can do after an invocation starts.&lt;/p&gt;

&lt;p&gt;In this example, the role was created using the AWS CDK. It is assumed by the AgentCore Runtime service principal and granted least-privilege access to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;read specific SSM parameters&lt;/li&gt;
&lt;li&gt;read specific AWS Secrets Manager secrets&lt;/li&gt;
&lt;li&gt;write logs/traces/metrics to Amazon CloudWatch&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example code snippet from the AWS CDK stack that defines the IAM permissions for the parameters and secrets the agent needs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;runtime_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_to_policy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PolicyStatement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;actions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ssm:GetParameter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ssm:GetParameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;resources&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arn:aws:ssm:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;account&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:parameter/agentcore/rds/endpoint&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arn:aws:ssm:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;account&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:parameter/agentcore/rds/database&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arn:aws:ssm:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;account&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:parameter/agentcore/rds/secret-arn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;runtime_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_to_policy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PolicyStatement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;actions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;secretsmanager:GetSecretValue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;resources&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;db_secret_arn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;openai_secret_arn&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the agent tries to fetch a secret it is not allowed to read, the call fails with an access denied error.&lt;/p&gt;

&lt;p&gt;Additionally, the AgentCore Runtime IAM execution role is the only principal allowed to read these secrets, scoped to specific secret ARNs, and secret encryption is handled by Secrets Manager (optionally with a customer-managed KMS key if you need tighter controls).&lt;/p&gt;




&lt;h3&gt;
  
  
  Allowing the agent to access private resources inside an Amazon VPC
&lt;/h3&gt;

&lt;p&gt;Because the logistics agent queries a private RDS instance, the runtime itself should run inside the same VPC.&lt;/p&gt;

&lt;p&gt;To achieve this, the agent runtime is deployed using the VPC network mode configuration. &lt;/p&gt;

&lt;h3&gt;
  
  
  The AgentCore Runtime VPC network mode configuration
&lt;/h3&gt;

&lt;p&gt;By default, AgentCore Runtime does not deploy agents to a VPC. Deploying an agent with VPC network mode lets it connect to other resources within that VPC without exposing them outside the VPC. This makes it easier for your agent to work with private databases, call internal APIs, or integrate with other existing systems running in a VPC.&lt;/p&gt;

&lt;p&gt;Example code snippet from the AWS CDK stack that defines the AgentCore Runtime resource using VPC network mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;runtime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CfnResource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AgentCoreRuntime&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AWS::BedrockAgentCore::Runtime&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AgentRuntimeName&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;logistics_agent_cdk&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Runtime for logistics Strands agent with RDS backed tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RoleArn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;runtime_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;role_arn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NetworkConfiguration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NetworkMode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VPC&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NetworkModeConfig&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Subnets&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;private_subnet_ids&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SecurityGroups&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;runtime_sg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;security_group_id&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AgentRuntimeArtifact&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CodeConfiguration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;S3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bucket&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;asset_bucket_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Prefix&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;asset_object_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="p"&gt;}&lt;/span&gt;
                        &lt;span class="p"&gt;},&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EntryPoint&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Runtime&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PYTHON_3_12&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When an agent is invoked with VPC network mode configured, elastic network interfaces, or ENIs, are created in the configured private subnets. This gives each runtime session private IP addresses and allows it to connect to resources inside the VPC, like the logistics RDS database, over internal VPC networking. &lt;/p&gt;

&lt;h3&gt;
  
  
  VPC endpoints for accessing AWS services
&lt;/h3&gt;

&lt;p&gt;Once the runtime is configured to run inside private subnets, the next issue pops up: the logistics agent still needs to call AWS APIs to work.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SSM and Secrets Manager are AWS services.&lt;/li&gt;
&lt;li&gt;AgentCore itself is an AWS service.&lt;/li&gt;
&lt;li&gt;CloudWatch Logs is an AWS service.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without VPC endpoints, these API calls would typically route through public AWS service endpoints via a NAT gateway and traverse the public internet. In many environments, that pattern is not acceptable for compliance or security reasons.&lt;/p&gt;

&lt;p&gt;VPC endpoints allow those calls to stay entirely on the AWS private network. By deploying endpoints for services like AWS Systems Manager Parameter Store, AWS Secrets Manager, Amazon Bedrock AgentCore Runtime, and Amazon CloudWatch, API traffic is routed privately within the VPC, reducing reliance on NAT gateways and eliminating exposure to the public internet.&lt;/p&gt;
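&lt;p&gt;As a rough sketch, those interface endpoints can be created with boto3. The service-name suffixes, Region, and resource IDs below are assumptions for illustration, not taken from the sample repo; verify the exact endpoint service names (especially for AgentCore) for your Region before relying on them.&lt;/p&gt;

```python
def endpoint_service_names(region: str) -> list[str]:
    """Interface endpoint service names for the AWS services the agent calls.
    The service suffixes here are assumptions; verify them for your Region."""
    services = ["ssm", "secretsmanager", "bedrock-agentcore", "logs"]
    return [f"com.amazonaws.{region}.{svc}" for svc in services]

def create_endpoints(vpc_id: str, subnet_ids: list[str], sg_id: str,
                     region: str = "us-east-1") -> None:
    """Create one interface endpoint per service inside the private subnets."""
    import boto3  # local import so the pure helper above has no AWS dependency

    ec2 = boto3.client("ec2", region_name=region)
    for name in endpoint_service_names(region):
        ec2.create_vpc_endpoint(
            VpcEndpointType="Interface",
            VpcId=vpc_id,
            ServiceName=name,
            SubnetIds=subnet_ids,
            SecurityGroupIds=[sg_id],
            PrivateDnsEnabled=True,  # resolve the service's default DNS name privately
        )
```

&lt;p&gt;With private DNS enabled, the agent code keeps using the normal SDK clients; the calls simply resolve to the endpoint ENIs instead of the public service endpoints.&lt;/p&gt;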




&lt;h2&gt;
  
  
  Restricting database access to only the agent runtime
&lt;/h2&gt;

&lt;p&gt;The VPC connectivity feature puts the agent in the right network, but it does not by itself allow the agent to communicate with the RDS database. That comes from security groups.&lt;/p&gt;

&lt;p&gt;In this setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The agent has a security group.&lt;/li&gt;
&lt;li&gt;The RDS instance has a security group.&lt;/li&gt;
&lt;li&gt;The RDS security group allows inbound PostgreSQL traffic only from the agent security group.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This pattern, with one security group referencing another, is sometimes called security group chaining. It means you don’t have to allow a CIDR range or open access across the entire VPC.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the RDS security group rule should look like conceptually
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Inbound:

&lt;ul&gt;
&lt;li&gt;Protocol: TCP&lt;/li&gt;
&lt;li&gt;Port: 5432 (Postgres)&lt;/li&gt;
&lt;li&gt;Source: runtime security group ID&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
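&lt;p&gt;That conceptual rule maps directly onto the parameters for the EC2 &lt;code&gt;authorize_security_group_ingress&lt;/code&gt; call. A minimal sketch, with placeholder security group IDs:&lt;/p&gt;

```python
def postgres_ingress_rule(db_sg_id: str, runtime_sg_id: str) -> dict:
    """Parameters for authorize_security_group_ingress: allow Postgres (5432)
    only from the runtime security group, never from a CIDR range."""
    return {
        "GroupId": db_sg_id,
        "IpPermissions": [{
            "IpProtocol": "tcp",
            "FromPort": 5432,
            "ToPort": 5432,
            "UserIdGroupPairs": [{"GroupId": runtime_sg_id}],
        }],
    }
```

&lt;p&gt;Applying it is then a single call, e.g. &lt;code&gt;ec2.authorize_security_group_ingress(**postgres_ingress_rule("sg-db…", "sg-runtime…"))&lt;/code&gt;. Using &lt;code&gt;UserIdGroupPairs&lt;/code&gt; instead of &lt;code&gt;IpRanges&lt;/code&gt; is what makes this chaining rather than CIDR-based access.&lt;/p&gt;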

&lt;p&gt;By default, security groups allow all outbound network traffic. You can also restrict the allowed outbound traffic to only what’s required (RDS port, VPC endpoints, and approved egress).&lt;/p&gt;

&lt;p&gt;This example also assumes TLS is enforced for the RDS connection so database traffic is encrypted in transit.&lt;/p&gt;

&lt;h3&gt;
  
  
  A note on database access and query scope
&lt;/h3&gt;

&lt;p&gt;A key design choice in this example that has not been covered yet is that the logistics agent never generates SQL directly. &lt;/p&gt;

&lt;p&gt;It can only invoke prewritten tools that execute parameterized queries defined in code. This design avoids letting the model construct arbitrary queries against the database, which introduces risks ranging from accidental data exposure to destructive operations.&lt;/p&gt;

&lt;p&gt;The agent can choose which tool to call and which parameters to supply, but it cannot change the shape of the query, the tables involved, or the operations being performed. That keeps the database interaction predictable and reviewable, even as agent behavior evolves.&lt;/p&gt;

&lt;p&gt;Even with tool-based access, the database credentials used by the agent are scoped to read-only access on the required schema and views. Database permissions remain the final layer of protection if a tool is misconfigured, expanded later, or reused in ways that were not originally anticipated. It's important to have a layered approach to security.&lt;/p&gt;
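&lt;p&gt;To make the tool-based access pattern concrete, here is a minimal sketch using an in-memory SQLite table as a stand-in for the RDS database. The table and tool names are illustrative, not from the sample repo; the point is that the SQL shape is fixed in code and the model only supplies a parameter value.&lt;/p&gt;

```python
import sqlite3

# In-memory stand-in for the logistics database (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shipments (id TEXT, status TEXT)")
conn.execute("INSERT INTO shipments VALUES ('SH-1001', 'in_transit')")

def get_shipment_status(shipment_id: str) -> list:
    """A prewritten tool: the query text is fixed, the agent can only
    supply the bound parameter value, never raw SQL."""
    query = "SELECT id, status FROM shipments WHERE id = ?"
    return conn.execute(query, (shipment_id,)).fetchall()
```

&lt;p&gt;Because the value is bound as a parameter, even a malicious-looking input is treated as data, not SQL, and simply matches no rows.&lt;/p&gt;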




&lt;h3&gt;
  
  
  Allowing outbound internet access from a private subnet
&lt;/h3&gt;

&lt;p&gt;At this point, the logistics agent can talk to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RDS privately inside the VPC&lt;/li&gt;
&lt;li&gt;AWS services privately through VPC endpoints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But the OpenAI model is not an AWS service, so the agent also needs outbound internet access.&lt;/p&gt;

&lt;p&gt;Because the agent runs in private subnets, it cannot reach the internet directly. &lt;/p&gt;

&lt;p&gt;The pattern that allows egress-only traffic is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A NAT gateway or NAT instance deployed to a public subnet&lt;/li&gt;
&lt;li&gt;Private subnet route tables that direct internet-bound traffic (&lt;code&gt;0.0.0.0/0&lt;/code&gt;) to the NAT gateway&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gives you controlled egress without giving the runtime a public IP. This also keeps your AWS service calls private through VPC endpoints while still enabling external calls for OpenAI model invocation.&lt;/p&gt;
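&lt;p&gt;The route itself is one entry per private route table. A sketch of the parameters for the EC2 &lt;code&gt;create_route&lt;/code&gt; call, with placeholder IDs:&lt;/p&gt;

```python
def private_subnet_internet_route(route_table_id: str, nat_gateway_id: str) -> dict:
    """Parameters for create_route: send all internet-bound traffic from a
    private subnet's route table to the NAT gateway for egress-only access."""
    return {
        "RouteTableId": route_table_id,
        "DestinationCidrBlock": "0.0.0.0/0",  # anything not matched by a more specific route
        "NatGatewayId": nat_gateway_id,
    }
```

&lt;p&gt;VPC-internal and VPC endpoint traffic matches more specific routes first, so only genuinely internet-bound traffic (like the OpenAI API calls) flows through the NAT gateway.&lt;/p&gt;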




&lt;h2&gt;
  
  
  Putting all the pieces together
&lt;/h2&gt;

&lt;p&gt;By the time everything is wired up, the security model is pretty straightforward. &lt;/p&gt;

&lt;p&gt;This post has focused on the foundational security and deployment mechanics required to run an agent against private systems. It does not represent a complete architecture and intentionally does not cover application-level authorization, data encryption, fine-grained data access controls, model-specific safety techniques, or cost optimization strategies, all of which depend heavily on the specific use case. Those pieces build on top of the patterns shown here rather than replacing them.&lt;/p&gt;

&lt;p&gt;Within this scope, the security model comes down to a few clear responsibilities. None of these controls are unique to agents, but skipping them is how agent deployments turn into security problems.&lt;/p&gt;

&lt;h3&gt;
  
  
  1) Lock down who can invoke the agent
&lt;/h3&gt;

&lt;p&gt;Inbound access is handled by AgentCore Runtime.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you use IAM, invokers need &lt;code&gt;bedrock-agentcore:InvokeAgentRuntime&lt;/code&gt; permission on the runtime ARN.&lt;/li&gt;
&lt;li&gt;If you use OAuth, your callers authenticate with your IdP and present a JWT, and the runtime validates it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Either way, you have a clear, externalized answer to “who can call this thing.”&lt;/p&gt;
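&lt;p&gt;For the IAM path, the caller’s identity policy can be as narrow as one action on one runtime ARN. A sketch, where the ARN format is illustrative:&lt;/p&gt;

```python
import json

def invoke_policy(runtime_arn: str) -> str:
    """A least-privilege identity policy for agent callers: only the ability
    to invoke this one runtime, nothing else."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "bedrock-agentcore:InvokeAgentRuntime",
            "Resource": runtime_arn,  # scope to the specific runtime, not "*"
        }],
    })
```

&lt;p&gt;Scoping &lt;code&gt;Resource&lt;/code&gt; to the runtime ARN, rather than &lt;code&gt;*&lt;/code&gt;, keeps the invocation permission auditable per agent.&lt;/p&gt;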

&lt;h3&gt;
  
  
  2) Do not share execution state across users
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;AgentCore Runtime sessions give you per-session isolation. &lt;/li&gt;
&lt;li&gt;Your agent isn’t running as one long-lived server process that every user shares.&lt;/li&gt;
&lt;li&gt;Within a session, you can cache data. Across sessions, state is isolated. &lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3) Authorize agents to make AWS API calls via an IAM execution role
&lt;/h3&gt;

&lt;p&gt;Once invoked, the runtime assumes an IAM role that defines exactly what AWS API calls it can make.&lt;/p&gt;

&lt;p&gt;No static credentials are needed, and if the role doesn’t allow it, the agent can’t do it.&lt;/p&gt;

&lt;h3&gt;
  
  
  4) Allow secure access to private resources inside a VPC
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use VPC Network Mode in your AgentCore Runtime configuration&lt;/li&gt;
&lt;li&gt;AgentCore Runtime deploys ENIs to selected private subnets&lt;/li&gt;
&lt;li&gt;Use VPC endpoints for communication with AgentCore and other AWS services&lt;/li&gt;
&lt;li&gt;VPC endpoints keep AWS service traffic on the private AWS network&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5) Only allow appropriate database access from the agent
&lt;/h3&gt;

&lt;p&gt;Security groups provide an instance-level firewall:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RDS inbound traffic is limited to the runtime security group on the database port&lt;/li&gt;
&lt;li&gt;No broad CIDR-based rules&lt;/li&gt;
&lt;li&gt;No “anything in the VPC can connect” rules&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6) Provide narrow egress internet access for OpenAI
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;NAT gateway gives the runtime outbound access from private subnets&lt;/li&gt;
&lt;li&gt;Route tables send internet-bound traffic to NAT&lt;/li&gt;
&lt;li&gt;AWS service calls can still stay private via VPC endpoints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s, at a minimum, what it takes to deploy an agent that accesses private systems without creating a security mess.&lt;/p&gt;

&lt;p&gt;If you want to follow this example or adapt this architecture, the &lt;a href="https://github.com/aws-samples/sample-logistics-agent-agentcore-runtime/tree/main" rel="noopener noreferrer"&gt;full repo&lt;/a&gt; includes the infrastructure and the deployment steps, plus cleanup.&lt;/p&gt;

&lt;p&gt;And check out the video where I walk you through building the whole solution end-to-end &lt;a href="https://www.youtube.com/watch?v=Q-tYIAuv9WI" rel="noopener noreferrer"&gt;here&lt;/a&gt;. &lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>aws</category>
      <category>security</category>
    </item>
    <item>
      <title>We Need To Talk About AI Agent Architectures</title>
      <dc:creator>Morgan Willis</dc:creator>
      <pubDate>Mon, 08 Dec 2025 23:00:17 +0000</pubDate>
      <link>https://forem.com/aws/we-need-to-talk-about-ai-agent-architectures-4n49</link>
      <guid>https://forem.com/aws/we-need-to-talk-about-ai-agent-architectures-4n49</guid>
      <description>&lt;p&gt;&lt;strong&gt;AI agents are getting easier to build and host.&lt;/strong&gt; With agentic frameworks and cloud-based hosting environments, you can deploy an agent to the cloud in an afternoon. It is now possible to assemble a multi-agent setup with memory, observability, and MCP connected tools without a huge amount of code or infrastructure work.&lt;/p&gt;

&lt;p&gt;This convenience, paired with AI coding assistants making it easier than ever to ship, has created a trend that is worth talking about. Many developers are wiring UIs directly to their agents as if the agent runtime &lt;em&gt;is&lt;/em&gt; the entire backend. It looks clean. It feels efficient. It also happens to be what most demos show, so it is understandable that teams take that pattern and run.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnicshblhmr1xb9vfo31t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnicshblhmr1xb9vfo31t.png" alt="The diagram illustrates a direct client→agent architecture with a single entrypoint where the agent runtime replaces the entire backend." width="800" height="435"&gt;&lt;/a&gt;&lt;br&gt;
The diagram illustrates a direct client→agent architecture with a single entrypoint where the agent runtime replaces the entire backend.&lt;/p&gt;

&lt;p&gt;This works well when you are exploring ideas. Once you move beyond a demo and into a real application, that client to agent pattern may start to break down. This is not because any specific agent runtime itself is limited, but because real production systems still need the same architectural layers they have always needed.&lt;/p&gt;

&lt;p&gt;Web applications still need input sanitization. APIs still need rate limits. Business logic still needs a home. Services still need to coordinate with other systems. As soon as those pieces enter the picture, the architecture starts to look a lot more familiar.&lt;/p&gt;

&lt;p&gt;AI agents expand what an application can do, but they do not erase the fundamentals of good systems design. The agent itself is not the system. It is a capability inside the system.&lt;/p&gt;

&lt;p&gt;Let’s talk about what that means and why it matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why Direct Client → Agent Is an Incomplete Architecture&lt;/strong&gt;
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Terminology note:&lt;/strong&gt; In this post, &lt;em&gt;runtime&lt;/em&gt; refers to a managed environment that executes agent logic on the server side. I use Amazon Bedrock AgentCore Runtime as an example throughout, but the same concepts apply to other hosted environments. &lt;em&gt;Agent&lt;/em&gt; or &lt;em&gt;agent service&lt;/em&gt; is your deployed code containing the agent framework, prompts, and tool integration.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Upstream services or modules&lt;/em&gt; are all components that handle requests before they reach the agent (UI, gateways, routers, backends). &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Downstream services or modules&lt;/em&gt; are the tools and resources the agent calls (MCP tools, APIs, databases, internal services).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When the client talks directly to the agent runtime, responsibilities that normally live in other components can either get lost entirely or end up pushed into your agent code where they do not belong.&lt;/p&gt;

&lt;p&gt;Without typical components of web architectures, the agent is expected to handle: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Request and security boundaries&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Input sanitization, API level authorization rules, web traffic filtering, rate limiting, throttling, and safety checks. &lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Application and system orchestration&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;Coordinating services, enforcing business rules that span multiple systems, and managing workflow transitions that require durability outside an agent session. &lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Resilience and operational concerns&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;Retries, backoff behavior, event buffering, and behaviors that protect downstream systems. &lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Agents or hosted runtimes may be able to handle some of these tasks, but they were never designed to be your entire backend, your middle tier, or your web server. This is the same reason why we don’t point clients directly at AWS Lambda functions in most production systems without protective layers. In a similar way, agents are not meant to be directly front-end facing services for most use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where This Architecture Breaks Down in Practice
&lt;/h2&gt;

&lt;p&gt;Here are three ways the client→agent pattern can break down in production:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Traffic, cost, and load patterns become hard to control.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When the UI talks directly to a single agent service without upstream boundaries, there is no clean place to enforce rate limits, handle noisy clients, or cap usage per user. A small bug, a retry loop, or a surge in usage can translate into a flood of LLM calls, driving unpredictable latency and inference costs without a structured way to throttle or shed load.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Every change shares the same blast radius because everything ships in one deployment unit&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When validation logic, business rules, integration code, and agent behavior all live in the same service, every small change requires touching and redeploying the entire agent app. A tweak to a business rule, a simple bug fix, or a prompt change all share the same blast radius and rollback path, which slows iteration and makes failures harder to localize. &lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Refactoring becomes brittle as the system grows.&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;When the agent service acts as the entire backend, every aspect is fused into a single deployment unit. Additionally, many agent runtimes expose a single entrypoint like &lt;code&gt;POST /invoke&lt;/code&gt;, which means every feature, workflow, and behavior enters through one undifferentiated entrypoint. &lt;/p&gt;

&lt;p&gt;Nothing distinguishes one operation from another, so you lose the natural places where you would normally enforce permissions, validate input, or apply business rules. &lt;/p&gt;

&lt;p&gt;With this setup, extending the architecture becomes difficult. Adding new functionality, queues, or workflow orchestration later means untangling tightly coupled logic. Adding features risks rewriting the agent, because the system never developed the separation needed to evolve cleanly.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Why Separation of Concerns Still Matters
&lt;/h2&gt;

&lt;p&gt;We break systems into modules because each piece handles a specific kind of complexity so the rest of the system doesn’t have to. That separation of concerns keeps responsibilities contained, avoids logic leaking across boundaries, allows for decoupling, and makes the system more predictable at scale.&lt;/p&gt;

&lt;p&gt;Testability also suffers when everything runs inside a single boundary. Isolating components, mocking dependencies, and doing targeted regression testing is far easier when concerns are separated into clear modules.&lt;/p&gt;

&lt;p&gt;Experienced developers and systems engineers know this intuitively, but the rapid progress in AI tooling has lowered the barrier to agent deployment in a way that lets people ship agents before they have the architectural context to support them. &lt;/p&gt;

&lt;p&gt;We, as a technical community, should amplify real-world patterns and lessons learned. Providing more examples of advanced use cases alongside simplified tutorials will allow us to learn together and move towards a set of guidelines for well-architected agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  Balancing Simplicity and Structure in Agentic Systems
&lt;/h2&gt;

&lt;p&gt;Just like any other solution, as you introduce more moving parts, you are now responsible for operating and maintaining them.&lt;/p&gt;

&lt;p&gt;Additional components add complexity in the same way that having a load balancer, an API gateway, and a database connection pool adds complexity. These components exist because they absorb or abstract the handling of specific categories of risk, or responsibility, so your core application code does not have to. They make the entire system more reliable.&lt;/p&gt;

&lt;p&gt;None of this means you must build a massive, highly distributed, micro-serviced architecture to use agents correctly. &lt;/p&gt;

&lt;p&gt;You can run a simple, clean setup with a load balancer and router component in front of your agent, or add an API gateway for basic shaping and protection, and stop there. That pattern is perfectly valid for many teams, especially early on.&lt;/p&gt;

&lt;p&gt;At the same time, companies operating at global scale or projects with complex requirements will naturally need more components. They may introduce additional services for orchestration, workflow durability, message buffering, network connectivity, or cross-system coordination. These architectures are more complex because the requirements and traffic patterns call for that complexity.&lt;/p&gt;

&lt;p&gt;Both ends of that spectrum are reasonable. What matters is choosing the right architecture for your use case and constraints. The goal is not to chase complexity for complexity’s sake, and it is also not to flatten everything into a single module. It is to introduce the minimum number of components that meaningfully reduce risk, improve security, and enable flexibility as your system grows and changes.&lt;/p&gt;

&lt;p&gt;That balance is what helps you start simple without boxing yourself into a corner later.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What Belongs in the Agent vs the Backend?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;With all of that being said, what &lt;em&gt;does&lt;/em&gt; belong in the agent vs in other components?&lt;/p&gt;

&lt;p&gt;Agent frameworks make it easy to blur these boundaries, but keeping them clear is what prevents the system from collapsing into an expensive mess. The way you decide to build your agent heavily depends on your use case and technology choices. Agentic frameworks vary in their implementation, and so do the requirements from case to case. There is no one size fits all answer. Here are some high-level guidelines for getting started.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What typically belongs upstream (UI, gateway, router, backend)&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Input shaping, validation, rate limiting, and web traffic filtering&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Core business logic&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Coordinating between services or orchestrating complex workflows&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Workflow state, retries, orchestration, and durability&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Separating these concerns keeps security, validation, and business rules separate from core agent code, reduces the blast radius of changes, and lets you change agent behavior without constantly reworking the logic that keeps the system running on a basic level.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What typically belongs inside the agent&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Invoking LLMs using agentic frameworks&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tool selection and orchestration logic&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Agent session state, context, and memory handling&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An agent is generally responsible for interpreting goals, choosing actions, and reasoning over context. Decisions might come from the model, from graph level orchestration, or from deterministic routing depending on the framework and use case.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What typically belongs in tools&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Reading or writing data&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Querying systems of record&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Triggering deterministic code&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Invoking internal or external APIs&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Triggering another agent to do work&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tools encapsulate actions. The model may determine &lt;em&gt;when&lt;/em&gt; a tool is needed, but the tool controls &lt;em&gt;how&lt;/em&gt; the underlying operation executes.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;AWS Architecture Patterns for AI Agents&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Agents fit into systems just like any other capability fits into an application. You can keep things lightweight or expand into more distributed designs as your scale and needs change.&lt;/p&gt;

&lt;p&gt;The patterns highlighted below intentionally leave out other parts of agentic systems like memory, MCP servers, RAG, and multi-agent communication. Those are important topics, but they sit inside the agent runtime or downstream from it rather than in the upstream architectural components we are focusing on here.&lt;/p&gt;

&lt;p&gt;You can extend or adapt these patterns for your use case. I will use AWS services as examples, with Amazon Bedrock AgentCore Runtime as the agent runtime, though you could swap these components with services from other providers and keep the same patterns.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Quick Amazon Bedrock AgentCore Primer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Because the following examples use AWS services, here are the basics. AgentCore Runtime is a managed serverless environment for hosting AI agents. It handles deployment, scaling, and session management, and integrates with many tools and services both inside and outside of AWS. It supports both IAM and OAuth based identity so you can plug it into existing security models. To learn more about AgentCore Runtime, click &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/agents-tools-runtime.html?trk=a76ecb1b-1eaf-4e12-a22a-c872d8279680&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Minimal API Gateway Pattern&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuazkpmzc52silo3dpqvg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuazkpmzc52silo3dpqvg.png" alt="Client → Amazon API Gateway + AWS Web Application Firewall (WAF)→ Amazon Bedrock AgentCore Runtime → Downstream services" width="800" height="273"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Client → Amazon API Gateway + AWS Web Application Firewall (WAF) → Amazon Bedrock AgentCore Runtime → Downstream services&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use this when&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You are moving from prototype to production and want a small number of well understood layers. &lt;/li&gt;
&lt;li&gt;You need basic protections like auth, rate limits, and input validation but do not yet have a large service ecosystem. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;API Gateway and AWS WAF provide authentication, rate limits, routing, web traffic filtering, and a controlled boundary before the agent is invoked. &lt;/p&gt;

&lt;p&gt;You can optionally include an AWS Lambda function between the API Gateway and the agent runtime which lets you write custom logic when invoking the agent, including deterministic input validation or other logic. &lt;/p&gt;
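&lt;p&gt;A minimal sketch of what that Lambda function might look like. The request field names, size limit, and handler shape are assumptions for illustration; the actual AgentCore invocation is left as a comment since it depends on your runtime configuration.&lt;/p&gt;

```python
import json

MAX_PROMPT_CHARS = 4000  # assumed limit; tune for your use case

def validate_request(body: dict) -> tuple:
    """Deterministic input validation performed upstream, before the agent
    is ever invoked. Field names here are illustrative."""
    prompt = body.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        return (False, "prompt must be a non-empty string")
    if len(prompt) > MAX_PROMPT_CHARS:
        return (False, f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    return (True, "ok")

def handler(event, context):
    """Lambda proxy handler: reject bad input with a 400 before any LLM call."""
    body = json.loads(event.get("body") or "{}")
    ok, reason = validate_request(body)
    if not ok:
        return {"statusCode": 400, "body": json.dumps({"error": reason})}
    # ...invoke AgentCore Runtime here (e.g., via boto3) and return its reply...
    return {"statusCode": 200, "body": json.dumps({"accepted": True})}
```

&lt;p&gt;Rejecting malformed or oversized input here means the agent, and the model behind it, never pays the cost of handling it.&lt;/p&gt;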

&lt;p&gt;AgentCore Runtime handles inbound identity using OAuth or IAM.&lt;/p&gt;

&lt;p&gt;If you later need queuing for incoming messages, you can include Amazon SQS between API Gateway and the agent and use a Lambda function that processes messages and invokes AgentCore Runtime. That lets you handle spiky traffic or ordered message processing without changing how the agent itself works.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Traditional Backend + Agent Pattern&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6a30wq3tgy26b33y9okw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6a30wq3tgy26b33y9okw.png" alt="Client → Application Load Balancer + AWS WAF → Web server(s), e.g., on Amazon EC2, Amazon ECS, or AWS Lambda→ Amazon Bedrock AgentCore Runtime → Downstream services" width="800" height="411"&gt;&lt;/a&gt;&lt;br&gt;
Client → Application Load Balancer + AWS WAF → Web server(s), e.g., on Amazon EC2, Amazon ECS, or AWS Lambda → Amazon Bedrock AgentCore Runtime → Downstream services&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use this when&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You already have a web backend that you need to integrate into or if you need a designated component for routing and business logic. &lt;/li&gt;
&lt;li&gt;You have non-trivial logic or workflow orchestration requirements. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many production workloads still run traditional web backends. Those architectures do not disappear or need a major overhaul when you add an AI agent. You extend them.&lt;/p&gt;

&lt;p&gt;The client sends requests through an Application Load Balancer which can integrate with AWS WAF for web filtering. From there, the request is sent to a web backend on Amazon EC2, containers, or Lambda. &lt;/p&gt;

&lt;p&gt;The backend handles business logic and system coordination. The agent is a capability it uses, invoked via a VPC endpoint so traffic remains private.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. Deep Automation Agent Pattern&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbjdbfz0wkhws79ffm6w9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbjdbfz0wkhws79ffm6w9.png" alt="Events coming from Amazon EventBridge→ AWS Step Functions → AWS Lambda → Amazon Bedrock AgentCore Runtime → Downstream Systems " width="800" height="530"&gt;&lt;/a&gt;&lt;br&gt;
Events coming from Amazon EventBridge → AWS Step Functions → AWS Lambda → Amazon Bedrock AgentCore Runtime → Downstream systems&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use this when&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The value of the agent lives in backend processes, not in a chat UI. &lt;/li&gt;
&lt;li&gt;You want agents to be one part of a larger workflow. &lt;/li&gt;
&lt;li&gt;Work is triggered by events, schedules, or pipelines rather than direct user interaction. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here, the agent is a part of a larger workflow, pipeline, or automation task. Agents can potentially run asynchronously, with no user facing UI at all.&lt;/p&gt;

&lt;p&gt;Events from Amazon EventBridge or scheduled runs can invoke the agent in AgentCore Runtime directly using IAM as the authentication method. You can optionally introduce AWS Step Functions as a way to coordinate the steps of a long-running or multi-phased workflow that mixes deterministic and nondeterministic steps. &lt;/p&gt;

&lt;p&gt;Step Functions provides a workflow control mechanism so the agent does not need to manage retries, branching, or overall workflow state.&lt;/p&gt;

&lt;p&gt;The agent does its work and calls downstream services or tools as needed, while coordination between steps is handled by Step Functions. This lets you run deterministic steps with services like AWS Lambda before the agent runs, notify relevant parties with Amazon Simple Notification Service afterward, or invoke other services in parallel with your agent. Again, you could swap out Step Functions for another workflow orchestrator and the concept still applies.&lt;/p&gt;

&lt;p&gt;These patterns let you start simple, introduce components only when needed, and grow into more distributed or mature architectures.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Takeaway&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;If you remember nothing else from this post, remember this: the question is not whether you can connect your client directly to an agent. You technically can. The question is whether you should.&lt;/p&gt;

&lt;p&gt;In the short term, it may feel fast and simple. In the long term, it leads to a brittle system that is difficult to extend, hard to understand, and expensive to maintain.&lt;/p&gt;

&lt;p&gt;A well-structured architecture lets agents be first class participants in your system without being overloaded by concerns that belong elsewhere. That is how you get the best of both worlds: the power of agentic reasoning combined with the reliability of proven distributed system design.&lt;/p&gt;

&lt;p&gt;And yes, the client→agent tutorials are still useful. They exist to teach one focused concept without burying you in use case specific and complex details. They show you how to get an agent running, not how to design the full application around it.&lt;/p&gt;

&lt;p&gt;But once you move toward production, the question becomes: Did we build a full system or did we stop at the agent?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The agent is the brain. The architecture is the body. You need both.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you want to learn more about agentic design patterns on AWS, visit &lt;a href="https://docs.aws.amazon.com/prescriptive-guidance/latest/agentic-ai-patterns/introduction.html?trk=a76ecb1b-1eaf-4e12-a22a-c872d8279680&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;Agentic AI patterns and workflows on AWS&lt;/a&gt; and stay tuned for more blog posts from the AWS team where we explore specific architectures for agentic AI use cases and advanced design patterns.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>architecture</category>
      <category>aws</category>
    </item>
  </channel>
</rss>
