<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Raj Murugan</title>
    <description>The latest articles on Forem by Raj Murugan (@rajmurugan).</description>
    <link>https://forem.com/rajmurugan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1906575%2F92e08690-ea8e-4b95-93ce-525ed9f2668c.png</url>
      <title>Forem: Raj Murugan</title>
      <link>https://forem.com/rajmurugan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/rajmurugan"/>
    <language>en</language>
    <item>
      <title>Part 5: CI/CD for Bedrock AgentCore with GitHub Actions and AWS OIDC (No Stored Credentials)</title>
      <dc:creator>Raj Murugan</dc:creator>
      <pubDate>Mon, 30 Mar 2026 21:38:35 +0000</pubDate>
      <link>https://forem.com/rajmurugan/part-5-cicd-for-bedrock-agentcore-with-github-actions-and-aws-oidc-no-stored-credentials-3cc1</link>
      <guid>https://forem.com/rajmurugan/part-5-cicd-for-bedrock-agentcore-with-github-actions-and-aws-oidc-no-stored-credentials-3cc1</guid>
      <description>&lt;p&gt;Storing AWS access keys in GitHub Secrets is the wrong approach. They rotate, they get leaked, and they're a compliance headache.&lt;/p&gt;

&lt;p&gt;The correct approach in 2025 is OIDC: GitHub Actions proves its identity to AWS using a short-lived token, assumes an IAM role, and gets temporary credentials. No stored keys, no rotation, no secrets to leak.&lt;/p&gt;

&lt;p&gt;This post walks through the complete CI/CD setup for AgentCore: OIDC config, the build/push/deploy pipeline, and the dual-tag ECR strategy that makes rollback practical.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why OIDC over stored credentials
&lt;/h2&gt;

&lt;p&gt;With stored &lt;code&gt;AWS_ACCESS_KEY_ID&lt;/code&gt; / &lt;code&gt;AWS_SECRET_ACCESS_KEY&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keys are long-lived (you rotate them, right? right?)&lt;/li&gt;
&lt;li&gt;Rotation requires updating secrets in every affected repo&lt;/li&gt;
&lt;li&gt;A leak (accidental commit, log output, third-party action) gives an attacker permanent access until rotated&lt;/li&gt;
&lt;li&gt;Keys are attached to an IAM user — you need a separate user per CI/CD system&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With OIDC:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub generates a short-lived OIDC token per workflow run&lt;/li&gt;
&lt;li&gt;AWS validates the token against the trusted identity provider&lt;/li&gt;
&lt;li&gt;The IAM role is assumed — the temporary credentials expire automatically (one hour by default)&lt;/li&gt;
&lt;li&gt;No secrets to rotate, no keys to leak&lt;/li&gt;
&lt;li&gt;Trust policy is scoped to specific repos and branches&lt;/li&gt;
&lt;/ul&gt;
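&lt;p&gt;To make the trust scoping concrete, the OIDC token GitHub issues carries claims along these lines (an abridged, illustrative payload — the repo and branch names are placeholders; &lt;code&gt;aud&lt;/code&gt; is &lt;code&gt;sts.amazonaws.com&lt;/code&gt; because the AWS credentials action requests that audience):&lt;/p&gt;

```json
{
  "iss": "https://token.actions.githubusercontent.com",
  "aud": "sts.amazonaws.com",
  "sub": "repo:octo-org/octo-repo:ref:refs/heads/main",
  "repository": "octo-org/octo-repo",
  "ref": "refs/heads/main"
}
```

&lt;p&gt;The trust policy conditions in the next section match directly against the &lt;code&gt;aud&lt;/code&gt; and &lt;code&gt;sub&lt;/code&gt; claims.&lt;/p&gt;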




&lt;h2&gt;
  
  
  Setting up OIDC
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Create the IAM OIDC provider (once per AWS account)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws iam create-open-id-connect-provider &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; https://token.actions.githubusercontent.com &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--client-id-list&lt;/span&gt; sts.amazonaws.com &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--thumbprint-list&lt;/span&gt; 6938fd4d98bab03faadb97b34396831e3780aea1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tells AWS to trust tokens from &lt;code&gt;token.actions.githubusercontent.com&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Create the deploy IAM role
&lt;/h3&gt;

&lt;p&gt;The trust policy restricts which GitHub repositories are allowed to assume the role:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Statement"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Principal"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"Federated"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:iam::&amp;lt;ACCOUNT&amp;gt;:oidc-provider/token.actions.githubusercontent.com"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sts:AssumeRoleWithWebIdentity"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Condition"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"StringEquals"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"token.actions.githubusercontent.com:aud"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sts.amazonaws.com"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"StringLike"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"token.actions.githubusercontent.com:sub"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="s2"&gt;"repo:rajmurugan01/bedrock-agentcore-starter:*"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;StringLike&lt;/code&gt; condition with &lt;code&gt;*&lt;/code&gt; allows any branch. For production deployments, lock it down:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"StringEquals"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"token.actions.githubusercontent.com:sub"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"repo:rajmurugan01/bedrock-agentcore-starter:ref:refs/heads/main"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
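&lt;p&gt;If you tighten the trust policy after the role already exists, you don't recreate the role — update it in place (the role name here is a placeholder):&lt;/p&gt;

```shell
# Replace the role's trust (assume-role) policy with the locked-down version
aws iam update-assume-role-policy \
  --role-name github-actions-deploy \
  --policy-document file://trust-policy.json
```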



&lt;h3&gt;
  
  
  Step 3: Attach permissions to the deploy role
&lt;/h3&gt;

&lt;p&gt;The role needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ecr:GetAuthorizationToken&lt;/code&gt; — login to ECR&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ecr:BatchGetImage&lt;/code&gt;, &lt;code&gt;ecr:GetDownloadUrlForLayer&lt;/code&gt;, &lt;code&gt;ecr:PutImage&lt;/code&gt;, etc. — push to ECR&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;bedrock-agentcore:UpdateAgentRuntime&lt;/code&gt; — update the Runtime after pushing a new image (the IAM action prefix is &lt;code&gt;bedrock-agentcore&lt;/code&gt;, even though the CLI command is &lt;code&gt;bedrock-agentcore-control&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ssm:GetParameter&lt;/code&gt; — read Runtime ID and other config from SSM&lt;/li&gt;
&lt;/ul&gt;
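&lt;p&gt;As a sketch, a permissions policy covering those four needs might look like the following — the account ID, repository name, and parameter path are placeholders, and the action names are worth double-checking against the IAM service authorization reference before use:&lt;/p&gt;

```json
{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Action": "ecr:GetAuthorizationToken", "Resource": "*" },
    {
      "Effect": "Allow",
      "Action": [
        "ecr:BatchGetImage",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchCheckLayerAvailability",
        "ecr:InitiateLayerUpload",
        "ecr:UploadLayerPart",
        "ecr:CompleteLayerUpload",
        "ecr:PutImage"
      ],
      "Resource": "arn:aws:ecr:us-east-1:123456789012:repository/customer-service-agent"
    },
    { "Effect": "Allow", "Action": "bedrock-agentcore:UpdateAgentRuntime", "Resource": "*" },
    {
      "Effect": "Allow",
      "Action": "ssm:GetParameter",
      "Resource": "arn:aws:ssm:us-east-1:123456789012:parameter/customerServiceAgent/*"
    }
  ]
}
```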




&lt;h2&gt;
  
  
  The deploy workflow
&lt;/h2&gt;

&lt;p&gt;The full file is &lt;a href="https://github.com/rajmurugan01/bedrock-agentcore-starter/blob/main/.github/workflows/deploy-agent.yml" rel="noopener noreferrer"&gt;.github/workflows/deploy-agent.yml&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Key sections:&lt;/p&gt;

&lt;h3&gt;
  
  
  Trigger
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;main&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;apps/customer-service-agent/**'&lt;/span&gt;
  &lt;span class="na"&gt;workflow_dispatch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;inputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;choice&lt;/span&gt;
        &lt;span class="na"&gt;options&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;dev&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;stg&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;prd&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;paths&lt;/code&gt; filter means the workflow only triggers when agent code changes — not on every push to main. Infrastructure changes (CDK) run in a separate workflow.&lt;/p&gt;

&lt;h3&gt;
  
  
  OIDC credential configuration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;permissions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;id-token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;write&lt;/span&gt;   &lt;span class="c1"&gt;# Required to receive the OIDC token&lt;/span&gt;
  &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;read&lt;/span&gt;

&lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Configure AWS credentials&lt;/span&gt;
    &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-actions/configure-aws-credentials@v4&lt;/span&gt;
    &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;role-to-assume&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.AWS_DEPLOY_ROLE_ARN }}&lt;/span&gt;
      &lt;span class="na"&gt;aws-region&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;us-east-1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;id-token: write&lt;/code&gt; permission is what enables OIDC. Without it, GitHub doesn't generate the OIDC token and the step fails.&lt;/p&gt;
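&lt;p&gt;A cheap sanity check is to print the assumed identity immediately after the credentials step — the &lt;code&gt;Arn&lt;/code&gt; in the output should contain &lt;code&gt;assumed-role&lt;/code&gt; and your deploy role's name, not an IAM user:&lt;/p&gt;

```yaml
- name: Verify assumed identity
  run: aws sts get-caller-identity
```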

&lt;h3&gt;
  
  
  Build for linux/amd64
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build Docker image&lt;/span&gt;
  &lt;span class="na"&gt;working-directory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/customer-service-agent&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;docker build \&lt;/span&gt;
      &lt;span class="s"&gt;--platform linux/amd64 \&lt;/span&gt;
      &lt;span class="s"&gt;-t ${{ env.ECR_URI }}:latest \&lt;/span&gt;
      &lt;span class="s"&gt;-t ${{ env.ECR_URI }}:${{ env.GIT_SHA }} \&lt;/span&gt;
      &lt;span class="s"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This produces two tags simultaneously in one build — no rebuilding.&lt;/p&gt;
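&lt;p&gt;The &lt;code&gt;GIT_SHA&lt;/code&gt; env var isn't defined in the snippet above; one common way to populate it (an assumption about this workflow, not a quote from it) is to shorten the &lt;code&gt;GITHUB_SHA&lt;/code&gt; that Actions injects:&lt;/p&gt;

```shell
# GITHUB_SHA is injected by GitHub Actions; hard-coded here only so the
# snippet is self-contained
GITHUB_SHA="a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0"

# The first 8 characters make a short, human-readable image tag
GIT_SHA="${GITHUB_SHA:0:8}"
echo "$GIT_SHA"   # a1b2c3d4
```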

&lt;h3&gt;
  
  
  The dual-tag ECR strategy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Push to ECR&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;docker push ${{ env.ECR_URI }}:latest&lt;/span&gt;
    &lt;span class="s"&gt;docker push ${{ env.ECR_URI }}:${{ env.GIT_SHA }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;:latest&lt;/code&gt;&lt;/strong&gt; — the workflow's &lt;code&gt;update-agent-runtime&lt;/code&gt; call points the container URI at &lt;code&gt;:latest&lt;/code&gt;, so this tag must always reference the most recent image.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;:&amp;lt;git-sha&amp;gt;&lt;/code&gt;&lt;/strong&gt; (e.g., &lt;code&gt;:a1b2c3d4&lt;/code&gt;) — pinned to a specific commit. If &lt;code&gt;:latest&lt;/code&gt; introduces a regression, you can roll back by pushing the previous SHA tag as &lt;code&gt;:latest&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Rollback to a previous image&lt;/span&gt;
docker pull &amp;lt;ecr-uri&amp;gt;:a1b2c3d4
docker tag &amp;lt;ecr-uri&amp;gt;:a1b2c3d4 &amp;lt;ecr-uri&amp;gt;:latest
docker push &amp;lt;ecr-uri&amp;gt;:latest
&lt;span class="c"&gt;# Then trigger update-agent-runtime again&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Updating the AgentCore Runtime
&lt;/h3&gt;

&lt;p&gt;After pushing the image, we tell AgentCore to pull the new &lt;code&gt;:latest&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Update AgentCore Runtime&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;RUNTIME_ID=$(aws ssm get-parameter \&lt;/span&gt;
      &lt;span class="s"&gt;--name "/customerServiceAgent/${{ env.ENVIRONMENT }}/runtime-id" \&lt;/span&gt;
      &lt;span class="s"&gt;--query Parameter.Value --output text)&lt;/span&gt;

    &lt;span class="s"&gt;aws bedrock-agentcore-control update-agent-runtime \&lt;/span&gt;
      &lt;span class="s"&gt;--agent-runtime-id "${RUNTIME_ID}" \&lt;/span&gt;
      &lt;span class="s"&gt;--agent-runtime-artifact '{"containerConfiguration":{"containerUri":"${{ env.ECR_URI }}:latest"}}' \&lt;/span&gt;
      &lt;span class="s"&gt;--role-arn "${{ secrets.EXECUTION_ROLE_ARN }}" \&lt;/span&gt;
      &lt;span class="s"&gt;--network-configuration '{"networkMode":"VPC","networkModeConfig":{"securityGroups":["${{ secrets.AGENT_SECURITY_GROUP_ID }}"],"subnets":["${{ secrets.AGENT_SUBNET_IDS }}"]}}' \&lt;/span&gt;
      &lt;span class="s"&gt;--region us-east-1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Remember Gotcha #7 from Part 2: &lt;code&gt;--role-arn&lt;/code&gt; and &lt;code&gt;--network-configuration&lt;/code&gt; are both mandatory. The &lt;code&gt;--role-arn&lt;/code&gt; is the &lt;strong&gt;execution role&lt;/strong&gt; (the role AgentCore uses at runtime), not the deploy role the workflow is running as.&lt;/p&gt;

&lt;p&gt;One more wrinkle: the &lt;code&gt;--network-configuration&lt;/code&gt; above substitutes &lt;code&gt;AGENT_SUBNET_IDS&lt;/code&gt; into a single JSON array element, which only works when the secret holds exactly one subnet ID. If it holds multiple comma-separated IDs, expand them into separate array entries before the call.&lt;/p&gt;
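&lt;p&gt;Note that &lt;code&gt;update-agent-runtime&lt;/code&gt; returns before the new container is actually serving traffic. A follow-up check along these lines can gate the job on the Runtime settling — this assumes a control-plane &lt;code&gt;get-agent-runtime&lt;/code&gt; call with a &lt;code&gt;status&lt;/code&gt; field, both worth verifying against your CLI version:&lt;/p&gt;

```shell
# Check whether the Runtime has finished updating (status field assumed)
aws bedrock-agentcore-control get-agent-runtime \
  --agent-runtime-id "${RUNTIME_ID}" \
  --query status --output text \
  --region us-east-1
```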




&lt;h2&gt;
  
  
  The CI workflow
&lt;/h2&gt;

&lt;p&gt;Runs on every push and pull request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .github/workflows/ci.yml&lt;/span&gt;
&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;lint-python&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pip install ruff black&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ruff check customer_service_agent/&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;black --check customer_service_agent/&lt;/span&gt;

  &lt;span class="na"&gt;test-infra&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm ci&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm test&lt;/span&gt;              &lt;span class="c1"&gt;# Jest CDK unit tests&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm run synth&lt;/span&gt;         &lt;span class="c1"&gt;# CDK synth smoke test&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The CDK synth must succeed without AWS credentials. This works as long as &lt;code&gt;cdk.context.json&lt;/code&gt; is committed to the repo — it contains the VPC lookup cache that CDK needs for deterministic synthesis.&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;cdk.context.json&lt;/code&gt; is missing (or the VPC lookup context changed), CDK will try to call the AWS API during synth and fail in CI. Regenerate it locally: &lt;code&gt;cdk context --clear &amp;amp;&amp;amp; cdk synth&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Multi-environment promotion
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;workflow_dispatch&lt;/code&gt; trigger lets you manually promote a build:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;workflow_dispatch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;inputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;choice&lt;/span&gt;
        &lt;span class="na"&gt;options&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;dev&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;stg&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;prd&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Combined with GitHub Environments (configured in repository Settings → Environments), you can require manual approval before deploying to &lt;code&gt;stg&lt;/code&gt; or &lt;code&gt;prd&lt;/code&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Push to &lt;code&gt;main&lt;/code&gt; → auto-deploys to &lt;code&gt;dev&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Manually trigger workflow → select &lt;code&gt;stg&lt;/code&gt; → GitHub requires approval from reviewers&lt;/li&gt;
&lt;li&gt;After approval → deploys to &lt;code&gt;stg&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Manual trigger → select &lt;code&gt;prd&lt;/code&gt; → same approval gate&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The &lt;code&gt;environment:&lt;/code&gt; key in the job declaration activates the GitHub Environment's protection rules:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ inputs.environment || 'dev' }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  GitHub Secrets to configure
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Secret&lt;/th&gt;
&lt;th&gt;Where it comes from&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;AWS_DEPLOY_ROLE_ARN&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;ARN of the OIDC role you created&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;EXECUTION_ROLE_ARN&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;CDK output &lt;code&gt;ExecutionRole&lt;/code&gt; ARN&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;AGENT_SECURITY_GROUP_ID&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;CDK output Security Group ID&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;AGENT_SUBNET_IDS&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;CDK output subnet IDs (comma-separated)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These are repo-level secrets (Settings → Secrets and variables → Actions). For multi-environment setups, use environment-level secrets to have different values per environment.&lt;/p&gt;
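&lt;p&gt;If you prefer the CLI to clicking through Settings, the GitHub CLI can set both flavors — the ARNs below are placeholders:&lt;/p&gt;

```shell
# Repo-level secret
gh secret set AWS_DEPLOY_ROLE_ARN \
  --body "arn:aws:iam::123456789012:role/github-actions-deploy"

# Environment-scoped secret (different value per GitHub Environment)
gh secret set EXECUTION_ROLE_ARN --env prd \
  --body "arn:aws:iam::123456789012:role/agent-execution-prd"
```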




&lt;h2&gt;
  
  
  End-to-end flow
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Developer pushes to main
    ↓
GitHub Actions: ci.yml runs (lint + CDK tests, ~2 min)
    ↓
GitHub Actions: deploy-agent.yml triggers (paths: apps/**)
    ↓
Configure AWS credentials (OIDC, ~10s)
    ↓
docker build --platform linux/amd64 (~3-5 min)
    ↓
docker push :latest + :&amp;lt;sha&amp;gt; to ECR (~1-2 min)
    ↓
update-agent-runtime CLI (~30s)
    ↓
AgentCore pulls new image, restarts container instances
    ↓
New code is live
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Total time from push to live: ~8-10 minutes.&lt;/p&gt;

&lt;p&gt;In the final part, we look at cost — how much this system actually costs to run, where prompt caching saves the most, and how to set CloudWatch alarms before your bill surprises you.&lt;/p&gt;

&lt;p&gt;→ &lt;strong&gt;&lt;a href="https://dev.to/blog/part-6-cost-performance-prompt-caching"&gt;Continue to Part 6: Cost &amp;amp; Performance&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://rajmurugan.com/blog/part-5-cicd-github-actions-oidc" rel="noopener noreferrer"&gt;rajmurugan.com&lt;/a&gt;. This is Part 5 of the &lt;a href="https://rajmurugan.com/blog" rel="noopener noreferrer"&gt;Ultimate Guide to Building AI Agents on AWS with Bedrock AgentCore&lt;/a&gt; series.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>githubactions</category>
      <category>aws</category>
      <category>cicd</category>
      <category>bedrock</category>
    </item>
    <item>
      <title>Part 3: Building the AI Agent with Strands Agents SDK, Prompt Caching, and AgentCore Memory</title>
      <dc:creator>Raj Murugan</dc:creator>
      <pubDate>Mon, 30 Mar 2026 21:37:54 +0000</pubDate>
      <link>https://forem.com/rajmurugan/part-3-building-the-ai-agent-with-strands-agents-sdk-prompt-caching-and-agentcore-memory-134a</link>
      <guid>https://forem.com/rajmurugan/part-3-building-the-ai-agent-with-strands-agents-sdk-prompt-caching-and-agentcore-memory-134a</guid>
      <description>&lt;p&gt;With the CDK infrastructure in place (Part 2), we need an actual agent to run inside it.&lt;/p&gt;

&lt;p&gt;The agent is a Python application that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Exposes an HTTP endpoint AgentCore can call&lt;/li&gt;
&lt;li&gt;Uses the Strands Agents SDK to run a Bedrock-backed reasoning loop&lt;/li&gt;
&lt;li&gt;Integrates with AgentCore Memory for persistent context&lt;/li&gt;
&lt;li&gt;Uses Bedrock Guardrails on every invocation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The full source is in &lt;a href="https://github.com/rajmurugan01/bedrock-agentcore-starter/tree/main/apps/customer-service-agent" rel="noopener noreferrer"&gt;apps/customer-service-agent/&lt;/a&gt; in the demo repo.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Strands over LangChain or LlamaIndex?
&lt;/h2&gt;

&lt;p&gt;When I started this project, LangChain was the default answer for "I need to build an agent." I used it, ran into friction, and switched to Strands. Here's why:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strands is AWS-native.&lt;/strong&gt; It's built to integrate directly with Bedrock services — prompt caching, guardrail configs, tool definitions. With LangChain, you write adapter code to bridge from LangChain abstractions down to raw Bedrock APIs. With Strands, you're calling the Bedrock API directly through a thin, intentional abstraction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool definitions are simpler.&lt;/strong&gt; In LangChain, you define tools with &lt;code&gt;StructuredTool.from_function()&lt;/code&gt; or subclass &lt;code&gt;BaseTool&lt;/code&gt;. In Strands, you decorate a function with &lt;code&gt;@tool&lt;/code&gt; and the docstring becomes the description:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# LangChain approach
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StructuredTool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Field&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OrderLookupInput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The order ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lookup_order_status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;StructuredTool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;lookup_order_status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lookup_order_status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Look up the current status of an order&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;args_schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;OrderLookupInput&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Strands approach
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lookup_order_status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Look up the current status of a customer order by order ID.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Active development matches AgentCore.&lt;/strong&gt; Strands is developed at a cadence that tracks AgentCore releases. New AgentCore features show up in Strands before they make it to LangChain adapters.&lt;/p&gt;




&lt;h2&gt;
  
  
  Project structure
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;customer_service_agent/
├── __init__.py
├── config.py       # Settings from env vars
├── prompts.py      # System prompt
├── tools.py        # @tool definitions
├── memory.py       # AgentCore Memory boto3 helpers
├── agent.py        # BedrockModel setup + streaming
└── main.py         # FastAPI app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  config.py — environment variables
&lt;/h2&gt;

&lt;p&gt;Everything the agent needs is injected as environment variables by AgentCore. In production, you set &lt;code&gt;EnvironmentVariables&lt;/code&gt; on the &lt;code&gt;CfnRuntime&lt;/code&gt; resource in CDK. Locally, you use &lt;code&gt;.env.local&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_settings&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseSettings&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Settings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseSettings&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;agentcore_memory_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;bedrock_guardrail_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;bedrock_guardrail_version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;aws_region&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dev&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# Primary: Claude Sonnet 4.6
&lt;/span&gt;    &lt;span class="n"&gt;primary_model_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic.claude-sonnet-4-6-20251001-v1:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# Background: Nova Pro (~15x cheaper per token)
&lt;/span&gt;    &lt;span class="n"&gt;background_model_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;amazon.nova-pro-v1:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;env_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.env.local&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;settings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Settings&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The dual-model strategy
&lt;/h2&gt;

&lt;p&gt;The agent uses two Bedrock models for different tasks:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Sonnet 4.6&lt;/strong&gt; for main conversations — best reasoning, multi-step tool use, nuanced responses. More expensive but worth it for the customer-facing output.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon Nova Pro&lt;/strong&gt; for background tasks — ~15x cheaper per token. Ideal for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Classifying intent before routing&lt;/li&gt;
&lt;li&gt;Summarising long conversation history&lt;/li&gt;
&lt;li&gt;Generating internal labels/tags&lt;/li&gt;
&lt;li&gt;Any task where "good enough" is sufficient&lt;/li&gt;
&lt;/ul&gt;
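&lt;p&gt;As a sketch, the routing decision can live in one small helper (the helper name and task labels here are illustrative, not part of Strands or AgentCore):&lt;br&gt;
&lt;/p&gt;

```python
# Illustrative model router: cheap model for background work,
# strong model for the customer-facing conversation.
PRIMARY_MODEL = "anthropic.claude-sonnet-4-6-20251001-v1:0"  # main conversations
BACKGROUND_MODEL = "amazon.nova-pro-v1:0"                    # classify/summarise/tag

BACKGROUND_TASKS = {"classify_intent", "summarise_history", "generate_tags"}

def pick_model(task: str) -> str:
    """Route known background tasks to the cheaper model, everything else to primary."""
    return BACKGROUND_MODEL if task in BACKGROUND_TASKS else PRIMARY_MODEL
```

&lt;p&gt;Keeping the choice in one function means a model swap or an A/B test is a one-line change rather than a hunt through call sites.&lt;/p&gt;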




&lt;h2&gt;
  
  
  Prompt caching — the 90% cost saving
&lt;/h2&gt;

&lt;p&gt;This is the most impactful optimisation in the whole system.&lt;/p&gt;

&lt;p&gt;Prompt caching works like this: you mark part of your prompt as a "cacheable prefix". Bedrock caches those tokens server-side for ~5 minutes. On subsequent calls that use the same prefix, you pay the &lt;strong&gt;cache read price&lt;/strong&gt; instead of the full input token price.&lt;/p&gt;

&lt;p&gt;For Claude Sonnet 4.6:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cache write: $3.00 per 1M input tokens (same as normal)&lt;/li&gt;
&lt;li&gt;Cache read: $0.30 per 1M input tokens&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Saving: 90% on cached tokens&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system prompt is the perfect candidate for caching — it's the same on every request in a session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BedrockModel&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;botocore.config&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Config&lt;/span&gt;

&lt;span class="c1"&gt;# Adaptive retry — Bedrock throttles hard under load
&lt;/span&gt;&lt;span class="n"&gt;boto_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;retries&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_attempts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;adaptive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;read_timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;primary_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BedrockModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic.claude-sonnet-4-6-20251001-v1:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;boto_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;boto_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;additional_request_fields&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;# Enable prompt caching (Anthropic beta feature on Bedrock)
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic_beta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt-caching-2024-07-31&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;guardrail_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;guardrailIdentifier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bedrock_guardrail_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;guardrailVersion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bedrock_guardrail_version&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# System prompt with cache_control: ephemeral
# This marks the prompt as a cacheable prefix for Bedrock
&lt;/span&gt;&lt;span class="n"&gt;cached_system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ephemeral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;# Cache this prefix
&lt;/span&gt;    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a 1,500-token system prompt across 100 sessions/day with 5 turns each (500 invocations):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Without caching: 500 × 1,500 = 750,000 system prompt tokens/day × $3/1M = &lt;strong&gt;$2.25/day&lt;/strong&gt; just for system prompts&lt;/li&gt;
&lt;li&gt;With caching: turn 1 pays the cache write price, turns 2-5 pay the cache read price → (150,000 × $3 + 600,000 × $0.30)/1M ≈ &lt;strong&gt;$0.63/day&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Saving: ~$1.62/day, ~$590/year on just the system prompt&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The saving scales linearly with session length and request volume.&lt;/p&gt;
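&lt;p&gt;The arithmetic is worth checking directly; a few lines of Python reproduce the daily figures from the per-million-token prices:&lt;br&gt;
&lt;/p&gt;

```python
# Verify the prompt-caching saving for a 1,500-token system prompt,
# 100 sessions/day, 5 turns per session.
PROMPT_TOKENS = 1_500
SESSIONS = 100
TURNS = 5
WRITE = 3.00 / 1_000_000  # $/token for cache writes (same as normal input)
READ = 0.30 / 1_000_000   # $/token for cache reads

# Without caching: every turn pays the full input price for the system prompt.
uncached = SESSIONS * TURNS * PROMPT_TOKENS * WRITE

# With caching: turn 1 writes the cache, turns 2-5 read it.
cached = SESSIONS * PROMPT_TOKENS * (WRITE + (TURNS - 1) * READ)

print(f"uncached: ${uncached:.2f}/day, cached: ${cached:.2f}/day")
# uncached: $2.25/day, cached: $0.63/day
```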




&lt;h2&gt;
  
  
  Tool definitions
&lt;/h2&gt;

&lt;p&gt;Strands tools are just decorated Python functions. The function signature defines the input schema, and the docstring is sent to the model as the tool description:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lookup_order_status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Look up the current status and details of a customer order by order ID.
    Use this when a customer asks about their order, delivery, or shipment.

    Args:
        order_id: The order ID (format: ORD-XXXXXX)

    Returns:
        Order details including status, items, and estimated delivery date.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Your implementation here
&lt;/span&gt;    &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_product_faq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Search the product FAQ and policy knowledge base for answers to customer questions.
&lt;/span&gt;&lt;span class="gp"&gt;    ...&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tools are passed to the &lt;code&gt;Agent&lt;/code&gt; constructor. Strands handles the tool invocation loop — calling the tool when the model decides to use it, feeding the result back, and continuing the reasoning loop until the model produces a final response.&lt;/p&gt;
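&lt;p&gt;The "signature becomes schema, docstring becomes description" behaviour is easy to illustrate with the standard library (a sketch of the idea, not Strands' actual internals):&lt;br&gt;
&lt;/p&gt;

```python
import inspect

def lookup_order_status(order_id: str) -> str:
    """Look up the current status of a customer order by order ID."""
    ...

def tool_spec(fn):
    """Build a minimal tool description the way an agent framework might:
    name from the function, description from the docstring, parameters
    from the type-annotated signature."""
    sig = inspect.signature(fn)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),
        "parameters": {
            name: param.annotation.__name__
            for name, param in sig.parameters.items()
        },
    }

spec = tool_spec(lookup_order_status)
# {'name': 'lookup_order_status',
#  'description': 'Look up the current status of a customer order by order ID.',
#  'parameters': {'order_id': 'str'}}
```

&lt;p&gt;This is why docstring quality matters so much for tools: it is literally the text the model reads when deciding whether to call the function.&lt;/p&gt;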




&lt;h2&gt;
  
  
  AgentCore Memory integration
&lt;/h2&gt;

&lt;p&gt;AgentCore Memory provides persistent storage across sessions without you building any of the storage infrastructure.&lt;/p&gt;

&lt;p&gt;The three strategy types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Semantic&lt;/strong&gt; — stores facts and user profile information. Consolidates information like "user prefers email contact", "user is on premium plan".&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Summary&lt;/strong&gt; — stores compressed session history. "On 2025-03-15 user reported late delivery of order ORD-001234. Issue was resolved."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UserPreference&lt;/strong&gt; — stores interaction style. "User prefers brief responses without extra detail."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The memory client is a standard boto3 client:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;botocore.config&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Config&lt;/span&gt;

&lt;span class="n"&gt;memory_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock-agentcore-memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;retries&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_attempts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;adaptive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;read_timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Store a conversation turn
&lt;/span&gt;&lt;span class="n"&gt;memory_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;memoryId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agentcore_memory_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;actorId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;         &lt;span class="c1"&gt;# Identifies the user (e.g., user ID from JWT)
&lt;/span&gt;    &lt;span class="n"&gt;sessionId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;     &lt;span class="c1"&gt;# Identifies the conversation session
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;USER&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;}]},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ASSISTANT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;assistant_message&lt;/span&gt;&lt;span class="p"&gt;}]},&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Retrieve relevant memories before each invocation
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve_memory_records&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;memoryId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agentcore_memory_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;actorId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;searchQuery&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Semantic search over stored memories
&lt;/span&gt;    &lt;span class="n"&gt;maxResults&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;memories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memoryRecords&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The retrieved memories are prepended to the user message as context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;memory_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;memories&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;enriched_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Past context:]
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;memory_context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The streaming agent loop
&lt;/h2&gt;

&lt;p&gt;The agent produces a streaming response via the Strands &lt;code&gt;stream_async&lt;/code&gt; method. Each chunk is forwarded as an SSE event:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stream_agent_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. Retrieve memories
&lt;/span&gt;    &lt;span class="n"&gt;memories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;retrieve_relevant_memories&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;enriched_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;prepend_memory_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memories&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 2. Build agent with tools
&lt;/span&gt;    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;primary_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cached_system_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;lookup_order_status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;search_product_faq&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 3. Stream response
&lt;/span&gt;    &lt;span class="n"&gt;full_response_parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream_async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enriched_message&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;full_response_parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# SSE format
&lt;/span&gt;
    &lt;span class="c1"&gt;# 4. Store turn in memory
&lt;/span&gt;    &lt;span class="nf"&gt;store_conversation_turn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;assistant_message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;full_response_parts&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data: [DONE]

&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
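&lt;p&gt;The SSE framing itself is simple enough to verify in isolation: each event is a &lt;code&gt;data:&lt;/code&gt; line followed by a blank line (the helper names here are hypothetical):&lt;br&gt;
&lt;/p&gt;

```python
def sse_frame(text: str) -> str:
    """Wrap a text chunk in Server-Sent Events framing (single-line payloads)."""
    return f"data: {text}\n\n"

def sse_parse(stream: str) -> list:
    """Recover the payloads from a raw SSE stream of single-line events."""
    return [
        line[len("data: "):]
        for line in stream.split("\n")
        if line.startswith("data: ")
    ]

raw = sse_frame("Hello") + sse_frame("world") + sse_frame("[DONE]")
assert sse_parse(raw) == ["Hello", "world", "[DONE]"]
```

&lt;p&gt;Note this handles only single-line payloads; the SSE spec also allows multi-line &lt;code&gt;data:&lt;/code&gt; events, which is one reason chunks with embedded newlines need care before framing.&lt;/p&gt;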






&lt;h2&gt;
  
  
  The adaptive retry config
&lt;/h2&gt;

&lt;p&gt;Bedrock throttles hard when you exceed your model's throughput quotas (requests per minute and tokens per minute). Without retry logic, throttled requests fail immediately.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;mode: "adaptive"&lt;/code&gt; uses a token bucket algorithm — it monitors the throttle rate and automatically backs off when it detects pressure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;botocore.config&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Config&lt;/span&gt;

&lt;span class="n"&gt;boto_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;retries&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_attempts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;adaptive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;read_timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# Streaming responses can take 30-90s for complex tool chains
&lt;/span&gt;    &lt;span class="n"&gt;connect_timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The difference between &lt;code&gt;"standard"&lt;/code&gt; and &lt;code&gt;"adaptive"&lt;/code&gt; retry modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;standard&lt;/code&gt;: fixed exponential backoff between retries&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;adaptive&lt;/code&gt;: adjusts retry rate based on observed throttling, converges to a sustainable rate faster&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For agentic workloads that run multi-step tool chains — and thus make many Bedrock calls in sequence — &lt;code&gt;"adaptive"&lt;/code&gt; consistently outperforms &lt;code&gt;"standard"&lt;/code&gt;.&lt;/p&gt;
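&lt;p&gt;For intuition, &lt;code&gt;standard&lt;/code&gt; mode grows its sleeps as roughly capped exponential backoff (botocore draws a random jittered delay below each ceiling; this sketch ignores the jitter), while &lt;code&gt;adaptive&lt;/code&gt; layers a client-side rate limiter on top of the same retries:&lt;br&gt;
&lt;/p&gt;

```python
def backoff_ceilings(max_attempts: int, base: float = 1.0, cap: float = 20.0) -> list:
    """Approximate per-retry sleep ceilings for exponential backoff with a cap.
    botocore's standard mode sleeps a jittered amount below each ceiling."""
    return [min(base * 2 ** attempt, cap) for attempt in range(max_attempts - 1)]

print(backoff_ceilings(5))  # [1.0, 2.0, 4.0, 8.0] (four retries after the first attempt)
```

&lt;p&gt;Adaptive mode reuses this retry machinery, but also slows the rate of &lt;em&gt;new&lt;/em&gt; requests when it observes throttling errors, which is what lets a multi-call tool chain converge instead of repeatedly slamming into the quota.&lt;/p&gt;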




&lt;h2&gt;
  
  
  Putting it all together
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;agent.py&lt;/code&gt; file wires everything together:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;primary_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BedrockModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;primary_model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;boto_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;boto_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;additional_request_fields&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic_beta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt-caching-2024-07-31&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
    &lt;span class="n"&gt;guardrail_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;guardrailIdentifier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bedrock_guardrail_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;cached_system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ephemeral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}}]&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stream_agent_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;memories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;retrieve_relevant_memories&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;enriched&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;prepend_memory_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memories&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;primary_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cached_system_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;lookup_order_status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;search_product_faq&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;full_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream_async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enriched&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;full_response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="nf"&gt;store_conversation_turn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;full_response&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data: [DONE]

&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In Part 4, we'll set up the local Docker dev environment so you can iterate on the agent code without deploying to AWS on every change.&lt;/p&gt;

&lt;p&gt;→ &lt;strong&gt;&lt;a href="https://dev.to/blog/part-4-local-dev-docker"&gt;Continue to Part 4: Running Locally with Docker&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://rajmurugan.com/blog/part-3-strands-agent-sdk" rel="noopener noreferrer"&gt;rajmurugan.com&lt;/a&gt;. This is Part 3 of the &lt;a href="https://rajmurugan.com/blog" rel="noopener noreferrer"&gt;Ultimate Guide to Building AI Agents on AWS with Bedrock AgentCore&lt;/a&gt; series.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>python</category>
      <category>bedrock</category>
      <category>agentcore</category>
    </item>
    <item>
      <title>Part 6: Cost &amp; Performance for Bedrock AgentCore — Prompt Caching, Model Selection, and CloudWatch Alarms</title>
      <dc:creator>Raj Murugan</dc:creator>
      <pubDate>Mon, 30 Mar 2026 21:34:23 +0000</pubDate>
      <link>https://forem.com/rajmurugan/part-6-cost-performance-for-bedrock-agentcore-prompt-caching-model-selection-and-cloudwatch-282k</link>
      <guid>https://forem.com/rajmurugan/part-6-cost-performance-for-bedrock-agentcore-prompt-caching-model-selection-and-cloudwatch-282k</guid>
      <description>&lt;p&gt;You've deployed the agent. It works. Now let's make sure it doesn't cost you a surprise at the end of the month.&lt;/p&gt;

&lt;p&gt;This is the part that most tutorials skip. Real production systems need cost visibility before incidents — not after. Here's everything I've done to keep costs predictable and to save money where it counts.&lt;/p&gt;




&lt;h2&gt;
  
  
  The cost components
&lt;/h2&gt;

&lt;p&gt;An AgentCore deployment has several cost drivers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Pricing model&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Bedrock model invocations&lt;/td&gt;
&lt;td&gt;Per token (input + output)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AgentCore Runtime&lt;/td&gt;
&lt;td&gt;Per container-hour (when active)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AgentCore Memory&lt;/td&gt;
&lt;td&gt;Per memory operation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ECR&lt;/td&gt;
&lt;td&gt;Per GB stored + data transfer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CloudWatch Logs&lt;/td&gt;
&lt;td&gt;Per GB ingested&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3 (if used)&lt;/td&gt;
&lt;td&gt;Per GB stored + requests (negligible here)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The dominant cost is almost always &lt;strong&gt;Bedrock model invocations&lt;/strong&gt;. Everything else is small by comparison.&lt;/p&gt;




&lt;h2&gt;
  
  
  Prompt caching: the biggest lever
&lt;/h2&gt;

&lt;p&gt;If you haven't read Part 3 carefully, go back and re-read the prompt caching section. It's the highest-impact optimisation in the system.&lt;/p&gt;

&lt;p&gt;Quick recap: by marking your system prompt with &lt;code&gt;cache_control: ephemeral&lt;/code&gt;, Bedrock caches those tokens and charges the cache read price on subsequent calls.&lt;/p&gt;

&lt;p&gt;For Claude Sonnet 4.6:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Regular input: $3.00 / 1M tokens&lt;/li&gt;
&lt;li&gt;Cache write: $3.75 / 1M tokens (1.25x the regular input rate, paid only when the cache is written)&lt;/li&gt;
&lt;li&gt;Cache read: &lt;strong&gt;$0.30 / 1M tokens&lt;/strong&gt; (10x cheaper than regular input)&lt;/li&gt;
&lt;li&gt;Output tokens: $15.00 / 1M tokens (not cached)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a 1,500-token system prompt:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Cost per turn&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Without caching&lt;/td&gt;
&lt;td&gt;$0.0045 (system prompt) + output tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;With caching (turns 2+)&lt;/td&gt;
&lt;td&gt;$0.00045 (system prompt) + output tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Saving per turn&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$0.004&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That sounds small. Scale it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;100 users × 10 conversations/day × 5 turns each = 5,000 turns/day&lt;/li&gt;
&lt;li&gt;4,000 of those turns are turns 2+ (caching applies)&lt;/li&gt;
&lt;li&gt;Saving: 4,000 × $0.004 = &lt;strong&gt;$16/day → $480/month on system prompt tokens alone&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The saving scales linearly with session depth and volume.&lt;/p&gt;
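The arithmetic above is easy to sanity-check. This back-of-envelope script reproduces the per-turn and monthly figures (token counts and volumes are the example numbers from the tables, not measurements):

```python
PROMPT_TOKENS = 1_500
INPUT_RATE = 3.00 / 1_000_000       # $/token, uncached input
CACHE_READ_RATE = 0.30 / 1_000_000  # $/token, cached read

uncached = PROMPT_TOKENS * INPUT_RATE     # $0.0045 per turn
cached = PROMPT_TOKENS * CACHE_READ_RATE  # $0.00045 per turn
saving_per_turn = uncached - cached       # ~$0.004 per turn

cached_turns_per_day = 4_000              # turns 2+ from the example above
daily = saving_per_turn * cached_turns_per_day
print(f"${daily:.2f}/day, ${daily * 30:,.0f}/month")  # → $16.20/day, $486/month
```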

&lt;p&gt;&lt;strong&gt;Enable prompt caching:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;primary_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BedrockModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic.claude-sonnet-4-6-20251001-v1:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;additional_request_fields&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic_beta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt-caching-2024-07-31&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;cached_system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache_control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ephemeral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}}]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Model selection strategy
&lt;/h2&gt;

&lt;p&gt;Not every task needs Claude Sonnet 4.6. Using the right model for each task type dramatically reduces costs.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Recommended model&lt;/th&gt;
&lt;th&gt;Reason&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Main conversation&lt;/td&gt;
&lt;td&gt;Claude Sonnet 4.6&lt;/td&gt;
&lt;td&gt;Best reasoning, multi-turn, complex tool use&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Intent classification&lt;/td&gt;
&lt;td&gt;Amazon Nova Pro&lt;/td&gt;
&lt;td&gt;Simple classification, ~15x cheaper&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Session summarisation&lt;/td&gt;
&lt;td&gt;Amazon Nova Pro&lt;/td&gt;
&lt;td&gt;Structured output, no complex reasoning needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FAQ matching&lt;/td&gt;
&lt;td&gt;Amazon Nova Pro or embedding model&lt;/td&gt;
&lt;td&gt;Simple retrieval pattern&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Billing dispute analysis&lt;/td&gt;
&lt;td&gt;Claude Sonnet 4.6&lt;/td&gt;
&lt;td&gt;Complex reasoning required&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Current pricing comparison (us-east-1):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Input ($/1M)&lt;/th&gt;
&lt;th&gt;Output ($/1M)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Sonnet 4.6&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;td&gt;$15.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Amazon Nova Pro&lt;/td&gt;
&lt;td&gt;$0.80&lt;/td&gt;
&lt;td&gt;$3.20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Amazon Nova Lite&lt;/td&gt;
&lt;td&gt;$0.06&lt;/td&gt;
&lt;td&gt;$0.24&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For a classification task that returns 1-2 tokens and processes 500 input tokens:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude Sonnet 4.6: $0.0015 per call&lt;/li&gt;
&lt;li&gt;Amazon Nova Pro: $0.0004 per call&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Saving: ~75% just by routing to the right model&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
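You can reproduce those per-call numbers directly from the pricing table (prices as listed above; token counts from the classification example):

```python
PRICES = {  # (input, output) in $ per 1M tokens, from the table above
    "claude-sonnet-4-6": (3.00, 15.00),
    "nova-pro": (0.80, 3.20),
    "nova-lite": (0.06, 0.24),
}

def cost_per_call(model: str, input_tokens: int, output_tokens: int) -> float:
    input_rate, output_rate = PRICES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

sonnet = cost_per_call("claude-sonnet-4-6", 500, 2)  # $0.00153
nova = cost_per_call("nova-pro", 500, 2)             # $0.0004064
print(f"Routing saves {1 - nova / sonnet:.0%} per classification call")
```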

&lt;p&gt;In &lt;code&gt;agent.py&lt;/code&gt;, the Nova model is available alongside the primary model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;nova_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BedrockModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;amazon.nova-pro-v1:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;boto_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;boto_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use it when you need a cheap background task before or after the main conversation.&lt;/p&gt;




&lt;h2&gt;
  
  
  AgentCore lifecycle configuration
&lt;/h2&gt;

&lt;p&gt;AgentCore has two lifecycle settings that affect cost:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Idle timeout&lt;/strong&gt; (&lt;code&gt;IdleTimeoutInSeconds&lt;/code&gt;): how long AgentCore waits before pausing a container instance after the last request. Set in the CDK stack:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;LifecycleConfiguration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;IdleTimeoutInSeconds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;900&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;// 15 minutes&lt;/span&gt;
  &lt;span class="nx"&gt;MaxSessionDurationInSeconds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;28800&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// 8 hours&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Lower idle timeout = containers paused sooner = lower cost for bursty workloads&lt;/li&gt;
&lt;li&gt;Higher idle timeout = containers stay warm longer = better latency for returning users&lt;/li&gt;
&lt;li&gt;The sweet spot depends on your session gap pattern. 15 minutes is a reasonable default.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Max session duration&lt;/strong&gt;: the hard limit per session. 8 hours is appropriate for a long-running assistant. For short transactional interactions, you could reduce this.&lt;/p&gt;




&lt;h2&gt;
  
  
  CloudFront PriceClass_100
&lt;/h2&gt;

&lt;p&gt;For the blog/portfolio site, &lt;code&gt;PriceClass.PRICE_CLASS_100&lt;/code&gt; restricts the distribution to North American and European edge locations. This cuts CloudFront cost by roughly half compared to the global price class.&lt;/p&gt;

&lt;p&gt;For a personal portfolio with mostly English-speaking traffic, the vast majority of users are in the US and Europe anyway.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// infra/lib/hosting-stack.ts&lt;/span&gt;
&lt;span class="nx"&gt;priceClass&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cloudfront&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PriceClass&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PRICE_CLASS_100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the AgentCore endpoint itself, there's no CloudFront in front — AgentCore is a regional service.&lt;/p&gt;




&lt;h2&gt;
  
  
  CloudWatch alarms: catch runaway costs before they hit your bill
&lt;/h2&gt;

&lt;p&gt;Two alarms are critical for an AgentCore deployment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Alarm 1: OutputTokenCount spike
&lt;/h3&gt;

&lt;p&gt;An agentic loop that gets stuck (tool keeps failing, model keeps retrying) can generate thousands of output tokens per minute. This alarm fires when output tokens per 5 minutes exceed a threshold:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;cloudwatch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Alarm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;OutputTokenAlarm&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;alarmName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`customerServiceAgent-OutputTokenCount-dev`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;metric&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;cloudwatch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Metric&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;AWS/Bedrock&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;metricName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;OutputTokenCount&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;dimensionsMap&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;ModelId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;anthropic.claude-sonnet-4-6-20251001-v1:0&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;statistic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Sum&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;period&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;minutes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="na"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;// Tune to your expected usage&lt;/span&gt;
  &lt;span class="na"&gt;evaluationPeriods&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;comparisonOperator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cloudwatch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ComparisonOperator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;GREATER_THAN_THRESHOLD&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;treatMissingData&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cloudwatch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;TreatMissingData&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NOT_BREACHING&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set the threshold to 2-3x your normal peak. Monitor for a week after launch to establish a baseline, then tune.&lt;/p&gt;

&lt;h3&gt;
  
  
  Alarm 2: InvocationLatency P99
&lt;/h3&gt;

&lt;p&gt;High P99 latency indicates your agent is taking too long — possibly waiting on a tool timeout, or the model is iterating excessively:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;cloudwatch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Alarm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;LatencyAlarm&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;metric&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;cloudwatch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Metric&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;AWS/Bedrock&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;metricName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;InvocationLatency&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;statistic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;p99&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;period&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;minutes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="na"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;// 30 seconds&lt;/span&gt;
  &lt;span class="na"&gt;evaluationPeriods&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;comparisonOperator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cloudwatch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ComparisonOperator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;GREATER_THAN_THRESHOLD&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both alarms publish to the SNS topic (also in the CDK stack), which sends you an email. For production, replace email with a PagerDuty or Slack notification via SNS → Lambda → webhook.&lt;/p&gt;
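Here's what that SNS → Lambda → webhook hop can look like. This is a sketch, not the article's code: the handler name, the `SLACK_WEBHOOK_URL` environment variable, and the message format are assumptions. CloudWatch's SNS payload does carry `AlarmName`, `NewStateValue`, and `NewStateReason` fields.

```python
import json
import os
import urllib.request

def format_alarm(alarm: dict) -> str:
    # CloudWatch alarm notifications arrive as JSON in the SNS message body
    return (f":rotating_light: {alarm['AlarmName']} is {alarm['NewStateValue']}: "
            f"{alarm['NewStateReason']}")

def handler(event, context):
    for record in event["Records"]:
        alarm = json.loads(record["Sns"]["Message"])
        payload = json.dumps({"text": format_alarm(alarm)}).encode()
        req = urllib.request.Request(
            os.environ["SLACK_WEBHOOK_URL"],  # placeholder: set in Lambda env vars
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req)
```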




&lt;h2&gt;
  
  
  Actual cost estimates
&lt;/h2&gt;

&lt;p&gt;For a moderately used customer service agent at ~500 conversations/day, 5 turns each:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Monthly estimate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Bedrock (Claude Sonnet 4.6, with caching)&lt;/td&gt;
&lt;td&gt;$120-180&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bedrock (Nova Pro for classification)&lt;/td&gt;
&lt;td&gt;$5-10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AgentCore Runtime&lt;/td&gt;
&lt;td&gt;$15-30 (depends on idle config)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AgentCore Memory operations&lt;/td&gt;
&lt;td&gt;$5-10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ECR storage&lt;/td&gt;
&lt;td&gt;$1-2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CloudWatch Logs&lt;/td&gt;
&lt;td&gt;$3-5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$150-240/month&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Without prompt caching: add ~$60-80/month to the Bedrock line.&lt;/p&gt;

&lt;p&gt;Without the dual-model strategy (Claude Sonnet 4.6 for everything): add ~$20-30/month to the Bedrock line.&lt;/p&gt;

&lt;p&gt;These numbers will vary significantly based on your conversation length and output token counts. The alarms will tell you when something is outside the expected range.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick optimisation checklist
&lt;/h2&gt;

&lt;p&gt;Before going to production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Prompt caching enabled (&lt;code&gt;anthropic_beta: ["prompt-caching-2024-07-31"]&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;[ ] System prompt marked with &lt;code&gt;cache_control: ephemeral&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;[ ] Nova Pro used for background tasks (not Claude for everything)&lt;/li&gt;
&lt;li&gt;[ ] Idle timeout set appropriately (900s is a good default)&lt;/li&gt;
&lt;li&gt;[ ] OutputTokenCount alarm configured and tested&lt;/li&gt;
&lt;li&gt;[ ] InvocationLatency alarm configured and tested&lt;/li&gt;
&lt;li&gt;[ ] SNS topic with email subscription (or PagerDuty) set up&lt;/li&gt;
&lt;li&gt;[ ] CloudFront PriceClass_100 set (blog site)&lt;/li&gt;
&lt;li&gt;[ ] Model invocation logging enabled (for debugging cost spikes)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Wrapping up the series
&lt;/h2&gt;

&lt;p&gt;Over 6 parts, we built a complete production AI agent on AWS:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Part 1&lt;/strong&gt;: Why AgentCore — the Lambda limitations and what AgentCore solves&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 2&lt;/strong&gt;: CDK infrastructure — the full stack + 9 gotchas documented&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 3&lt;/strong&gt;: The Python agent — Strands SDK, prompt caching, AgentCore Memory&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 4&lt;/strong&gt;: Local dev loop — Docker, platform flags, .env pattern&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 5&lt;/strong&gt;: CI/CD — GitHub Actions OIDC, ECR dual-tag strategy, Runtime updates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 6&lt;/strong&gt; (this post): Cost and performance — prompt caching savings, model selection, alarms&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The full demo repo is at &lt;strong&gt;&lt;a href="https://github.com/rajmurugan01/bedrock-agentcore-starter" rel="noopener noreferrer"&gt;github.com/rajmurugan01/bedrock-agentcore-starter&lt;/a&gt;&lt;/strong&gt;. Every pattern in this series maps to real code in that repo.&lt;/p&gt;

&lt;p&gt;If this series saved you some debugging time (or a surprise AWS bill), star the repo and share it. If I got something wrong or you've found a better pattern, open an issue — I'll update the posts.&lt;/p&gt;

&lt;p&gt;← &lt;strong&gt;&lt;a href="https://dev.to/blog/part-5-cicd-github-actions-oidc"&gt;Back to Part 5: CI/CD with GitHub Actions OIDC&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://rajmurugan.com/blog/part-6-cost-performance-prompt-caching" rel="noopener noreferrer"&gt;rajmurugan.com&lt;/a&gt;. This is Part 6 of the &lt;a href="https://rajmurugan.com/blog" rel="noopener noreferrer"&gt;Ultimate Guide to Building AI Agents on AWS with Bedrock AgentCore&lt;/a&gt; series.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>bedrock</category>
      <category>agentcore</category>
      <category>cost</category>
    </item>
    <item>
      <title>Part 4: Running Your AgentCore Agent Locally with Docker (The Right Way)</title>
      <dc:creator>Raj Murugan</dc:creator>
      <pubDate>Mon, 30 Mar 2026 21:32:01 +0000</pubDate>
      <link>https://forem.com/rajmurugan/part-4-running-your-agentcore-agent-locally-with-docker-the-right-way-1cc8</link>
      <guid>https://forem.com/rajmurugan/part-4-running-your-agentcore-agent-locally-with-docker-the-right-way-1cc8</guid>
      <description>&lt;p&gt;You've written the agent code. Before pushing it to ECR and waiting for AgentCore to pull it, you want to run it locally and confirm it actually works.&lt;/p&gt;

&lt;p&gt;This part covers the local Docker dev loop — including a critical flag that's easy to miss, and omitting it silently produces the wrong image.&lt;/p&gt;




&lt;h2&gt;
  
  
  The &lt;code&gt;--platform linux/amd64&lt;/code&gt; requirement
&lt;/h2&gt;

&lt;p&gt;This is the most important thing in this entire post.&lt;/p&gt;

&lt;p&gt;Amazon Bedrock AgentCore runtime is &lt;strong&gt;x86_64 only&lt;/strong&gt;. If you build your Docker image without specifying the platform, Docker uses your host architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;On an Intel Mac or Linux x86_64 machine: builds &lt;code&gt;linux/amd64&lt;/code&gt; ✅&lt;/li&gt;
&lt;li&gt;On an Apple Silicon Mac (M1/M2/M3/M4): builds &lt;code&gt;linux/arm64&lt;/code&gt; ❌&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The arm64 image will work perfectly in your local Docker test because your Mac is arm64. But when AgentCore pulls &lt;code&gt;:latest&lt;/code&gt; and tries to run it on an x86_64 host, the container exits immediately with an exec format error — and AgentCore reports the Runtime as &lt;code&gt;FAILED&lt;/code&gt; with a cryptic message.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Always build with &lt;code&gt;--platform linux/amd64&lt;/code&gt;:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build &lt;span class="nt"&gt;--platform&lt;/span&gt; linux/amd64 &lt;span class="nt"&gt;-t&lt;/span&gt; customer-service-agent:local &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This forces Docker to produce an x86_64 image regardless of your host architecture. On an Apple Silicon Mac, Docker uses QEMU emulation to run the build — it's a bit slower but produces the correct output.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Dockerfile
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;Dockerfile&lt;/code&gt; is in &lt;code&gt;apps/customer-service-agent/&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; --platform=linux/amd64 python:3.12-slim&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; requirements.txt .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; customer_service_agent/ ./customer_service_agent/&lt;/span&gt;

&lt;span class="c"&gt;# AgentCore always calls port 8080 — this is not configurable&lt;/span&gt;
&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 8080&lt;/span&gt;

&lt;span class="c"&gt;# Health check — AgentCore probes GET /health before routing traffic&lt;/span&gt;
&lt;span class="k"&gt;HEALTHCHECK&lt;/span&gt;&lt;span class="s"&gt; --interval=10s --timeout=5s --start-period=30s \&lt;/span&gt;
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8080/health')" || exit 1

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["uvicorn", "customer_service_agent.main:app", "--host", "0.0.0.0", "--port", "8080"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things to note:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;--platform=linux/amd64&lt;/code&gt; in the &lt;code&gt;FROM&lt;/code&gt; line ensures the base image is also x86_64&lt;/li&gt;
&lt;li&gt;Port 8080 is hardcoded — AgentCore doesn't let you configure this&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  The &lt;code&gt;.env.local&lt;/code&gt; pattern
&lt;/h2&gt;

&lt;p&gt;In production, AgentCore injects environment variables from the &lt;code&gt;EnvironmentVariables&lt;/code&gt; block you set in the CDK &lt;code&gt;CfnRuntime&lt;/code&gt; resource. Locally, we replicate this with a &lt;code&gt;.env.local&lt;/code&gt; file.&lt;/p&gt;

&lt;p&gt;Copy &lt;code&gt;.env.local.example&lt;/code&gt; to &lt;code&gt;.env.local&lt;/code&gt; and fill in the values from your CDK stack outputs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# .env.local&lt;/span&gt;
&lt;span class="nv"&gt;AGENTCORE_MEMORY_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;xxxxxxxxxxxxxxxxxxxx
&lt;span class="nv"&gt;BEDROCK_GUARDRAIL_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;xxxxxxxxxxxxxxxx
&lt;span class="nv"&gt;BEDROCK_GUARDRAIL_VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1
&lt;span class="nv"&gt;AWS_REGION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-east-1
&lt;span class="nv"&gt;ENVIRONMENT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;dev
&lt;span class="nv"&gt;LOG_LEVEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;DEBUG
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Get the values from CDK outputs or SSM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws ssm get-parameter &lt;span class="nt"&gt;--name&lt;/span&gt; /customerServiceAgent/dev/memory-id &lt;span class="nt"&gt;--query&lt;/span&gt; Parameter.Value &lt;span class="nt"&gt;--output&lt;/span&gt; text
aws ssm get-parameter &lt;span class="nt"&gt;--name&lt;/span&gt; /customerServiceAgent/dev/guardrail-id &lt;span class="nt"&gt;--query&lt;/span&gt; Parameter.Value &lt;span class="nt"&gt;--output&lt;/span&gt; text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
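&lt;p&gt;The file is plain &lt;code&gt;KEY=VALUE&lt;/code&gt; lines, and Docker's &lt;code&gt;--env-file&lt;/code&gt; reads it directly. If you also want to load it outside Docker (say, for a bare &lt;code&gt;uvicorn&lt;/code&gt; run), a minimal parser is enough — this sketch skips comments and blank lines; &lt;code&gt;python-dotenv&lt;/code&gt; does the same job more robustly:&lt;/p&gt;

```python
import os

def load_env_file(path: str) -> dict[str, str]:
    """Parse KEY=VALUE lines from a .env file, skipping comments and blanks."""
    values: dict[str, str] = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values

def apply_env(values: dict[str, str]) -> None:
    """Export parsed values into the process environment."""
    os.environ.update(values)
```

&lt;p&gt;Usage: &lt;code&gt;apply_env(load_env_file(".env.local"))&lt;/code&gt; before importing the agent module, so the settings are visible at import time just as they would be inside the container.&lt;/p&gt;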






&lt;h2&gt;
  
  
  Running the container locally
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;apps/customer-service-agent

&lt;span class="c"&gt;# Build for linux/amd64 (even on an Apple Silicon Mac)&lt;/span&gt;
docker build &lt;span class="nt"&gt;--platform&lt;/span&gt; linux/amd64 &lt;span class="nt"&gt;-t&lt;/span&gt; customer-service-agent:local &lt;span class="nb"&gt;.&lt;/span&gt;

&lt;span class="c"&gt;# Run with real AWS dev credentials&lt;/span&gt;
docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--platform&lt;/span&gt; linux/amd64 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 8080:8080 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--env-file&lt;/span&gt; .env.local &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$HOME&lt;/span&gt;&lt;span class="s2"&gt;/.aws:/root/.aws:ro"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  customer-service-agent:local
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;-v "$HOME/.aws:/root/.aws:ro"&lt;/code&gt; mounts your local AWS credentials into the container as read-only. This lets the agent call Bedrock and AgentCore Memory using your dev credentials, exactly as it would with the execution role in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Don't do this in production.&lt;/strong&gt; In production, the execution role is attached to the container by AgentCore. Mounting credentials is a local-only pattern.&lt;/p&gt;




&lt;h2&gt;
  
  
  Verifying the health check
&lt;/h2&gt;

&lt;p&gt;Once the container starts, verify the health endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:8080/health
&lt;span class="c"&gt;# → {"status":"healthy","environment":"dev"}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;AgentCore probes &lt;code&gt;GET /health&lt;/code&gt; before routing any traffic to a container instance. If the health check fails, AgentCore marks the instance as unhealthy and doesn't send requests to it.&lt;/p&gt;
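&lt;p&gt;You can replicate that probe in a smoke test: poll the endpoint until it reports healthy or a deadline passes. A standard-library sketch — the response shape matches the example above, and the timeout values are arbitrary:&lt;/p&gt;

```python
import json
import time
import urllib.error
import urllib.request

def is_healthy(body: bytes) -> bool:
    """True if a /health response body reports a healthy status."""
    try:
        return json.loads(body).get("status") == "healthy"
    except (ValueError, AttributeError):
        return False

def wait_for_health(url: str = "http://localhost:8080/health",
                    deadline_s: float = 30.0) -> bool:
    """Poll the health endpoint until healthy or the deadline expires."""
    stop = time.monotonic() + deadline_s
    while time.monotonic() < stop:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if is_healthy(resp.read()):
                    return True
        except (urllib.error.URLError, OSError):
            pass  # container not up yet — retry
        time.sleep(1)
    return False
```

&lt;p&gt;Run &lt;code&gt;wait_for_health()&lt;/code&gt; right after &lt;code&gt;docker run&lt;/code&gt; in a test script; if it returns &lt;code&gt;False&lt;/code&gt;, check &lt;code&gt;docker logs&lt;/code&gt; before blaming AgentCore.&lt;/p&gt;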




&lt;h2&gt;
  
  
  Testing with curl
&lt;/h2&gt;

&lt;p&gt;The agent responds to &lt;code&gt;POST /invoke&lt;/code&gt; with a Server-Sent Events stream. The &lt;code&gt;--no-buffer&lt;/code&gt; flag is important — without it, curl buffers the response and you don't see streaming:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8080/invoke &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "messages": [
      {"role": "user", "content": "What is the status of order ORD-001234?"}
    ],
    "sessionId": "test-session-1",
    "actorId": "user-test-123"
  }'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--no-buffer&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see SSE events streaming back:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data: I'll look up order ORD-001234 for you.

data: **Order ORD-001234:**
data: - Status: In Transit
data: - Items: Wireless Headphones (x1), Phone Case (x2)
data: - Estimated delivery: April 5, 2025
data: - Tracking: UPS-9876543210

data: [DONE]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
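&lt;p&gt;If you'd rather script the check than eyeball curl output, the stream is easy to parse: each event is a line starting with &lt;code&gt;data:&lt;/code&gt;, and &lt;code&gt;[DONE]&lt;/code&gt; terminates it. A minimal parser sketch (the sentinel matches the output above; a production client would use an SSE library):&lt;/p&gt;

```python
from typing import Iterable

def parse_sse(lines: Iterable[str]) -> list[str]:
    """Collect the payloads of `data:` events until the [DONE] sentinel."""
    chunks: list[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith("data:"):
            continue  # blank keep-alive lines between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunks.append(payload)
    return chunks
```

&lt;p&gt;Feed it the response line iterator from &lt;code&gt;urllib&lt;/code&gt; or &lt;code&gt;httpx&lt;/code&gt; and you get the streamed chunks back as a list, ready for assertions in an integration test.&lt;/p&gt;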






&lt;h2&gt;
  
  
  Common local dev errors
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;exec format error&lt;/code&gt;&lt;/strong&gt; — You built an arm64 image and are running it on an x86_64 host (or vice versa). Add &lt;code&gt;--platform linux/amd64&lt;/code&gt; to both &lt;code&gt;docker build&lt;/code&gt; and &lt;code&gt;docker run&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;Connection refused on port 8080&lt;/code&gt;&lt;/strong&gt; — Container hasn't started yet or the health check is failing. Check &lt;code&gt;docker logs &amp;lt;container-id&amp;gt;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;NoCredentialsError&lt;/code&gt;&lt;/strong&gt; — The &lt;code&gt;.aws&lt;/code&gt; mount isn't working or the profile in &lt;code&gt;.env.local&lt;/code&gt; doesn't match a profile in &lt;code&gt;~/.aws/credentials&lt;/code&gt;. Try &lt;code&gt;AWS_PROFILE=default&lt;/code&gt;, or remove the profile setting and let boto3 fall back to its default credential provider chain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;ResourceNotFoundException&lt;/code&gt; on memory client&lt;/strong&gt; — &lt;code&gt;AGENTCORE_MEMORY_ID&lt;/code&gt; is empty or wrong. Check the value against the SSM parameter. The memory module gracefully falls back (skips memory operations) if the ID is empty, so this shouldn't crash the agent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Slow response on Apple Silicon&lt;/strong&gt; — You're running an x86_64 container under QEMU emulation. This is ~3-5x slower than native and is expected for local testing. The deployed version on AgentCore's x86_64 hosts will be much faster.&lt;/p&gt;
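&lt;p&gt;The graceful fallback mentioned for the memory module is worth having in any wrapper you write yourself. A sketch of the guard — the function signature and &lt;code&gt;memory_client&lt;/code&gt; interface here are illustrative stand-ins, not the repo's actual API:&lt;/p&gt;

```python
import logging
import os

logger = logging.getLogger(__name__)

def save_turn(memory_client, session_id: str, actor_id: str, text: str) -> bool:
    """Persist a conversation turn, skipping silently if memory is unconfigured.

    Returns True if the turn was written, False if memory was skipped.
    `memory_client` is an illustrative stand-in for the AgentCore memory client.
    """
    memory_id = os.environ.get("AGENTCORE_MEMORY_ID", "")
    if not memory_id:
        logger.debug("AGENTCORE_MEMORY_ID empty — skipping memory write")
        return False
    memory_client.create_event(memory_id=memory_id, session_id=session_id,
                               actor_id=actor_id, payload=text)
    return True
```

&lt;p&gt;The point of the guard: a missing memory ID degrades to a stateless agent instead of crashing the container and failing the health check.&lt;/p&gt;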




&lt;h2&gt;
  
  
  The local dev loop
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;1. Edit Python code
   ↓
2. docker build &lt;span class="nt"&gt;--platform&lt;/span&gt; linux/amd64 &lt;span class="nt"&gt;-t&lt;/span&gt; customer-service-agent:local &lt;span class="nb"&gt;.&lt;/span&gt;
   ↓
3. docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;--platform&lt;/span&gt; linux/amd64 &lt;span class="nt"&gt;-p&lt;/span&gt; 8080:8080 &lt;span class="nt"&gt;--env-file&lt;/span&gt; .env.local &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$HOME&lt;/span&gt;&lt;span class="s2"&gt;/.aws:/root/.aws:ro"&lt;/span&gt; customer-service-agent:local
   ↓
4. curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8080/invoke ... &lt;span class="nt"&gt;--no-buffer&lt;/span&gt;
   ↓
5. Iterate &lt;span class="k"&gt;until &lt;/span&gt;response is correct, &lt;span class="k"&gt;then &lt;/span&gt;push to ECR
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In Part 5, we automate steps 2-5 via GitHub Actions with OIDC — so every push to &lt;code&gt;main&lt;/code&gt; builds the image, pushes it to ECR, and updates the AgentCore Runtime.&lt;/p&gt;

&lt;p&gt;→ &lt;strong&gt;&lt;a href="https://dev.to/blog/part-5-cicd-github-actions-oidc"&gt;Continue to Part 5: CI/CD with GitHub Actions OIDC&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://rajmurugan.com/blog/part-4-local-dev-docker" rel="noopener noreferrer"&gt;rajmurugan.com&lt;/a&gt;. This is Part 4 of the &lt;a href="https://rajmurugan.com/blog" rel="noopener noreferrer"&gt;Ultimate Guide to Building AI Agents on AWS with Bedrock AgentCore&lt;/a&gt; series.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>docker</category>
      <category>aws</category>
      <category>bedrock</category>
      <category>agentcore</category>
    </item>
    <item>
      <title>Part 2: CDK Infrastructure for Amazon Bedrock AgentCore (And Every Gotcha You'll Hit)</title>
      <dc:creator>Raj Murugan</dc:creator>
      <pubDate>Mon, 30 Mar 2026 21:31:59 +0000</pubDate>
      <link>https://forem.com/rajmurugan/part-2-cdk-infrastructure-for-amazon-bedrock-agentcore-and-every-gotcha-youll-hit-393h</link>
      <guid>https://forem.com/rajmurugan/part-2-cdk-infrastructure-for-amazon-bedrock-agentcore-and-every-gotcha-youll-hit-393h</guid>
      <description>&lt;p&gt;This is the post I wish had existed when I was debugging my first AgentCore CDK deploy at midnight.&lt;/p&gt;

&lt;p&gt;AgentCore is a relatively new service and CDK support is still catching up to the API. There are at least 9 specific traps that will silently fail, throw cryptic errors, or leave your CloudFormation stack in an unrecoverable state if you don't know about them.&lt;/p&gt;

&lt;p&gt;I'm going to walk through the complete CDK stack and call out every one of them.&lt;/p&gt;

&lt;p&gt;The full source is in &lt;a href="https://github.com/rajmurugan01/bedrock-agentcore-starter/blob/main/infra/lib/agentcore-stack.ts" rel="noopener noreferrer"&gt;infra/lib/agentcore-stack.ts&lt;/a&gt; in the demo repo.&lt;/p&gt;




&lt;h2&gt;
  
  
  The stack: what we're creating
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CustomerServiceAgentStack-dev
├── KMS Key (rotation enabled, RETAIN on delete)
├── CloudWatch LogGroup — /aws/bedrock-agentcore/runtimes/...
├── CloudWatch LogGroup — /aws/bedrock/model-invocations/...
├── IAM Role (InvocationLogging) — bedrock.amazonaws.com principal
├── CfnResource — AWS::Bedrock::ModelInvocationLoggingConfiguration ← Gotcha #3
├── SNS Topic — cost alarm notifications
├── CloudWatch Alarm — OutputTokenCount
├── CloudWatch Alarm — InvocationLatency
├── Bedrock CfnGuardrail + CfnGuardrailVersion
├── CfnResource — AWS::BedrockAgentCore::Memory (3 strategies)  ← Gotcha #8
├── ECR Repository (IMPORTED, not created)                       ← Gotcha #2
├── IAM Role (ExecutionRole) — bedrock-agentcore.amazonaws.com
├── Security Group (allowAllOutbound: false)                     ← Gotcha #6
├── CfnResource — AWS::BedrockAgentCore::AgentRuntime
└── SSM Parameters (7 parameters for all ARNs/IDs)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's go through each section and the gotchas that come with it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Project setup
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;infra/
├── bin/app.ts
├── lib/agentcore-stack.ts
├── test/agentcore-stack.test.ts
├── jest.config.js
├── package.json
├── tsconfig.json
└── cdk.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;package.json&lt;/code&gt; needs &lt;code&gt;aws-cdk-lib&lt;/code&gt; and &lt;code&gt;constructs&lt;/code&gt; as dependencies, plus &lt;code&gt;jest&lt;/code&gt; + &lt;code&gt;ts-jest&lt;/code&gt; as devDependencies for the unit tests.&lt;/p&gt;




&lt;h2&gt;
  
  
  Gotcha #1: AgentCore naming — NO HYPHENS
&lt;/h2&gt;

&lt;p&gt;This one is not in the documentation in any obvious place. AgentCore Runtime names, Memory names, and Memory strategy names must match this regex:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;^[a-zA-Z][a-zA-Z0-9_]{0,47}$
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hyphens fail at deploy time with a cryptic CloudFormation validation error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Value 'customer-service-agent-dev' at 'agentRuntimeName' failed to satisfy
constraint: Member must satisfy regular expression pattern: [a-zA-Z][a-zA-Z0-9_]{0,47}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fix is simple — use camelCase or underscores:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ❌ This will fail at deploy time&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;runtimeName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`customer-service-agent-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// ✅ This works&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;runtimeName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`customerServiceAgent_&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;memoryName&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`customerServiceAgentMemory_&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This applies to &lt;strong&gt;every&lt;/strong&gt; naming field in AgentCore: Runtime names, Memory names, and the names of individual Memory strategies.&lt;/p&gt;
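&lt;p&gt;Because the error only surfaces at deploy time, a pre-synth check pays for itself. A small validator using the regex above (the helper name is mine):&lt;/p&gt;

```python
import re

# The pattern AgentCore applies to Runtime, Memory, and strategy names.
AGENTCORE_NAME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9_]{0,47}$")

def validate_agentcore_name(name: str) -> None:
    """Raise ValueError if `name` would be rejected at deploy time."""
    if not AGENTCORE_NAME_RE.match(name):
        raise ValueError(
            f"{name!r} is not a valid AgentCore name: must start with a letter "
            "and contain only letters, digits, and underscores (48 chars max)"
        )
```

&lt;p&gt;Run it over every Runtime, Memory, and strategy name in a unit test and the hyphen mistake never reaches CloudFormation.&lt;/p&gt;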




&lt;h2&gt;
  
  
  Gotcha #2: ECR chicken-and-egg
&lt;/h2&gt;

&lt;p&gt;This one will waste a full deploy cycle if you don't know about it.&lt;/p&gt;

&lt;p&gt;AgentCore Runtime requires a &lt;strong&gt;valid container image at &lt;code&gt;:latest&lt;/code&gt;&lt;/strong&gt; when the &lt;code&gt;CfnRuntime&lt;/code&gt; resource is created. The timing problem: CDK creates both the ECR repo and the Runtime in the same deploy → the Runtime creation fails because the ECR repo is empty.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix&lt;/strong&gt; is a two-step process:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1&lt;/strong&gt;: Run the bootstrap script once before the first &lt;code&gt;cdk deploy&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# infra/scripts/bootstrap-ecr.sh&lt;/span&gt;

aws ecr create-repository &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--repository-name&lt;/span&gt; &lt;span class="s2"&gt;"customerserviceagent/runtime-dev"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1

&lt;span class="c"&gt;# Push a placeholder (any valid linux/amd64 image)&lt;/span&gt;
docker pull &lt;span class="nt"&gt;--platform&lt;/span&gt; linux/amd64 public.ecr.aws/amazonlinux/amazonlinux:2
docker tag public.ecr.aws/amazonlinux/amazonlinux:2 &lt;span class="se"&gt;\&lt;/span&gt;
  &amp;lt;account&amp;gt;.dkr.ecr.us-east-1.amazonaws.com/customerserviceagent/runtime-dev:latest
docker push &amp;lt;account&amp;gt;.dkr.ecr.us-east-1.amazonaws.com/customerserviceagent/runtime-dev:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2&lt;/strong&gt;: In CDK, &lt;strong&gt;import&lt;/strong&gt; the repo — never create it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ✅ Import (repo already exists from bootstrap script)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agentRepo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;ecr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Repository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromRepositoryName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;AgentRepo&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ecrRepoName&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// ❌ Don't do this — CDK would create it empty and the Runtime deploy would fail&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agentRepo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;ecr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Repository&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;AgentRepo&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
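&lt;p&gt;A pre-deploy guard can make the bootstrap requirement explicit: check that &lt;code&gt;:latest&lt;/code&gt; exists before running &lt;code&gt;cdk deploy&lt;/code&gt;. The decision logic is pure and easy to test; the boto3 call around it uses the real ECR &lt;code&gt;describe_images&lt;/code&gt; API, but wiring it into your deploy script is an exercise for the reader:&lt;/p&gt;

```python
def latest_tag_present(describe_images_response: dict) -> bool:
    """True if any image in an ECR describe_images response carries :latest."""
    return any(
        "latest" in image.get("imageTags", [])
        for image in describe_images_response.get("imageDetails", [])
    )

def bootstrap_done(repo_name: str, region: str = "us-east-1") -> bool:
    """Query ECR and report whether the bootstrap image has been pushed."""
    import boto3
    from botocore.exceptions import ClientError
    ecr = boto3.client("ecr", region_name=region)
    try:
        resp = ecr.describe_images(repositoryName=repo_name)
    except ClientError:
        return False  # repo missing entirely — bootstrap script not run
    return latest_tag_present(resp)
```

&lt;p&gt;Fail the deploy script early when &lt;code&gt;bootstrap_done(...)&lt;/code&gt; is &lt;code&gt;False&lt;/code&gt; and you never burn a CloudFormation cycle on an empty repo.&lt;/p&gt;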






&lt;h2&gt;
  
  
  Gotcha #3: No CDK L1 for &lt;code&gt;ModelInvocationLoggingConfiguration&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;If you try to enable Bedrock model invocation logging, you'll find that &lt;code&gt;aws-cdk-lib&lt;/code&gt; (up to 2.245.0) has no L1 construct for &lt;code&gt;AWS::Bedrock::ModelInvocationLoggingConfiguration&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Searching the CDK docs or autocomplete for &lt;code&gt;CfnModelInvocationLoggingConfiguration&lt;/code&gt; returns nothing. You must use raw &lt;code&gt;CfnResource&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ❌ This doesn't exist in aws-cdk-lib&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;CfnModelInvocationLoggingConfiguration&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-bedrock&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// ✅ Use raw CfnResource with the CloudFormation type string&lt;/span&gt;
&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CfnResource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ModelInvocationLogging&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;AWS::Bedrock::ModelInvocationLoggingConfiguration&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;LoggingConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;CloudWatchConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;LogGroupName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;invocationLogGroup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;logGroupName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;RoleArn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;invocationLoggingRole&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;roleArn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;TextDataDeliveryEnabled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;ImageDataDeliveryEnabled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;EmbeddingDataDeliveryEnabled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The IAM role for this must have &lt;code&gt;logs:CreateLogStream&lt;/code&gt; and &lt;code&gt;logs:PutLogEvents&lt;/code&gt; on the log group ARN, and it must be assumed by &lt;code&gt;bedrock.amazonaws.com&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Gotcha #4: VPC endpoints — don't recreate existing ones
&lt;/h2&gt;

&lt;p&gt;AgentCore runs inside a VPC. It needs to reach Bedrock, ECR, and SSM without going through the public internet (for both performance and security).&lt;/p&gt;

&lt;p&gt;The trap: if your VPC was provisioned by Terraform or another CDK stack, it may already have interface endpoints for Bedrock, ECR, and S3. Creating duplicate interface endpoints with the same private DNS name fails with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;private-dns-enabled cannot be set because there is already a conflicting
DNS domain for bedrock-runtime.us-east-1.amazonaws.com in this VPC
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;For managed VPCs&lt;/strong&gt;: Use &lt;code&gt;Vpc.fromLookup()&lt;/code&gt; and &lt;strong&gt;skip creating VPC endpoints&lt;/strong&gt;. Assume they already exist.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For the demo (default VPC)&lt;/strong&gt;: No pre-existing endpoints, so we create the minimum needed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;vpc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;ec2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromLookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;DefaultVpc&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;isDefault&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Only add endpoints if they don't already exist in your VPC&lt;/span&gt;
&lt;span class="nx"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addInterfaceEndpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;BedrockRuntimeEndpoint&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ec2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;InterfaceVpcEndpointAwsService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;BEDROCK_RUNTIME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;securityGroups&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;agentSg&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addGatewayEndpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;S3Endpoint&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ec2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;GatewayVpcEndpointAwsService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;S3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Gotcha #5: KMS + CloudWatch LogGroup key policy
&lt;/h2&gt;

&lt;p&gt;If you want to encrypt a CloudWatch LogGroup with a customer-managed KMS key, the key policy &lt;strong&gt;must&lt;/strong&gt; explicitly grant &lt;code&gt;logs.amazonaws.com&lt;/code&gt; permission to use the key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// The key policy must include this:&lt;/span&gt;
&lt;span class="nx"&gt;kmsKey&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addToResourcePolicy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PolicyStatement&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;principals&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ServicePrincipal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`logs.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;region&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.amazonaws.com`&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
  &lt;span class="na"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;kms:Encrypt&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;kms:Decrypt&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;kms:GenerateDataKey&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;kms:DescribeKey&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;*&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;}));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without this, CloudWatch Logs cannot use the key — in practice the log group creation fails with a KMS access denied error.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For most use cases, skip the CMK entirely.&lt;/strong&gt; CloudWatch uses AWS-managed encryption by default. The only reason to add a CMK is if you have a compliance requirement that mandates customer-controlled key rotation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Gotcha #6: Security Group egress is inline, not a separate resource
&lt;/h2&gt;

&lt;p&gt;This one catches you in CDK unit tests, not in the actual deployment.&lt;/p&gt;

&lt;p&gt;When using &lt;code&gt;allowAllOutbound: false&lt;/code&gt; and calling &lt;code&gt;addEgressRule(Peer.ipv4(cidr), Port.tcp(443))&lt;/code&gt;, CDK embeds the egress rule &lt;strong&gt;inside&lt;/strong&gt; the &lt;code&gt;SecurityGroup&lt;/code&gt; resource's &lt;code&gt;SecurityGroupEgress&lt;/code&gt; array:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agentSg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;ec2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SecurityGroup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;AgentSg&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;allowAllOutbound&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// Disables default egress-all rule&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;agentSg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addEgressRule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;ec2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Peer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ipv4&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpcCidrBlock&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="nx"&gt;ec2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Port&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tcp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;443&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;HTTPS to VPC CIDR&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is &lt;strong&gt;no separate &lt;code&gt;AWS::EC2::SecurityGroupEgress&lt;/code&gt; resource&lt;/strong&gt; created for this. In CDK assertions tests:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ❌ This will find 0 resources — it doesn't exist as a separate resource&lt;/span&gt;
&lt;span class="nx"&gt;template&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;hasResource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;AWS::EC2::SecurityGroupEgress&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{});&lt;/span&gt;

&lt;span class="c1"&gt;// ✅ Check the inline array inside the SecurityGroup resource&lt;/span&gt;
&lt;span class="nx"&gt;template&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;hasResourceProperties&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;AWS::EC2::SecurityGroup&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;SecurityGroupEgress&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arrayWith&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="nx"&gt;Match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;objectLike&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;IpProtocol&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;tcp&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;FromPort&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;443&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;ToPort&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;443&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="p"&gt;]),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: a &lt;strong&gt;separate&lt;/strong&gt; &lt;code&gt;AWS::EC2::SecurityGroupEgress&lt;/code&gt; resource IS created when you reference another security group as the peer (cross-SG rules). The inline embedding described above applies only to IP/CIDR-based rules.&lt;/p&gt;




&lt;h2&gt;
  
  
  Gotcha #7: &lt;code&gt;update-agent-runtime&lt;/code&gt; requires &lt;code&gt;--role-arn&lt;/code&gt; and &lt;code&gt;--network-configuration&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;After deployment, when you push a new Docker image and want AgentCore to pick it up, you call &lt;code&gt;update-agent-runtime&lt;/code&gt;. Both &lt;code&gt;--role-arn&lt;/code&gt; and &lt;code&gt;--network-configuration&lt;/code&gt; are now &lt;strong&gt;mandatory&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws bedrock-agentcore-control update-agent-runtime &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--agent-runtime-id&lt;/span&gt; &amp;lt;runtime-id&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--agent-runtime-artifact&lt;/span&gt; &lt;span class="s1"&gt;'{"containerConfiguration":{"containerUri":"&amp;lt;ecr&amp;gt;:latest"}}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-arn&lt;/span&gt; arn:aws:iam::&amp;lt;account&amp;gt;:role/customerServiceAgentExecutionRole-dev &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--network-configuration&lt;/span&gt; &lt;span class="s1"&gt;'{
    "networkMode": "VPC",
    "networkModeConfig": {
      "securityGroups": ["sg-xxx"],
      "subnets": ["subnet-aaa", "subnet-bbb"]
    }
  }'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Omitting either &lt;code&gt;--role-arn&lt;/code&gt; or &lt;code&gt;--network-configuration&lt;/code&gt; gives:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ValidationException: Missing required field: roleArn
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--role-arn&lt;/code&gt; here is the &lt;strong&gt;execution role&lt;/strong&gt; (the role the Runtime assumes to pull from ECR and call Bedrock) — not the deploy role your CLI is using.&lt;/p&gt;




&lt;h2&gt;
  
  
  Gotcha #8: Memory stuck in &lt;code&gt;CREATING&lt;/code&gt; during rollback
&lt;/h2&gt;

&lt;p&gt;If your CDK deploy fails &lt;strong&gt;after&lt;/strong&gt; the AgentCore Memory resource starts creating, the CloudFormation rollback will also fail. The Memory resource is in &lt;code&gt;CREATING&lt;/code&gt; state and CloudFormation can't delete it.&lt;/p&gt;

&lt;p&gt;You'll see this error in the CloudFormation events:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DELETE_FAILED: Cannot delete resource while it is in CREATING state
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Recovery steps:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Find the stuck memory&lt;/span&gt;
aws bedrock-agentcore-control list-memories &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1

&lt;span class="c"&gt;# 2. Wait for it to finish creating (usually a few minutes), then delete&lt;/span&gt;
aws bedrock-agentcore-control delete-memory &lt;span class="nt"&gt;--memory-id&lt;/span&gt; &amp;lt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1

&lt;span class="c"&gt;# 3. Delete the stuck CloudFormation stack&lt;/span&gt;
aws cloudformation delete-stack &lt;span class="nt"&gt;--stack-name&lt;/span&gt; CustomerServiceAgentStack-dev &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1

&lt;span class="c"&gt;# 4. Retry&lt;/span&gt;
cdk deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Gotcha #9: &lt;code&gt;arrayWith&lt;/code&gt; in CDK assertions is order-sensitive
&lt;/h2&gt;

&lt;p&gt;This one only matters if you write CDK unit tests, but it will confuse you when it does.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Match.arrayWith([patternA, patternB])&lt;/code&gt; requires the elements to appear in the &lt;strong&gt;same order&lt;/strong&gt; as in the synthesised template:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// The template has filtersConfig: [PROMPT_ATTACK, HATE, INSULTS, SEXUAL, VIOLENCE]&lt;/span&gt;

&lt;span class="c1"&gt;// ✅ Works — PROMPT_ATTACK before HATE&lt;/span&gt;
&lt;span class="nx"&gt;Match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arrayWith&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="nx"&gt;Match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;objectLike&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;PROMPT_ATTACK&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="nx"&gt;Match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;objectLike&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;HATE&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;// ❌ Fails — even though both are present, order doesn't match the template&lt;/span&gt;
&lt;span class="nx"&gt;Match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arrayWith&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="nx"&gt;Match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;objectLike&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;HATE&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="nx"&gt;Match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;objectLike&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;PROMPT_ATTACK&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fix: write your &lt;code&gt;Match.arrayWith&lt;/code&gt; patterns in the same order as the properties appear in your CDK code.&lt;/p&gt;
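&lt;p&gt;Conceptually, &lt;code&gt;Match.arrayWith&lt;/code&gt; behaves like a subsequence check: each pattern must be found somewhere after the previous one. A standalone sketch of that semantics (plain TypeScript for illustration — the real matcher compares nested patterns, not string equality):&lt;/p&gt;

```typescript
// Subsequence check: do the wanted items appear in `actual` in the
// same relative order? (Conceptual model only; CDK's real matcher
// compares nested patterns rather than strings.)
function matchesInOrder(actual: string[], wanted: string[]): boolean {
  let next = 0;
  for (const item of actual) {
    if (next === wanted.length) break;
    if (item === wanted[next]) next += 1;
  }
  return next === wanted.length;
}

const filters = ['PROMPT_ATTACK', 'HATE', 'INSULTS', 'SEXUAL', 'VIOLENCE'];

console.log(matchesInOrder(filters, ['PROMPT_ATTACK', 'HATE'])); // true
console.log(matchesInOrder(filters, ['HATE', 'PROMPT_ATTACK'])); // false
```

&lt;p&gt;If a test genuinely shouldn't care about order, assert each filter in its own &lt;code&gt;Match.arrayWith&lt;/code&gt; call instead of listing several patterns in one.&lt;/p&gt;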




&lt;h2&gt;
  
  
  The CDK unit test strategy
&lt;/h2&gt;

&lt;p&gt;With all these gotchas, testing matters. Here's the approach from &lt;a href="https://github.com/rajmurugan01/bedrock-agentcore-starter/blob/main/infra/test/agentcore-stack.test.ts" rel="noopener noreferrer"&gt;infra/test/agentcore-stack.test.ts&lt;/a&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Snapshot test&lt;/strong&gt; — the primary safety net. Any change to the synthesised template fails CI until explicitly updated.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Naming regex test&lt;/strong&gt; — verify Runtime and Memory names match the no-hyphens regex.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IAM trust test&lt;/strong&gt; — verify &lt;code&gt;bedrock-agentcore.amazonaws.com&lt;/code&gt; is the principal on the execution role.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security Group inline egress test&lt;/strong&gt; — verify the pattern from Gotcha #6.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SSM parameter tests&lt;/strong&gt; — verify all 7 parameters exist with the correct paths.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Protocol test&lt;/strong&gt; — verify &lt;code&gt;ServerProtocol: 'HTTP'&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Guardrail filter order test&lt;/strong&gt; — verify filters appear in the correct order (Gotcha #9).&lt;/li&gt;
&lt;/ol&gt;
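&lt;p&gt;As a standalone sketch, the naming check from item 2 boils down to a regex gate. The exact pattern below (a letter, then up to 47 letters, digits, or underscores) is an assumption — confirm it against the AgentCore documentation and the repo's tests:&lt;/p&gt;

```typescript
// Assumed pattern: a letter first, then up to 47 letters, digits,
// or underscores. Hyphens are rejected. Verify the exact regex and
// length limit against the AgentCore docs before relying on this.
const AGENTCORE_NAME = /^[a-zA-Z][a-zA-Z0-9_]{0,47}$/;

function isValidAgentCoreName(name: string): boolean {
  return AGENTCORE_NAME.test(name);
}

console.log(isValidAgentCoreName('customerServiceAgent_dev')); // true
console.log(isValidAgentCoreName('customer-service-agent'));   // false
```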

&lt;p&gt;Run tests with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;infra &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm &lt;span class="nb"&gt;test&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For deterministic synthesis without AWS credentials, commit &lt;code&gt;cdk.context.json&lt;/code&gt; (the VPC lookup cache). Without it, CDK would try to call the AWS API during &lt;code&gt;cdk synth&lt;/code&gt;, breaking CI.&lt;/p&gt;
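&lt;p&gt;The committed file is just a JSON cache of lookup results. An illustrative, trimmed shape with placeholder IDs (the exact key format varies by CDK version — generate the real file by running &lt;code&gt;cdk synth&lt;/code&gt; once with credentials):&lt;/p&gt;

```json
{
  "vpc-provider:account=123456789012:filter.vpc-id=vpc-0abc123:region=us-east-1:returnAsymmetricSubnets=true": {
    "vpcId": "vpc-0abc123",
    "vpcCidrBlock": "10.0.0.0/16",
    "availabilityZones": [],
    "subnetGroups": [
      {
        "name": "Private",
        "type": "Private",
        "subnets": [
          {
            "subnetId": "subnet-aaa",
            "availabilityZone": "us-east-1a",
            "routeTableId": "rtb-111"
          }
        ]
      }
    ]
  }
}
```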




&lt;h2&gt;
  
  
  Deploying
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;infra
npm &lt;span class="nb"&gt;install&lt;/span&gt;

&lt;span class="c"&gt;# First time only&lt;/span&gt;
./scripts/bootstrap-ecr.sh    &lt;span class="c"&gt;# Gotcha #2 — must run before cdk deploy&lt;/span&gt;

&lt;span class="c"&gt;# Deploy&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CDK_DEFAULT_ACCOUNT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;your-account-id&amp;gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ENVIRONMENT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;dev
cdk deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The deploy takes 5-10 minutes. The Memory resource is the slowest to provision (~3-4 minutes).&lt;/p&gt;

&lt;p&gt;In Part 3, we write the Python agent that runs inside the container AgentCore manages.&lt;/p&gt;

&lt;p&gt;→ &lt;strong&gt;&lt;a href="https://dev.to/blog/part-3-strands-agent-sdk"&gt;Continue to Part 3: Building the Agent with Strands SDK&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://rajmurugan.com/blog/part-2-cdk-infrastructure-bedrock-agentcore" rel="noopener noreferrer"&gt;rajmurugan.com&lt;/a&gt;. This is Part 2 of the &lt;a href="https://rajmurugan.com/blog" rel="noopener noreferrer"&gt;Ultimate Guide to Building AI Agents on AWS with Bedrock AgentCore&lt;/a&gt; series.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cdk</category>
      <category>typescript</category>
      <category>bedrock</category>
    </item>
    <item>
      <title>Part 1: Why I Chose Amazon Bedrock AgentCore (And What Lambda Gets Wrong for AI Agents)</title>
      <dc:creator>Raj Murugan</dc:creator>
      <pubDate>Mon, 30 Mar 2026 21:30:54 +0000</pubDate>
      <link>https://forem.com/rajmurugan/part-1-why-i-chose-amazon-bedrock-agentcore-and-what-lambda-gets-wrong-for-ai-agents-jm3</link>
      <guid>https://forem.com/rajmurugan/part-1-why-i-chose-amazon-bedrock-agentcore-and-what-lambda-gets-wrong-for-ai-agents-jm3</guid>
      <description>&lt;p&gt;I built a production AI agent on AWS. Not a demo, not a proof of concept — a real system with persistent memory, guardrails, CI/CD pipelines, and users who depend on it not going down at 2am.&lt;/p&gt;

&lt;p&gt;The thing nobody tells you: &lt;strong&gt;the hard part isn't the AI&lt;/strong&gt;. The hard part is the infrastructure around it.&lt;/p&gt;

&lt;p&gt;This series is my attempt to document everything I had to figure out the hard way — from architecture decisions in Part 1 all the way to cost optimisation in Part 6. The companion demo repo is at &lt;a href="https://github.com/rajmurugan01/bedrock-agentcore-starter" rel="noopener noreferrer"&gt;github.com/rajmurugan01/bedrock-agentcore-starter&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let's start at the beginning: why Amazon Bedrock AgentCore, and why not the "obvious" serverless approach.&lt;/p&gt;




&lt;h2&gt;
  
  
  The obvious approach: Lambda + Bedrock
&lt;/h2&gt;

&lt;p&gt;If you've shipped anything serverless on AWS, your first instinct is Lambda. You know it, it has great tooling, CDK support is mature, and it scales to zero.&lt;/p&gt;

&lt;p&gt;For a simple Bedrock wrapper — get a message, call &lt;code&gt;InvokeModel&lt;/code&gt;, return a response — Lambda is fine. But the moment you add &lt;strong&gt;conversational state&lt;/strong&gt;, it starts to crack.&lt;/p&gt;

&lt;p&gt;Here's what a real conversational AI agent needs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Session state&lt;/strong&gt; — the agent needs to remember what happened earlier in the conversation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-running processing&lt;/strong&gt; — LLMs can take 30-90 seconds for complex multi-tool chains&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory across sessions&lt;/strong&gt; — the agent should know who the user is from previous conversations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming responses&lt;/strong&gt; — users expect tokens to appear progressively, not wait 60 seconds for a blob&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let's look at how Lambda handles each of these.&lt;/p&gt;

&lt;h3&gt;
  
  
  Problem 1: Lambda's 15-minute timeout
&lt;/h3&gt;

&lt;p&gt;Lambda has a hard maximum execution timeout of 15 minutes. For a simple Q&amp;amp;A, that's fine. But for an agentic loop — where the model calls tools, processes results, calls more tools, and reasons over everything — you can easily hit 5-10 minutes per complex interaction.&lt;/p&gt;
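&lt;p&gt;To make the constraint concrete, here's a purely illustrative loop that watches its own time budget — &lt;code&gt;nextStep&lt;/code&gt; is a hypothetical stand-in for one model call plus tool execution:&lt;/p&gt;

```typescript
// Illustrative only: an agentic loop that tracks its own clock.
// `nextStep` stands in for one model call plus tool execution.
function runAgentLoop(nextStep: () => boolean, budgetMs: number): string {
  const deadline = Date.now() + budgetMs;
  while (true) {
    if (Date.now() >= deadline) {
      // On Lambda there is no code path here: the 15-minute hard cap
      // simply kills the invocation with no graceful handoff.
      return 'timed out';
    }
    if (nextStep()) return 'completed';
  }
}

// A chain of five tool calls finishes well inside a one-second budget:
let calls = 0;
console.log(runAgentLoop(() => { calls += 1; return calls >= 5; }, 1000)); // completed
```

&lt;p&gt;On Lambda you don't even get to write the &lt;code&gt;timed out&lt;/code&gt; branch — the platform enforces the deadline for you, mid-work.&lt;/p&gt;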

&lt;p&gt;And I haven't even mentioned the user's session. If a user comes back after 20 minutes and continues the conversation, that's a new Lambda invocation with zero context.&lt;/p&gt;

&lt;h3&gt;
  
  
  Problem 2: Session state storage
&lt;/h3&gt;

&lt;p&gt;Lambda is stateless by design. Every invocation is independent. For conversational state, you need to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Store session state somewhere (DynamoDB, ElastiCache, S3)&lt;/li&gt;
&lt;li&gt;Load it at the start of every Lambda invocation&lt;/li&gt;
&lt;li&gt;Save it at the end of every invocation&lt;/li&gt;
&lt;li&gt;Handle the edge case where the Lambda times out mid-session&lt;/li&gt;
&lt;li&gt;Build a session expiry and cleanup mechanism&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's a lot of undifferentiated infrastructure for a problem that isn't your core business.&lt;/p&gt;
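&lt;p&gt;A minimal sketch of that load/save choreography, with an in-memory map standing in for DynamoDB (all names here are illustrative):&lt;/p&gt;

```typescript
// In-memory map standing in for DynamoDB; all names are illustrative.
type Session = { history: string[]; expiresAt: number };

const store: { [id: string]: Session } = {};
const TTL_MS = 30 * 60 * 1000; // step 5: session expiry

function handleMessage(sessionId: string, message: string): number {
  // Step 2: load state at the start of every invocation (honouring expiry).
  let session = store[sessionId];
  if (session === undefined || Date.now() >= session.expiresAt) {
    session = { history: [], expiresAt: 0 };
  }
  session.history.push(message);
  // ... call Bedrock with session.history as context ...
  // Step 3: save at the end. A timeout between load and save (step 4)
  // silently drops this turn.
  session.expiresAt = Date.now() + TTL_MS;
  store[sessionId] = session;
  return session.history.length; // turns remembered so far
}

console.log(handleMessage('u1', 'hi'));                // 1
console.log(handleMessage('u1', 'where is my order')); // 2
```

&lt;p&gt;Every piece of this — the TTL, the save-after-timeout gap, the cleanup job — becomes code you own and operate.&lt;/p&gt;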

&lt;h3&gt;
  
  
  Problem 3: Cross-session memory
&lt;/h3&gt;

&lt;p&gt;Beyond session state, real assistants need &lt;strong&gt;memory&lt;/strong&gt; — the ability to remember that a user's preferred contact method is email, that they're a premium customer, that they had a billing dispute last month.&lt;/p&gt;

&lt;p&gt;With Lambda, you'd need to build this yourself: a vector database for semantic recall, a summarisation pipeline to consolidate old sessions, a retrieval step before each invocation. Entirely custom, entirely your problem to maintain.&lt;/p&gt;
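&lt;p&gt;A toy version of that retrieval step — scoring stored memories against the new message by word overlap, where a real system would use vector embeddings and a proper similarity metric:&lt;/p&gt;

```typescript
// Toy recall: score stored memories against the new message by word
// overlap. A real system would use vector embeddings, but the shape
// is the same: rank memories, prepend the best ones to the prompt.
function overlapScore(memory: string, query: string): number {
  const memoryWords = new Set(memory.toLowerCase().split(/\s+/));
  let hits = 0;
  for (const w of query.toLowerCase().split(/\s+/)) {
    if (memoryWords.has(w)) hits += 1;
  }
  return hits;
}

function recall(memories: string[], query: string): string {
  let best = memories[0];
  let bestScore = -1;
  for (const m of memories) {
    const score = overlapScore(m, query);
    if (score > bestScore) { best = m; bestScore = score; }
  }
  return best;
}

const memories = [
  'prefers contact by email',
  'premium customer since 2023',
  'billing dispute resolved last month',
];
console.log(recall(memories, 'question about my billing')); // billing dispute...
```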




&lt;h2&gt;
  
  
  What AgentCore actually does
&lt;/h2&gt;

&lt;p&gt;Amazon Bedrock AgentCore is AWS's managed infrastructure for running AI agents. It's designed specifically for the workload pattern that Lambda handles poorly.&lt;/p&gt;

&lt;p&gt;Here's the mental model: AgentCore is &lt;strong&gt;a managed container orchestrator for long-running, stateful AI agent sessions&lt;/strong&gt;. You ship a Docker container with your agent code. AgentCore handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Container lifecycle&lt;/strong&gt; — starts, stops, scales, and restarts containers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session routing&lt;/strong&gt; — routes each user session to the right container instance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory persistence&lt;/strong&gt; — built-in Semantic, Summary, and UserPreference memory strategies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JWT validation&lt;/strong&gt; — validates Cognito (or custom) JWTs before your code even runs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VPC networking&lt;/strong&gt; — runs your containers inside your VPC without cold start penalties&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SSE streaming&lt;/strong&gt; — handles the HTTP connection and SSE protocol for you&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The architectural difference:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Lambda approach:
  User message → API Gateway → Lambda (cold start?) → load session from DynamoDB →
  call Bedrock → save session to DynamoDB → return response → Lambda exits

AgentCore approach:
  User message → AgentCore Runtime (JWT validated) → your container (already warm) →
  call Bedrock → response streams back → container stays warm for next message
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The architecture we're building
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌────────────────────────────────────────────────────────────────┐
│  GitHub Actions (OIDC)                                         │
│  ├── Build Docker (linux/amd64)                                │
│  ├── Push to ECR (:latest + :&amp;lt;sha&amp;gt;)                           │
│  └── update-agent-runtime CLI                                  │
└──────────────────────────────┬─────────────────────────────────┘
                               │
                    CDK v2 TypeScript deploys:
                               │
┌──────────────────────────────▼─────────────────────────────────┐
│  AWS Infrastructure (us-east-1)                                │
│                                                                │
│  AgentCore Runtime                                             │
│  ├── Cognito JWT authoriser                                    │
│  ├── AG-UI HTTP protocol (SSE streaming)                      │
│  └── Container: Python agent on port 8080                     │
│                                                                │
│  AgentCore Memory (3 strategies)                               │
│  Bedrock Guardrail (prompt injection + PII)                   │
│  CloudWatch alarms (token count + latency)                    │
└────────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Primary model&lt;/strong&gt;: Claude Sonnet 4.6 with prompt caching&lt;br&gt;
&lt;strong&gt;Background model&lt;/strong&gt;: Amazon Nova Pro (cheap classification/summarisation)&lt;br&gt;
&lt;strong&gt;CI/CD&lt;/strong&gt;: GitHub Actions OIDC — no stored AWS credentials&lt;/p&gt;




&lt;h2&gt;
  
  
  Series roadmap
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Part&lt;/th&gt;
&lt;th&gt;Topic&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Part 1&lt;/strong&gt; (this post)&lt;/td&gt;
&lt;td&gt;Architecture &amp;amp; why AgentCore&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Part 2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full CDK stack + 9 deployment gotchas&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Part 3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Python agent with Strands SDK + prompt caching&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Part 4&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Docker local dev loop&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Part 5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GitHub Actions OIDC + ECR + Runtime updates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Part 6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cost breakdown + alarms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Full demo repo: &lt;a href="https://github.com/rajmurugan01/bedrock-agentcore-starter" rel="noopener noreferrer"&gt;github.com/rajmurugan01/bedrock-agentcore-starter&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://rajmurugan.com/blog/part-1-why-bedrock-agentcore" rel="noopener noreferrer"&gt;rajmurugan.com&lt;/a&gt;. This is Part 1 of the &lt;a href="https://rajmurugan.com/blog" rel="noopener noreferrer"&gt;Ultimate Guide to Building AI Agents on AWS with Bedrock AgentCore&lt;/a&gt; series.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>bedrock</category>
      <category>agentcore</category>
      <category>aiagents</category>
    </item>
    <item>
<title>Complete Guide to RAG Evaluations in Amazon Bedrock</title>
      <dc:creator>Raj Murugan</dc:creator>
      <pubDate>Tue, 20 Jan 2026 11:36:07 +0000</pubDate>
      <link>https://forem.com/rajmurugan/-complete-guide-to-rag-evaluations-in-amazon-bedrock-4je3</link>
      <guid>https://forem.com/rajmurugan/-complete-guide-to-rag-evaluations-in-amazon-bedrock-4je3</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In the rapidly evolving landscape of artificial intelligence, Retrieval Augmented Generation (RAG) has emerged as a powerful technique for enhancing the capabilities of large language models (LLMs). By grounding LLMs with external knowledge bases, RAG systems can generate more accurate, relevant, and up-to-date responses, mitigating issues like hallucination and outdated information. Amazon Bedrock provides a robust platform for building and deploying RAG applications, offering a suite of foundation models and tools to streamline development.&lt;/p&gt;

&lt;p&gt;However, the true power of a RAG system lies not just in its construction, but in its continuous evaluation and refinement. Ensuring that your RAG application consistently delivers high-quality responses requires a systematic approach to assessment. This comprehensive guide will walk you through the process of setting up and conducting RAG evaluations within Amazon Bedrock, focusing on automatic assessment of your knowledge base performance. We will cover everything from initial prerequisites and environment setup to creating evaluation jobs, monitoring key metrics, and interpreting results, empowering you to build and maintain highly effective RAG solutions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites &amp;amp; Environment Setup
&lt;/h2&gt;

&lt;p&gt;Before diving into the RAG evaluation process, it's essential to ensure your environment is correctly configured and you have the necessary prerequisites in place. This section outlines the foundational requirements for a smooth evaluation experience.&lt;/p&gt;

&lt;h3&gt;
  
  
  Essential Prerequisites
&lt;/h3&gt;

&lt;p&gt;To begin, you will need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;An AWS Account&lt;/strong&gt;: Access to an active Amazon Web Services (AWS) account is fundamental for utilizing Amazon Bedrock and its associated services like S3 and IAM.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Basic Knowledge of AWS S3 and IAM Roles&lt;/strong&gt;: Familiarity with Amazon S3 for data storage and AWS Identity and Access Management (IAM) for managing permissions is crucial. You will be interacting with S3 buckets for storing evaluation datasets and configuring IAM roles for service access.&lt;/li&gt;
&lt;/ul&gt;
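&lt;p&gt;One artifact worth preparing up front is the evaluation dataset itself: a JSONL file in S3, one record per line. The field names below reflect the knowledge-base evaluation format as I understand it — verify the exact schema against the current Bedrock documentation:&lt;/p&gt;

```json
{"conversationTurns": [{"prompt": {"content": [{"text": "What is the refund window?"}]}, "referenceResponses": [{"content": [{"text": "Refunds are accepted within 30 days of purchase."}]}]}]}
{"conversationTurns": [{"prompt": {"content": [{"text": "Do you ship internationally?"}]}, "referenceResponses": [{"content": [{"text": "Yes, to over 40 countries."}]}]}]}
```

&lt;p&gt;Each line is one independent record; the &lt;code&gt;referenceResponses&lt;/code&gt; ground-truth answers are what the evaluation job judges retrieval and generation quality against.&lt;/p&gt;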

&lt;h3&gt;
  
  
  Environment Configuration
&lt;/h3&gt;

&lt;p&gt;Careful environment setup ensures compatibility and optimal performance for your RAG evaluations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;AWS Region Selection&lt;/strong&gt;: It is recommended to use either the &lt;strong&gt;US East (N. Virginia)&lt;/strong&gt; or &lt;strong&gt;US West (Oregon)&lt;/strong&gt; AWS regions. These regions typically offer the broadest support for the latest Amazon Bedrock features and foundation models. Always verify that your chosen services and models are available in your selected region.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Model Selection&lt;/strong&gt;: For the purpose of this guide, we will primarily use the &lt;strong&gt;Amazon Nova Micro v1.0&lt;/strong&gt; model. This model is a good starting point for evaluations due to its balance of performance and cost-effectiveness. However, it is imperative to: 

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Verify Regional Support&lt;/strong&gt;: Confirm that the Amazon Nova Micro v1.0 model (or any other model you choose) is supported in your selected AWS region.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Check Pricing&lt;/strong&gt;: Always review the pricing for your chosen model, as costs can vary based on model type, usage, and region. Understanding the cost implications upfront will help manage your AWS expenditure effectively.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;By ensuring these prerequisites are met and your environment is properly configured, you lay the groundwork for a successful and insightful RAG evaluation journey within Amazon Bedrock.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step-by-Step Guide to RAG Evaluation
&lt;/h2&gt;

&lt;p&gt;This section provides a detailed, step-by-step walkthrough of how to set up and execute RAG evaluations in Amazon Bedrock. Each step is designed to guide you through the process, from creating your knowledge base to analyzing the evaluation results.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Create a Knowledge Base
&lt;/h3&gt;

&lt;p&gt;The foundation of any RAG application is its knowledge base. This knowledge base serves as the external data source that the LLM will retrieve information from. Follow these instructions to set up your knowledge base in Amazon Bedrock:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Navigate to Amazon Bedrock&lt;/strong&gt;: In the AWS Management Console, search for and select "Amazon Bedrock."&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Access Knowledge Bases&lt;/strong&gt;: From the Bedrock console, go to the left-hand navigation pane and select &lt;strong&gt;Knowledge Bases&lt;/strong&gt;. Then, click on the &lt;strong&gt;Create knowledge base&lt;/strong&gt; button.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Provide Knowledge Base Details&lt;/strong&gt;: 

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Knowledge base name&lt;/strong&gt;: Enter a descriptive name for your knowledge base (e.g., &lt;code&gt;myFirstBedrockKB&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Description&lt;/strong&gt;: (Optional) Provide a brief description of your knowledge base.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;IAM service role&lt;/strong&gt;: Choose to &lt;strong&gt;Create and use a new service role&lt;/strong&gt;. Make a note of the role name, as it will be useful for future reference and permissions management.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Configure Data Source&lt;/strong&gt;: 

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Data source&lt;/strong&gt;: Select &lt;strong&gt;S3&lt;/strong&gt; as your data source type.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Data source location&lt;/strong&gt;: Specify &lt;strong&gt;"This AWS account"&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;S3 URI&lt;/strong&gt;: Provide the S3 URI for your S3 bucket where your data is stored (e.g., &lt;code&gt;s3://mykbbucket&lt;/code&gt;). This bucket should contain the documents that your RAG application will use.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Chunking and parsing configurations&lt;/strong&gt;: For initial setup, you can keep the default settings. These configurations determine how your documents are split and processed for retrieval.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Select Embeddings Model&lt;/strong&gt;: Choose &lt;strong&gt;"Titan Text Embeddings v2"&lt;/strong&gt; as your embeddings model. This model will convert your documents into vector embeddings, enabling semantic search and retrieval.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Set up Vector Database&lt;/strong&gt;: For the vector database, select &lt;strong&gt;"Quick create a new vector store"&lt;/strong&gt;. You can choose between &lt;strong&gt;"Amazon OpenSearch Serverless"&lt;/strong&gt; or &lt;strong&gt;"Amazon S3 Vector Store (In Preview)"&lt;/strong&gt;. The vector database stores the embeddings and facilitates efficient similarity searches.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Create Knowledge Base&lt;/strong&gt;: Click &lt;strong&gt;Next&lt;/strong&gt; and then &lt;strong&gt;Create knowledge base&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: The creation of the knowledge base and the associated vector database can take some time. Please be patient during this provisioning process.&lt;/p&gt;
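&lt;p&gt;If you prefer scripting this setup, the same knowledge base can be created with the &lt;code&gt;boto3&lt;/code&gt; &lt;code&gt;bedrock-agent&lt;/code&gt; client. The sketch below only assembles the request: the role ARN, OpenSearch Serverless collection ARN, index name, and field mapping are placeholders you must supply (the console's "Quick create" option provisions these pieces for you).&lt;/p&gt;

```python
# Sketch: create a vector knowledge base programmatically.
# All ARNs and names below are placeholders -- substitute your own values.

def build_create_kb_request(name, role_arn, collection_arn, index_name):
    """Assemble a create_knowledge_base request for a vector KB backed by
    OpenSearch Serverless and Titan Text Embeddings v2."""
    return {
        "name": name,
        "roleArn": role_arn,
        "knowledgeBaseConfiguration": {
            "type": "VECTOR",
            "vectorKnowledgeBaseConfiguration": {
                # Titan Text Embeddings v2 (us-east-1 model ARN)
                "embeddingModelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0",
            },
        },
        "storageConfiguration": {
            "type": "OPENSEARCH_SERVERLESS",
            "opensearchServerlessConfiguration": {
                "collectionArn": collection_arn,
                "vectorIndexName": index_name,
                # Field names here are illustrative; they must match your index.
                "fieldMapping": {
                    "vectorField": "embedding",
                    "textField": "text",
                    "metadataField": "metadata",
                },
            },
        },
    }

def create_kb(**kwargs):
    import boto3  # deferred: the actual call requires AWS credentials
    client = boto3.client("bedrock-agent")
    return client.create_knowledge_base(**build_create_kb_request(**kwargs))
```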

&lt;h3&gt;
  
  
  Step 2: Sync Data Source
&lt;/h3&gt;

&lt;p&gt;Once your knowledge base is created, you need to synchronize it with your data source to ensure that the latest information is available for retrieval. This step ensures that any updates or new documents in your S3 bucket are ingested and indexed by the knowledge base.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Go to your Knowledge Base&lt;/strong&gt;: Navigate back to the Amazon Bedrock console and select your newly created Knowledge Base.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Navigate to Data Source Tab&lt;/strong&gt;: Within your Knowledge Base details, click on the &lt;strong&gt;Data source&lt;/strong&gt; tab.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Select Your Data Source&lt;/strong&gt;: From the list of data sources, select the one you configured in the previous step.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Initiate Sync&lt;/strong&gt;: Click on the &lt;strong&gt;"Sync"&lt;/strong&gt; button.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: Similar to the creation process, syncing the data source with your Knowledge Base can take some time, especially for large datasets. Monitor the status in the console until the synchronization is complete.&lt;/p&gt;
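&lt;p&gt;The console's &lt;strong&gt;Sync&lt;/strong&gt; button corresponds to the &lt;code&gt;StartIngestionJob&lt;/code&gt; API. A minimal polling sketch with &lt;code&gt;boto3&lt;/code&gt;, assuming your knowledge base and data source IDs are already known:&lt;/p&gt;

```python
import time

# Statuses that mean an ingestion (sync) job has stopped running.
TERMINAL_STATUSES = {"COMPLETE", "FAILED"}

def is_terminal(status: str) -> bool:
    return status in TERMINAL_STATUSES

def sync_data_source(kb_id: str, ds_id: str, poll_seconds: int = 15) -> str:
    """Start a sync and wait until it finishes. Requires AWS credentials."""
    import boto3  # deferred so the pure helper above stays testable offline
    client = boto3.client("bedrock-agent")
    job = client.start_ingestion_job(knowledgeBaseId=kb_id, dataSourceId=ds_id)
    job_id = job["ingestionJob"]["ingestionJobId"]
    while True:
        status = client.get_ingestion_job(
            knowledgeBaseId=kb_id, dataSourceId=ds_id, ingestionJobId=job_id
        )["ingestionJob"]["status"]
        if is_terminal(status):
            return status
        time.sleep(poll_seconds)  # large datasets can take a while
```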

&lt;h3&gt;
  
  
  Step 3: Test Your Knowledge Base
&lt;/h3&gt;

&lt;p&gt;Before proceeding with formal evaluations, it's a good practice to manually test your knowledge base to get a preliminary understanding of its retrieval capabilities. This helps in identifying any immediate issues with data ingestion or relevance.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Navigate to Test Knowledge Base&lt;/strong&gt;: In the Amazon Bedrock console, go to your Knowledge Base and select the &lt;strong&gt;Test Knowledge Base&lt;/strong&gt; tab.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Select Model&lt;/strong&gt;: Choose &lt;strong&gt;"Amazon Nova Micro"&lt;/strong&gt; as the model for testing.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Enter Questions&lt;/strong&gt;: In the chat interface, enter specific questions related to the data you ingested into your knowledge base. For example, if your data contains information about product service intervals, you might ask: "What is the recommended service interval for your product?"&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Review Responses&lt;/strong&gt;: Carefully review the responses provided by the knowledge base. Verify that they are accurate, relevant, and directly supported by your source data. Pay attention to whether the responses correctly retrieve information from your documents.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Iterative Testing&lt;/strong&gt;: Try different types of questions, including those that require precise factual recall and those that involve more general understanding. This iterative testing helps you gauge the breadth and depth of your knowledge base's retrieval capabilities.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Tip&lt;/strong&gt;: To ensure the Knowledge Base is retrieving relevant information correctly, ask specific questions that can be directly answered by the content within your data sources.&lt;/p&gt;
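&lt;p&gt;The same manual test can be run outside the console with the &lt;code&gt;bedrock-agent-runtime&lt;/code&gt; client's &lt;code&gt;retrieve_and_generate&lt;/code&gt; operation. A minimal sketch, where the knowledge base ID and Nova Micro model ARN are examples you must replace:&lt;/p&gt;

```python
def build_rag_query(kb_id: str, model_arn: str, question: str) -> dict:
    """Request body for the retrieve_and_generate API call."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }

def ask_knowledge_base(kb_id: str, model_arn: str, question: str) -> str:
    import boto3  # deferred: requires AWS credentials at call time
    runtime = boto3.client("bedrock-agent-runtime")
    resp = runtime.retrieve_and_generate(**build_rag_query(kb_id, model_arn, question))
    return resp["output"]["text"]
```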

&lt;h3&gt;
  
  
  Step 4: Creating Evaluation Examples
&lt;/h3&gt;

&lt;p&gt;To automatically evaluate your RAG system, you need a dataset of evaluation examples. These examples consist of prompts and their corresponding reference responses, which the evaluation job will use to assess the quality of your knowledge base's retrieval. This process involves creating a &lt;code&gt;batchinput.jsonl&lt;/code&gt; file.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Copy Example for a Single Record&lt;/strong&gt;: Begin by visiting the official AWS documentation for retrieval-only prompt dataset examples [1]. This documentation provides a structured JSON format for evaluation inputs. Copy an input record example, removing any extraneous whitespace so the line remains valid JSONL. A typical example might look like this:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"conversationTurns"&lt;/span&gt;&lt;span class="p"&gt;:[{&lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:[{&lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"What is the recommended service interval?"&lt;/span&gt;&lt;span class="p"&gt;}]},&lt;/span&gt;&lt;span class="nl"&gt;"referenceResponses"&lt;/span&gt;&lt;span class="p"&gt;:[{&lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:[{&lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"The recommended service interval is two years."&lt;/span&gt;&lt;span class="p"&gt;}]}]}]}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Create More Examples&lt;/strong&gt;: Manually creating a large number of diverse evaluation examples can be time-consuming. To expedite this process, you can leverage tools like &lt;strong&gt;Amazon Q Developer (Free version)&lt;/strong&gt; to generate additional samples. Focus on creating a variety of prompts that cover different aspects of your knowledge base content and expected user queries.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Save All Records to &lt;code&gt;batchinput.jsonl&lt;/code&gt;&lt;/strong&gt;: Consolidate all your generated evaluation examples into a single file named &lt;code&gt;batchinput.jsonl&lt;/code&gt;. Each line in this file must be a valid JSON object, representing one evaluation example. Ensure the file adheres strictly to the JSONL (JSON Lines) format, where each line is a self-contained JSON object, without commas between objects or an enclosing array.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: It is crucial that your &lt;code&gt;batchinput.jsonl&lt;/code&gt; file is correctly formatted. You can use online JSON formatters and validators like &lt;a href="https://jsonformatter.org/" rel="noopener noreferrer"&gt;jsonformatter.org&lt;/a&gt; or &lt;a href="https://jsonlint.com/" rel="noopener noreferrer"&gt;jsonlint.com&lt;/a&gt; to verify its integrity before proceeding.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
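&lt;p&gt;Rather than hand-editing JSONL, you can generate and validate &lt;code&gt;batchinput.jsonl&lt;/code&gt; with a short script. The helper below mirrors the record structure shown above and checks that every line parses as a standalone JSON object:&lt;/p&gt;

```python
import json

def make_record(prompt: str, reference: str) -> dict:
    """One evaluation example in the conversationTurns format shown above."""
    return {
        "conversationTurns": [{
            "prompt": {"content": [{"text": prompt}]},
            "referenceResponses": [{"content": [{"text": reference}]}],
        }]
    }

def write_jsonl(records, path="batchinput.jsonl"):
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            # One JSON object per line -- no commas, no enclosing array.
            f.write(json.dumps(rec) + "\n")

def validate_jsonl(path) -> int:
    """Raise ValueError on the first malformed line; return the line count."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f, start=1):
            try:
                json.loads(line)
            except json.JSONDecodeError as e:
                raise ValueError(f"line {i} is not valid JSON: {e}")
            count += 1
    return count
```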

&lt;h3&gt;
  
  
  Step 5: Upload the File to S3
&lt;/h3&gt;

&lt;p&gt;With your &lt;code&gt;batchinput.jsonl&lt;/code&gt; file prepared, the next step is to upload it to an Amazon S3 bucket. This S3 location will serve as the input for your RAG evaluation job in Amazon Bedrock.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Prepare Your &lt;code&gt;batchinput.jsonl&lt;/code&gt; File&lt;/strong&gt;: Ensure your file contains all the evaluation examples and is correctly formatted as JSONL, as detailed in the previous step.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Navigate to the AWS S3 Console&lt;/strong&gt;: In the AWS Management Console, search for and select "S3."&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Select Your S3 Bucket&lt;/strong&gt;: Locate and select the S3 bucket you intend to use for storing your evaluation input (e.g., &lt;code&gt;mybatchinferenceinput&lt;/code&gt;). If you don't have a dedicated bucket, you may need to create one.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Initiate Upload&lt;/strong&gt;: Click on the &lt;strong&gt;"Upload"&lt;/strong&gt; button.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Select Your File&lt;/strong&gt;: Drag and drop your &lt;code&gt;batchinput.jsonl&lt;/code&gt; file into the upload area, or use the "Add files" button to browse and select it from your local machine.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Review and Confirm&lt;/strong&gt;: Review the upload settings. For evaluation input files, default settings are usually sufficient, but ensure public access is not inadvertently granted if your data is sensitive.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Complete Upload&lt;/strong&gt;: Click &lt;strong&gt;"Upload"&lt;/strong&gt; to finalize the process.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Important&lt;/strong&gt;: Double-check that your &lt;code&gt;batchinput.jsonl&lt;/code&gt; file is in the correct JSONL format with no extra spaces or malformed JSON objects. Incorrect formatting can lead to errors during the evaluation job processing.&lt;/p&gt;
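&lt;p&gt;The upload itself is a one-liner with &lt;code&gt;boto3&lt;/code&gt;; the sketch below also re-validates the file first so a malformed line fails fast before it reaches S3 (the bucket and key names are examples):&lt;/p&gt;

```python
import json

def s3_uri(bucket: str, key: str) -> str:
    return f"s3://{bucket}/{key}"

def upload_eval_input(path: str, bucket: str, key: str) -> str:
    """Validate the JSONL locally, then upload it. Requires AWS credentials."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            json.loads(line)  # raises on the first malformed line

    import boto3  # deferred: only needed for the actual upload
    boto3.client("s3").upload_file(path, bucket, key)
    return s3_uri(bucket, key)
```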

&lt;h3&gt;
  
  
  Step 6: Create an Evaluation Job
&lt;/h3&gt;

&lt;p&gt;Now that your evaluation examples are ready and uploaded to S3, you can create an evaluation job in Amazon Bedrock to automatically assess your knowledge base.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Navigate to Amazon Bedrock Evaluations&lt;/strong&gt;: In the AWS Management Console, go to Amazon Bedrock. In the left-hand navigation pane, select &lt;strong&gt;Inference and Assessment&lt;/strong&gt;, then &lt;strong&gt;Evaluations&lt;/strong&gt;, and finally &lt;strong&gt;RAG&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Create New Evaluation&lt;/strong&gt;: Click on the &lt;strong&gt;"Create"&lt;/strong&gt; button to start configuring a new evaluation job.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Provide Evaluation Details&lt;/strong&gt;: 

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Evaluation name&lt;/strong&gt;: Enter a unique and descriptive name for your evaluation job.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Description&lt;/strong&gt;: (Optional) Provide a brief description of the evaluation.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Select Evaluator Model&lt;/strong&gt;: Choose &lt;strong&gt;"Amazon Nova Micro v1.0"&lt;/strong&gt; as the evaluator model. This model will be used to automatically score the responses generated by your knowledge base against the reference responses you provided.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Specify Source&lt;/strong&gt;: Select &lt;strong&gt;"Bedrock Knowledge Base"&lt;/strong&gt; as the source for the evaluation. Then, choose your specific Knowledge Base (e.g., &lt;code&gt;myFirstBedrockKB&lt;/code&gt;) from the dropdown list.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Define Evaluation Type and Metrics&lt;/strong&gt;: 

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Evaluation type&lt;/strong&gt;: Select &lt;strong&gt;"Retrieval only"&lt;/strong&gt;. This focuses the evaluation on the quality of the information retrieved by your knowledge base.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Metrics&lt;/strong&gt;: Under the Metrics section, select &lt;strong&gt;"Context relevance"&lt;/strong&gt; and &lt;strong&gt;"Context coverage"&lt;/strong&gt;. These are crucial metrics for assessing how well the retrieved context aligns with the prompt and how comprehensively it covers the necessary information.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Configure Input and Output Locations&lt;/strong&gt;: 

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Input&lt;/strong&gt;: Specify the S3 location of your &lt;code&gt;batchinput.jsonl&lt;/code&gt; file (e.g., &lt;code&gt;s3://mybatchinferenceinput/batchinput.jsonl&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Output&lt;/strong&gt;: Choose an S3 output bucket and prefix where the evaluation results will be stored (e.g., &lt;code&gt;s3://mymodelevaloutput/output&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Set up Service Role&lt;/strong&gt;: &lt;strong&gt;Create and use a new service role&lt;/strong&gt; for this evaluation job. This role grants Bedrock the necessary permissions to access your S3 buckets and run the evaluation. Remember to note down the role's name for future reference.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Initiate Evaluation&lt;/strong&gt;: Review all your settings and click &lt;strong&gt;"Create evaluation job"&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once the evaluation job is created, Amazon Bedrock will begin processing your evaluation examples, using the chosen evaluator model to score the retrieval performance of your knowledge base. You can monitor the job's status in the Bedrock console.&lt;/p&gt;
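&lt;p&gt;The console steps above map to the &lt;code&gt;CreateEvaluationJob&lt;/code&gt; API on the &lt;code&gt;bedrock&lt;/code&gt; client. The request sketch below reflects my reading of the &lt;code&gt;boto3&lt;/code&gt; field names for a retrieval-only knowledge base evaluation; treat the nested shapes, metric names, and all ARN/bucket values as assumptions to verify against the current Bedrock API reference:&lt;/p&gt;

```python
def build_rag_eval_request(job_name, role_arn, kb_id, input_s3, output_s3):
    """Sketch of a retrieval-only RAG evaluation job request. Field names
    are assumptions -- check them against the current SDK documentation."""
    return {
        "jobName": job_name,
        "roleArn": role_arn,
        "applicationType": "RagEvaluation",
        "evaluationConfig": {
            "automated": {
                "datasetMetricConfigs": [{
                    "taskType": "QuestionAndAnswer",
                    "dataset": {
                        "name": "RagDataset",
                        "datasetLocation": {"s3Uri": input_s3},
                    },
                    # Context relevance and context coverage, as selected above
                    "metricNames": ["Builtin.ContextRelevance", "Builtin.ContextCoverage"],
                }],
                "evaluatorModelConfig": {
                    "bedrockEvaluatorModels": [{"modelIdentifier": "amazon.nova-micro-v1:0"}]
                },
            }
        },
        "inferenceConfig": {
            "ragConfigs": [{
                "knowledgeBaseConfig": {
                    "retrieveConfig": {
                        "knowledgeBaseId": kb_id,
                        "knowledgeBaseRetrievalConfiguration": {
                            "vectorSearchConfiguration": {"numberOfResults": 5}
                        },
                    }
                }
            }]
        },
        "outputDataConfig": {"s3Uri": output_s3},
    }

def create_rag_eval(**kwargs):
    import boto3  # deferred: requires AWS credentials
    return boto3.client("bedrock").create_evaluation_job(**build_rag_eval_request(**kwargs))
```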

&lt;h2&gt;
  
  
  Monitoring and Detailed Analysis
&lt;/h2&gt;

&lt;p&gt;After setting up and running your RAG evaluation job, monitoring its performance and conducting detailed analysis of the results are crucial steps. This allows you to gain insights into the efficiency and effectiveness of your knowledge base. The following steps, illustrated in the provided flowchart, guide you through this process.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Prerequisites for Monitoring
&lt;/h3&gt;

&lt;p&gt;Before you can effectively monitor and analyze your RAG evaluations, ensure you have the following foundational elements in place:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Amazon Bedrock and Bedrock Knowledge Base&lt;/strong&gt;: Your RAG application, including the Amazon Bedrock service and your configured Knowledge Base, must be operational.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Prompt Dataset in S3&lt;/strong&gt;: The &lt;code&gt;batchinput.jsonl&lt;/code&gt; file containing your evaluation prompts and reference responses should be stored in an accessible S3 bucket, as this is the input for your evaluation jobs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 2: Enable Logging
&lt;/h3&gt;

&lt;p&gt;To capture the necessary metrics and logs for monitoring and detailed analysis, you must enable logging for your Bedrock evaluations. This ensures that invocation details and other critical information are recorded.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Navigate to Bedrock Evaluations&lt;/strong&gt;: Go to the Amazon Bedrock console, then &lt;strong&gt;Inference and Assessment&lt;/strong&gt;, and select &lt;strong&gt;Evaluations&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Enable Model Invocation Logging&lt;/strong&gt;: Within the evaluation settings, ensure that &lt;strong&gt;Model Invocation Logging&lt;/strong&gt; is enabled. This setting directs Bedrock to send invocation data to a logging service.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Choose S3/CloudWatch Logs&lt;/strong&gt;: Configure where these logs should be stored. You can choose to send them to &lt;strong&gt;Amazon S3&lt;/strong&gt; for long-term storage and batch analysis, or to &lt;strong&gt;Amazon CloudWatch Logs&lt;/strong&gt; for real-time monitoring and querying.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Step 3: Create Evaluation (Recap)
&lt;/h3&gt;

&lt;p&gt;As previously detailed, the creation of the evaluation job is where you define what to evaluate and how. This step is a prerequisite for the monitoring phase.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Go to Bedrock Evaluations&lt;/strong&gt;: Access the Evaluations section in Amazon Bedrock.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Create Knowledge Base Evaluation Job&lt;/strong&gt;: Initiate the creation of a new evaluation job, specifying it as a Knowledge Base evaluation.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Configure Job Settings&lt;/strong&gt;: Define the evaluation name, description, and select the evaluator model (e.g., Amazon Nova Micro v1.0).&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Specify Prompt Dataset &amp;amp; S3 Output&lt;/strong&gt;: Point to your &lt;code&gt;batchinput.jsonl&lt;/code&gt; file in S3 as the input and define an S3 bucket for storing the evaluation output.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Click Create Evaluation Job&lt;/strong&gt;: Launch the evaluation process.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Step 4: Monitor with CloudWatch
&lt;/h3&gt;

&lt;p&gt;Amazon CloudWatch provides powerful tools for monitoring your Bedrock evaluations in real-time. You can use CloudWatch dashboards to visualize key performance indicators.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Open CloudWatch Console&lt;/strong&gt;: In the AWS Management Console, search for and select "CloudWatch."&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Go to Automatic Dashboards&lt;/strong&gt;: In the CloudWatch console, navigate to the &lt;strong&gt;Dashboards&lt;/strong&gt; section and look for automatically generated dashboards.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Select Bedrock Dashboard&lt;/strong&gt;: Choose the dashboard specifically created for Amazon Bedrock. This dashboard typically provides pre-configured widgets for common Bedrock metrics.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;View InvocationLatency Metrics&lt;/strong&gt;: Within the Bedrock dashboard, focus on metrics such as &lt;code&gt;InvocationLatency&lt;/code&gt;. This metric indicates the total response time of your knowledge base, which is critical for user experience.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Filter by Model ID&lt;/strong&gt;: To narrow down your analysis, you can filter the metrics by &lt;code&gt;Model ID&lt;/code&gt;. This allows you to observe the performance of specific models used in your RAG evaluations.&lt;/li&gt;
&lt;/ol&gt;
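&lt;p&gt;The same latency data shown on the dashboard can be pulled programmatically from the &lt;code&gt;AWS/Bedrock&lt;/code&gt; CloudWatch namespace. A sketch (the model ID is an example):&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone

def build_latency_query(model_id: str, hours: int = 24) -> dict:
    """Parameters for get_metric_statistics on Bedrock invocation latency."""
    now = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Bedrock",
        "MetricName": "InvocationLatency",
        "Dimensions": [{"Name": "ModelId", "Value": model_id}],
        "StartTime": now - timedelta(hours=hours),
        "EndTime": now,
        "Period": 300,  # 5-minute buckets
        "Statistics": ["Average", "Maximum"],
    }

def fetch_latency(model_id: str):
    import boto3  # deferred: requires AWS credentials
    cw = boto3.client("cloudwatch")
    return cw.get_metric_statistics(**build_latency_query(model_id))["Datapoints"]
```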

&lt;h3&gt;
  
  
  Step 5: Detailed Analysis with CloudWatch Logs Insights
&lt;/h3&gt;

&lt;p&gt;For a deeper dive into individual evaluation runs and to troubleshoot specific issues, CloudWatch Logs Insights offers a powerful query language to analyze your raw logs.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Go to CloudWatch Logs Insights&lt;/strong&gt;: In the CloudWatch console, navigate to &lt;strong&gt;Logs&lt;/strong&gt; and then select &lt;strong&gt;Logs Insights&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Query for Individual Evaluation Metrics&lt;/strong&gt;: Use the Logs Insights query editor to write custom queries that extract specific information from your Bedrock evaluation logs. You can query for details related to individual prompts, responses, and the metrics computed by the evaluator model.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Analyze Raw Logs from S3&lt;/strong&gt;: If you configured your logs to be stored in S3, you can also directly access and analyze these raw log files using tools like Amazon Athena or other data processing services for more complex, large-scale analysis.&lt;/li&gt;
&lt;/ol&gt;
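&lt;p&gt;These steps can be scripted too. The sketch below runs a Logs Insights query against your invocation log group; the log group name is a placeholder, and the queried field names assume the standard model invocation log schema:&lt;/p&gt;

```python
import time

# Sample Logs Insights query over Bedrock model invocation logs. The fields
# (modelId, token counts) assume the standard invocation-log JSON schema.
QUERY = """
fields @timestamp, modelId, input.inputTokenCount, output.outputTokenCount
| sort @timestamp desc
| limit 20
""".strip()

def run_insights_query(log_group: str, hours: int = 24):
    import boto3  # deferred: requires AWS credentials
    logs = boto3.client("logs")
    end = int(time.time())
    q = logs.start_query(
        logGroupName=log_group,
        startTime=end - hours * 3600,
        endTime=end,
        queryString=QUERY,
    )
    while True:  # Logs Insights queries complete asynchronously
        resp = logs.get_query_results(queryId=q["queryId"])
        if resp["status"] in ("Complete", "Failed", "Cancelled"):
            return resp["results"]
        time.sleep(2)
```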

&lt;p&gt;By following these monitoring and analysis steps, you can continuously track the performance of your RAG system, identify areas for improvement, and ensure your knowledge base is delivering optimal results.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Performance Metrics
&lt;/h2&gt;

&lt;p&gt;Understanding the performance of your RAG system involves analyzing several key metrics that provide insights into its efficiency and effectiveness. These metrics are crucial for identifying bottlenecks, optimizing costs, and ensuring a high-quality user experience. The primary metrics to focus on include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;InvocationLatency&lt;/code&gt;&lt;/strong&gt;: This metric represents the &lt;strong&gt;total response time&lt;/strong&gt; of your RAG system. It measures the duration from when a request is made to the knowledge base until a response is fully generated. Lower invocation latency indicates a more responsive system, which is vital for interactive applications. High latency can point to issues with network connectivity, model inference speed, or knowledge base retrieval efficiency.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;InputTokenCount&lt;/code&gt;&lt;/strong&gt;: This metric tracks the &lt;strong&gt;number of tokens in the input&lt;/strong&gt; provided to the LLM. In a RAG context, this typically includes the user's query and the retrieved context from the knowledge base. Monitoring input token count helps in understanding the complexity of the prompts being processed and has direct implications for cost, as most LLM providers charge based on token usage.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;OutputTokenCount&lt;/code&gt;&lt;/strong&gt;: This metric measures the &lt;strong&gt;number of tokens in the output&lt;/strong&gt; generated by the LLM. It reflects the length and verbosity of the responses. Similar to input tokens, output token count is a significant factor in determining the operational cost of your RAG application. Optimizing the conciseness and relevance of responses can help manage this cost.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;Invocations&lt;/code&gt;&lt;/strong&gt;: This metric quantifies the &lt;strong&gt;number of successful requests&lt;/strong&gt; made to the &lt;code&gt;InvokeModel&lt;/code&gt; and &lt;code&gt;InvokeModelWithResponseStream&lt;/code&gt; API operations. It provides a direct measure of the usage volume of your RAG system. Tracking invocations helps in capacity planning, understanding demand patterns, and correlating usage with overall system performance and cost.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By regularly monitoring and analyzing these key performance metrics, you can gain a comprehensive understanding of your RAG system's behavior, identify areas for optimization, and make data-driven decisions to improve its efficiency and user satisfaction.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Considerations
&lt;/h2&gt;

&lt;p&gt;When deploying and evaluating RAG systems on Amazon Bedrock, understanding the cost implications of different models is paramount. Pricing for LLMs is typically based on token usage, with separate rates for input and output tokens. The choice of model can significantly impact your operational expenses. Below is a table summarizing the pricing for various models available in Amazon Bedrock, based on the provided data:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model Provider&lt;/th&gt;
&lt;th&gt;Model Name&lt;/th&gt;
&lt;th&gt;Input Price (per 1K tokens)&lt;/th&gt;
&lt;th&gt;Output Price (per 1K tokens)&lt;/th&gt;
&lt;th&gt;Region&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Amazon&lt;/td&gt;
&lt;td&gt;Nova Micro&lt;/td&gt;
&lt;td&gt;$0.000035&lt;/td&gt;
&lt;td&gt;$0.000140&lt;/td&gt;
&lt;td&gt;us-east-1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Amazon&lt;/td&gt;
&lt;td&gt;Nova Lite&lt;/td&gt;
&lt;td&gt;$0.000060&lt;/td&gt;
&lt;td&gt;$0.000240&lt;/td&gt;
&lt;td&gt;us-east-1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Amazon&lt;/td&gt;
&lt;td&gt;Nova Pro&lt;/td&gt;
&lt;td&gt;$0.000800&lt;/td&gt;
&lt;td&gt;$0.003200&lt;/td&gt;
&lt;td&gt;us-east-1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;td&gt;Claude Sonnet 4&lt;/td&gt;
&lt;td&gt;$0.003000&lt;/td&gt;
&lt;td&gt;$0.015000&lt;/td&gt;
&lt;td&gt;us-east-1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Meta&lt;/td&gt;
&lt;td&gt;Llama 3 70B&lt;/td&gt;
&lt;td&gt;$0.000720&lt;/td&gt;
&lt;td&gt;$0.000720&lt;/td&gt;
&lt;td&gt;us-east-1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;As the table shows, pricing varies considerably across models. For instance, Amazon Nova Micro offers a very cost-effective option for both input and output tokens, making it suitable for initial evaluations and applications where cost efficiency is a primary concern. In contrast, models like Anthropic Claude Sonnet 4, while potentially offering advanced capabilities, come with a significantly higher price point.&lt;/p&gt;

&lt;p&gt;When selecting a model for your RAG application and its evaluations, it is crucial to balance performance requirements with budgetary constraints. Consider the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Evaluation Frequency&lt;/strong&gt;: Frequent evaluations will incur costs based on the number of tokens processed. Opting for more cost-effective models for evaluation jobs can help manage expenses.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Production Workloads&lt;/strong&gt;: For production deployments, assess the expected volume of input and output tokens to project monthly costs. A small difference in per-token pricing can accumulate into substantial costs at scale.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Model Performance vs. Cost&lt;/strong&gt;: While cheaper models might seem attractive, ensure they meet your performance benchmarks for accuracy, relevance, and latency. Sometimes, investing in a slightly more expensive model that delivers superior results can lead to better overall ROI.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By carefully analyzing these cost factors alongside performance metrics, you can make informed decisions about model selection and optimize the financial efficiency of your RAG solutions on Amazon Bedrock.&lt;/p&gt;
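&lt;p&gt;A quick back-of-the-envelope projection makes these differences concrete. The helper below uses the per-1K-token prices from the table above (Amazon models only):&lt;/p&gt;

```python
# Per-1K-token prices (input, output) from the table above, us-east-1, USD.
PRICES = {
    "nova-micro": (0.000035, 0.000140),
    "nova-lite":  (0.000060, 0.000240),
    "nova-pro":   (0.000800, 0.003200),
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Project the cost of a month's token volume, rounded to the cent."""
    in_price, out_price = PRICES[model]
    cost = (input_tokens / 1000) * in_price + (output_tokens / 1000) * out_price
    return round(cost, 2)
```

&lt;p&gt;For example, a workload of 10 million input tokens and 2 million output tokens per month costs about $0.63 on Nova Micro but $14.40 on Nova Pro, which illustrates how per-token differences accumulate at scale.&lt;/p&gt;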

&lt;h2&gt;
  
  
  Evaluation Results Interpretation
&lt;/h2&gt;

&lt;p&gt;Interpreting the results of your RAG evaluations is key to understanding the strengths and weaknesses of different models and optimizing your knowledge base. A results spreadsheet typically offers a side-by-side comparison of several models across various performance and quality metrics. Let's break down how to interpret such a detailed evaluation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Overview of Models and Performance Metrics
&lt;/h3&gt;

&lt;p&gt;The evaluation typically compares several models, such as &lt;strong&gt;Nova Micro&lt;/strong&gt;, &lt;strong&gt;Nova Lite&lt;/strong&gt;, &lt;strong&gt;Nova Pro&lt;/strong&gt;, &lt;strong&gt;Claude Sonnet 4&lt;/strong&gt;, and &lt;strong&gt;Llama 3 70B&lt;/strong&gt;. For each model, several performance metrics are usually captured:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Input&lt;/strong&gt;: This likely refers to the total number of input tokens processed during the evaluation run. A higher number indicates more extensive testing or longer prompts/contexts.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Throughput&lt;/strong&gt;: This metric measures the processing speed, often expressed as tokens per second or invocations per second. Higher throughput indicates a more efficient model capable of handling a larger volume of requests in a given time frame.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Cost&lt;/strong&gt;: This is a critical metric, often broken down into cost per second, cost per invocation, and cost per 1K tokens. As discussed in the previous section, these figures directly reflect the financial implications of using each model for your RAG system. Lower costs are generally desirable, provided the quality remains acceptable.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Quality Metrics
&lt;/h3&gt;

&lt;p&gt;The core of RAG evaluation lies in assessing the quality of the generated responses. The spreadsheet categorizes quality into several dimensions, each contributing to a holistic view of model performance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Correctness&lt;/strong&gt;: Measures whether the generated response is factually accurate and free from errors. This is paramount for RAG systems, as their purpose is to provide grounded information.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Completeness&lt;/strong&gt;: Assesses if the response addresses all aspects of the user's query and provides sufficient information. An incomplete response, even if correct, may not be helpful.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Helpfulness&lt;/strong&gt;: Evaluates how useful and actionable the response is to the user. A helpful response goes beyond mere correctness to provide practical value.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Coherence&lt;/strong&gt;: Determines if the response is logically structured, easy to understand, and flows naturally. A coherent response enhances user experience.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Harmfulness&lt;/strong&gt;: Identifies if the response contains any toxic, biased, or otherwise inappropriate content. This is a crucial safety metric for all LLM applications.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Groundedness&lt;/strong&gt;: This is particularly important for RAG systems. It verifies that all information presented in the response can be directly traced back to the provided source documents (the knowledge base). A high groundedness score indicates that the LLM is effectively utilizing the retrieved context and not hallucinating information.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of these quality metrics is typically scored, often on a scale (e.g., 0 to 1, or 0 to 5), with higher scores indicating better performance in that specific dimension.&lt;/p&gt;

&lt;h3&gt;
  
  
  Weighted Composite Score and Final Ranking
&lt;/h3&gt;

&lt;p&gt;To provide an overall assessment, a &lt;strong&gt;Weighted Composite Score&lt;/strong&gt; is often calculated. This score combines the individual quality metrics, allowing you to assign different weights based on the importance of each metric to your specific application. For example, if correctness and groundedness are more critical for your use case, they would receive higher weights. The formula for this composite score is usually defined within the evaluation setup.&lt;/p&gt;

&lt;p&gt;Finally, a &lt;strong&gt;Final Ranking Calculation&lt;/strong&gt; provides an ordered list of models based on their overall performance, considering both quantitative metrics (like latency and cost) and qualitative metrics (like correctness and groundedness). This ranking helps in making informed decisions about which model is best suited for your RAG application, balancing performance, quality, and cost.&lt;/p&gt;

&lt;p&gt;By meticulously analyzing these metrics, you can identify which models excel in certain areas, pinpoint specific weaknesses, and iteratively refine your knowledge base, prompt engineering, or even the underlying LLM choice to achieve optimal RAG performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Important Notes &amp;amp; Reminders
&lt;/h2&gt;

&lt;p&gt;As you embark on your RAG evaluation journey in Amazon Bedrock, keep the following important notes and reminders in mind to ensure efficient resource management, security, and best practices.&lt;/p&gt;

&lt;h3&gt;
  
  
  Resource Cleanup
&lt;/h3&gt;

&lt;p&gt;AWS services, especially those involving machine learning models and data storage, can incur significant costs if left running unnecessarily. It is &lt;strong&gt;highly recommended&lt;/strong&gt; that you diligently delete or release all resources after completing your lab work or evaluations. This includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Knowledge Base&lt;/strong&gt;: The Amazon Bedrock Knowledge Base itself.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Vector Database&lt;/strong&gt;: The underlying vector store, whether it's Amazon OpenSearch Serverless or Amazon S3 Vector Store.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;S3 Buckets&lt;/strong&gt;: Any S3 buckets you created or used for storing source data, evaluation inputs, or outputs.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;IAM Roles&lt;/strong&gt;: The IAM roles created for the Knowledge Base and evaluation jobs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Failing to clean up resources can lead to unexpected charges on your AWS bill. Always verify that all associated resources have been terminated or deleted.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cross-Origin Resource Sharing (CORS)
&lt;/h3&gt;

&lt;p&gt;When developing web applications that interact with Amazon Bedrock, you might encounter issues related to Cross-Origin Resource Sharing (CORS). CORS is a security feature implemented by web browsers that restricts web pages from making requests to a different domain than the one that served the web page. If your frontend application is hosted on a different domain than your Bedrock API endpoints, you will need to configure CORS policies.&lt;/p&gt;

&lt;p&gt;For detailed information on how to configure CORS with Amazon Bedrock, please refer to the official AWS documentation: &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/cors.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/bedrock/latest/userguide/cors.html&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  JSON Formatters
&lt;/h3&gt;

&lt;p&gt;Throughout the process of creating evaluation examples, you will be working with JSONL files. Ensuring that your JSON objects are correctly formatted is crucial for the evaluation jobs to run successfully. Malformed JSON can lead to errors and failed evaluations.&lt;/p&gt;

&lt;p&gt;Several online tools can help you validate and format your JSON content. Some popular options include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://jsonformatter.org/" rel="noopener noreferrer"&gt;https://jsonformatter.org/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://jsonlint.com/" rel="noopener noreferrer"&gt;https://jsonlint.com/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These tools can help you quickly identify syntax errors, pretty-print your JSON for readability, and ensure compliance with the JSON standard.&lt;/p&gt;
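&lt;p&gt;If you prefer to validate locally before submitting an evaluation job, a few lines of Python catch the same syntax errors. The record shape in the sample is illustrative only, not the exact Bedrock dataset schema.&lt;/p&gt;

```python
import json

def validate_jsonl(lines):
    """Return (line_number, error) pairs for lines that are not valid JSON."""
    errors = []
    for i, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # allow blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((i, str(exc)))
    return errors

# Illustrative record shape only -- not the exact Bedrock dataset schema.
sample = [
    '{"prompt": "What is RAG?", "reference": "Retrieval Augmented Generation"}',
    '{"prompt": "Bad line",}',
]
print(validate_jsonl(sample))  # flags line 2 (trailing comma)
```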

&lt;p&gt;By adhering to these important notes and reminders, you can maintain a secure, cost-effective, and efficient environment for your RAG evaluations in Amazon Bedrock.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Evaluating Retrieval Augmented Generation (RAG) systems is not merely a best practice; it is a critical component for ensuring the reliability, accuracy, and cost-effectiveness of your AI applications. This guide has provided a comprehensive walkthrough of how to leverage Amazon Bedrock's evaluation capabilities to automatically assess your knowledge base performance. From setting up your environment and creating evaluation examples to monitoring key metrics and interpreting detailed results, you now have the knowledge to systematically enhance your RAG solutions.&lt;/p&gt;

&lt;p&gt;By diligently following these steps, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Improve Response Quality&lt;/strong&gt;: Continuously refine your knowledge base and model choices to deliver more accurate, complete, and helpful responses.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Optimize Costs&lt;/strong&gt;: Make informed decisions about model selection based on performance and pricing, ensuring your RAG system operates efficiently within budget.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Enhance User Experience&lt;/strong&gt;: Reduce latency and improve the relevance of information, leading to a more satisfying experience for end-users.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Maintain System Health&lt;/strong&gt;: Proactively identify and address issues through continuous monitoring and detailed analysis of performance metrics.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We encourage you to implement these practices within your own AWS environment. The journey of building robust AI applications is iterative, and effective evaluation is the compass that guides you toward excellence. Start evaluating your RAG systems today to unlock their full potential and deliver truly intelligent solutions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Understanding Retrieval Augmented Generation (RAG)
&lt;/h3&gt;

&lt;p&gt;To better understand the evaluation process, it's helpful to visualize the core components of a RAG system. The diagram below illustrates the typical flow:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fscvffvllweigv94oo9br.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fscvffvllweigv94oo9br.png" alt="RAG Concept Diagram" width="800" height="172"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this flow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;User Query&lt;/strong&gt;: The user initiates a request or question.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Retrieval&lt;/strong&gt;: The RAG system queries a &lt;strong&gt;Knowledge Base&lt;/strong&gt; (an external data source) to retrieve relevant information based on the user's query.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Generation&lt;/strong&gt;: The retrieved information is then passed to a &lt;strong&gt;Large Language Model (LLM)&lt;/strong&gt;, which uses this context to generate a comprehensive and grounded response.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Response&lt;/strong&gt;: The final generated response is presented to the user.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This process ensures that the LLM's output is informed by up-to-date and specific data, making evaluations of both the retrieval and generation components critical.&lt;/p&gt;

&lt;h3&gt;
  
  
  RAG Evaluation Workflow Overview
&lt;/h3&gt;

&lt;p&gt;The entire process of setting up, running, and analyzing RAG evaluations in Amazon Bedrock can be visualized as a clear workflow. The following flowchart provides a high-level overview of the steps involved, from initial prerequisites to detailed analysis:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe4ak1i36n2llre4b8567.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe4ak1i36n2llre4b8567.png" alt="RAG Evaluation Flowchart" width="800" height="1517"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This visual guide helps in understanding the sequence of operations and the interdependencies between different stages of the evaluation process.&lt;/p&gt;

&lt;h3&gt;
  
  
  Visualizing Key Performance Metrics
&lt;/h3&gt;

&lt;p&gt;To further clarify the key performance metrics discussed, the following diagram illustrates their relationships and what they represent:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4a2dnbkb8j9dnlgckxv1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4a2dnbkb8j9dnlgckxv1.png" alt="Key Performance Metrics Diagram" width="800" height="689"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These metrics provide a quantitative foundation for assessing the efficiency and responsiveness of your RAG system.&lt;/p&gt;

&lt;h3&gt;
  
  
  Visualizing Model Pricing
&lt;/h3&gt;

&lt;p&gt;To provide a clear overview of the cost differences, the following image illustrates the pricing structure for various models:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmlyokm48mief3faeel7d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmlyokm48mief3faeel7d.png" alt="Amazon Bedrock Model Pricing" width="800" height="147"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This visual representation emphasizes the importance of cost-conscious model selection.&lt;/p&gt;

&lt;h3&gt;
  
  
  Detailed Evaluation Results
&lt;/h3&gt;

&lt;p&gt;The following image provides a detailed breakdown of evaluation results across different models, showcasing various performance and quality metrics:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc404ekazvq1ga8p15zyr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc404ekazvq1ga8p15zyr.png" alt="RAG Evaluation Results Spreadsheet" width="800" height="44"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This spreadsheet is instrumental in conducting a thorough comparative analysis of model performance.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>bedrock</category>
    </item>
    <item>
      <title>Deep Dive on Amazon Aurora and Amazon RDS for PostgreSQL Architecture and Features</title>
      <dc:creator>Raj Murugan</dc:creator>
      <pubDate>Fri, 12 Dec 2025 12:49:56 +0000</pubDate>
      <link>https://forem.com/rajmurugan/deep-dive-on-amazon-aurora-and-amazon-rds-for-postgresql-architecture-and-features-182a</link>
      <guid>https://forem.com/rajmurugan/deep-dive-on-amazon-aurora-and-amazon-rds-for-postgresql-architecture-and-features-182a</guid>
      <description>&lt;h1&gt;
  
  
  Deep Dive on Amazon Aurora and Amazon RDS for PostgreSQL Architecture and Features
&lt;/h1&gt;




&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;If you're considering migrating your self-hosted PostgreSQL database or transitioning your commercial databases to PostgreSQL on AWS, you'll need to choose the database service that best aligns with your requirements. AWS offers two managed PostgreSQL database options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Amazon Aurora PostgreSQL-Compatible Edition&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Amazon Relational Database Service (Amazon RDS) for PostgreSQL&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This post delves into the architecture and features of Aurora PostgreSQL and RDS PostgreSQL, analyzing their performance, scalability, failover capabilities, storage options, high availability, and disaster recovery mechanisms.&lt;/p&gt;




&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;Both Aurora PostgreSQL and RDS for PostgreSQL are fully managed PostgreSQL database services offering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provisioning various classes of DB instances&lt;/li&gt;
&lt;li&gt;Multiple PostgreSQL-compatible versions&lt;/li&gt;
&lt;li&gt;Managing backups and point-in-time recovery (PITR)&lt;/li&gt;
&lt;li&gt;Replication and monitoring&lt;/li&gt;
&lt;li&gt;Multi-AZ support&lt;/li&gt;
&lt;li&gt;Storage auto scaling&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key Differences
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Aurora PostgreSQL&lt;/strong&gt; uses a high-performance storage subsystem customized for fast distributed storage. The underlying storage grows automatically in segments of 10 GiB, up to 128 TiB. Aurora improves upon PostgreSQL for massive throughput and highly concurrent workloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RDS for PostgreSQL&lt;/strong&gt; supports up to 64 TiB of storage and uses Amazon Elastic Block Store (Amazon EBS) volumes for database and log storage. RDS manages PostgreSQL installation, upgrades, storage management, replication, and backups.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture Comparison
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Aurora PostgreSQL Architecture
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Single virtual cluster volume supported by storage nodes using locally attached SSDs&lt;/li&gt;
&lt;li&gt;Data automatically replicated across three Availability Zones&lt;/li&gt;
&lt;li&gt;Shared storage model for writer and readers&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  RDS PostgreSQL Architecture
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Classic Multi-AZ with single standby instance&lt;/li&gt;
&lt;li&gt;Multi-AZ DB cluster with two readable standby DB instances (semi-synchronous)&lt;/li&gt;
&lt;li&gt;Three separate Availability Zones for increased read capacity&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Feature Comparison Table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Aurora PostgreSQL&lt;/th&gt;
&lt;th&gt;RDS for PostgreSQL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Maximum Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;128 TiB&lt;/td&gt;
&lt;td&gt;64 TiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Storage Type&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Custom distributed storage (locally attached SSDs)&lt;/td&gt;
&lt;td&gt;Amazon EBS (gp2/gp3, io1/io2)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Storage Growth&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Automatic in 10 GiB increments&lt;/td&gt;
&lt;td&gt;Auto scaling in 10 GiB or 10% chunks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Storage Reduction&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Automatic when data deleted&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;IOPS Limitation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No limitation based on storage size&lt;/td&gt;
&lt;td&gt;Depends on storage type and size&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;I/O Charges&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Separate (I/O-Optimized available)&lt;/td&gt;
&lt;td&gt;Included with storage type&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Read Replicas&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Up to 15 Aurora readers&lt;/td&gt;
&lt;td&gt;Up to 155 read replicas (5 per instance, 3 levels of cascading)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cross-Region Replicas&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Aurora Global Database&lt;/td&gt;
&lt;td&gt;5 cross-Region read replicas&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Typical Replica Lag&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Few hundred milliseconds&lt;/td&gt;
&lt;td&gt;Few seconds (optimal) to minutes (high load)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Backup Type&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Continuous and incremental&lt;/td&gt;
&lt;td&gt;Daily full + continuous WAL archiving&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Backup Performance Impact&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Slight impact on single-AZ deployments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PITR Restore Time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fast (incremental nature)&lt;/td&gt;
&lt;td&gt;Slower (restore full + replay WALs)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Failover Time (Multi-AZ)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;30 seconds (DNS: 10-15s, Recovery: 3-10s)&lt;/td&gt;
&lt;td&gt;1-2 minutes (includes crash recovery)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Crash Recovery&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Immediate (on-demand parallel replay)&lt;/td&gt;
&lt;td&gt;Depends on checkpoint interval (default 5 min)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multi-AZ Options&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single configuration&lt;/td&gt;
&lt;td&gt;One standby or two readable standbys&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Write Latency (Multi-AZ)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Standard&lt;/td&gt;
&lt;td&gt;Up to 2x faster commit latency with two readable standbys (vs. a single standby)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Replication Method&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Shared storage&lt;/td&gt;
&lt;td&gt;PostgreSQL streaming replication&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Write Impact on Replicas&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Negligible&lt;/td&gt;
&lt;td&gt;Significant (processes transaction logs)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Replication&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;6 copies across 3 AZs&lt;/td&gt;
&lt;td&gt;Synchronous to standby, async to replicas&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Serverless Option&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Aurora Serverless v2&lt;/td&gt;
&lt;td&gt;Not available&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fast Database Cloning&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No (snapshot restore only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Query Plan Management&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes (QPM)&lt;/td&gt;
&lt;td&gt;Not available&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cluster Cache Management&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes (warm cache failover)&lt;/td&gt;
&lt;td&gt;Not available&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Machine Learning Integration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes (native SQL)&lt;/td&gt;
&lt;td&gt;Not available&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Detailed Feature Analysis
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Storage
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Aurora PostgreSQL Storage
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Single virtual cluster volume&lt;/strong&gt; supported by storage nodes using locally attached SSDs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic growth&lt;/strong&gt; in 10 GiB increments up to 128 TiB&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic reduction&lt;/strong&gt; when data is deleted&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Triple replication&lt;/strong&gt; across three Availability Zones automatically&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No IOPS limitation&lt;/strong&gt; based on storage size (may need to scale DB instance)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Separate I/O charges&lt;/strong&gt; applied per usage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I/O-Optimized configuration&lt;/strong&gt; provides up to 40% cost savings when I/O spend exceeds 25% of Aurora database spend&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  RDS for PostgreSQL Storage
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Amazon EBS SSD-based storage types:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;General Purpose SSD (gp2):&lt;/strong&gt; 3 IOPS per provisioned GiB, burst up to 3,000 IOPS&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;General Purpose SSD (gp3):&lt;/strong&gt; Performance configurable independently of size; baseline of 3,000 IOPS and 125 MiB/s for storage under 400 GiB&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provisioned IOPS (io1, io2):&lt;/strong&gt; 1,000–256,000 IOPS range&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Storage auto scaling&lt;/strong&gt; in chunks of 10 GiB or 10% of current storage (whichever is greater)&lt;/li&gt;

&lt;/ul&gt;
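&lt;p&gt;The gp2 rule above translates directly into a baseline-IOPS estimate. This is a back-of-envelope sketch; the 100 IOPS floor follows the general EBS gp2 model.&lt;/p&gt;

```python
# Estimate gp2 baseline IOPS: 3 IOPS per provisioned GiB,
# with a 100 IOPS floor and burst capability up to 3,000 IOPS.
def gp2_baseline_iops(size_gib: int) -> int:
    return max(100, 3 * size_gib)

print(gp2_baseline_iops(500))   # 1500
print(gp2_baseline_iops(1000))  # 3000
```

&lt;p&gt;Past roughly 1 TiB, gp2 baseline meets its burst ceiling; below that, burst credits are what carry spiky workloads.&lt;/p&gt;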




&lt;h3&gt;
  
  
  Backup
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Aurora PostgreSQL Backup
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Continuous and incremental&lt;/strong&gt; automated backups&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No performance impact&lt;/strong&gt; or interruption during backups&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fast PITR&lt;/strong&gt; due to incremental nature&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Restore time&lt;/strong&gt; depends on volume size and transaction log count&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  RDS for PostgreSQL Backup
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Daily automated backups&lt;/strong&gt; during backup window&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slight performance impact&lt;/strong&gt; on single-AZ deployments when backup initiates&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Continuous WAL archiving&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PITR process:&lt;/strong&gt; Restore full backup + replay WALs to desired time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slower for write-intensive workloads&lt;/strong&gt; (long WAL replay time)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tip:&lt;/strong&gt; Frequent manual snapshots reduce PITR duration&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Scalability
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Aurora PostgreSQL Scalability
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Up to 15 readers&lt;/strong&gt; for read scaling and high availability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared storage model&lt;/strong&gt; minimizes impact of high write workloads on replication&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minimal replica lag&lt;/strong&gt; (few hundred milliseconds, occasionally up to 60s)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-restart&lt;/strong&gt; of readers if lag exceeds threshold&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write capacity&lt;/strong&gt; limited by single writer instance&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  RDS for PostgreSQL Scalability
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Up to 155 read replicas&lt;/strong&gt; (5 per instance, 3 cascading levels)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cascading architecture&lt;/strong&gt; reduces overhead on source instance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Progressive replication lag&lt;/strong&gt; with more intermediaries in cascade&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read replica promotion&lt;/strong&gt; to standalone instances&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;5 cross-Region read replicas&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming replication&lt;/strong&gt; via PostgreSQL WAL records&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Higher replica lag risk&lt;/strong&gt; with high write activity, storage/instance class mismatch&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Two readable standbys&lt;/strong&gt; in Multi-AZ three-AZ deployment serve both HA and scalability&lt;/li&gt;
&lt;/ul&gt;
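&lt;p&gt;The "up to 155" figure is simple arithmetic on the cascading limits above: 5 first-level replicas, each fanning out to 5 more across 3 levels.&lt;/p&gt;

```python
# 5 replicas per instance across 3 cascade levels:
# 5 + 5*5 + 25*5 = 155 read replicas in total.
per_instance, levels = 5, 3
total = sum(per_instance ** level for level in range(1, levels + 1))
print(total)  # 155
```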




&lt;h3&gt;
  
  
  Crash Recovery
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Aurora PostgreSQL
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No traditional checkpoints&lt;/strong&gt; (storage system applies log records directly)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallel and asynchronous&lt;/strong&gt; redo record replay per storage segment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Immediate availability&lt;/strong&gt; after crash&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  RDS for PostgreSQL
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Replays transaction logs&lt;/strong&gt; since last checkpoint (default: 5 minutes apart)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Checkpoint process&lt;/strong&gt; writes dirty pages from memory to storage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trade-off:&lt;/strong&gt; Frequent checkpoints reduce recovery time but may slow performance (I/O intensive)&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Failover
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Aurora PostgreSQL Failover
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Typical failover time:&lt;/strong&gt; 30 seconds

&lt;ul&gt;
&lt;li&gt;DNS propagation: 10-15 seconds&lt;/li&gt;
&lt;li&gt;Recovery: 3-10 seconds (parallel with DNS)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Automatic promotion&lt;/strong&gt; of reader to primary&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  RDS for PostgreSQL Failover
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Typical failover time:&lt;/strong&gt; 1-2 minutes

&lt;ul&gt;
&lt;li&gt;Includes DNS propagation and crash recovery&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Depends on:&lt;/strong&gt; Crash recovery time, DNS propagation, TTL settings&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Multi-AZ with two standbys:&lt;/strong&gt; Under 35 seconds, 2x faster transaction commit latency&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  RDS Proxy Benefits
&lt;/h4&gt;

&lt;p&gt;Both services support &lt;strong&gt;Amazon RDS Proxy&lt;/strong&gt; for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Connection pooling and sharing&lt;/li&gt;
&lt;li&gt;Faster failover recovery&lt;/li&gt;
&lt;li&gt;Automatic connection to new primary&lt;/li&gt;
&lt;li&gt;Maintained idle connections during failover&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  High Availability and Disaster Recovery
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Aurora PostgreSQL HA/DR
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Storage-compute separation&lt;/strong&gt; architecture&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Six copies of data&lt;/strong&gt; across three Availability Zones&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;All readers accessible&lt;/strong&gt; via instance or reader endpoints&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minimal replica lag&lt;/strong&gt; (typically &amp;lt;100ms)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aurora Global Database&lt;/strong&gt; for cross-Region replication (&amp;lt;1 second latency)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic DB snapshot sharing&lt;/strong&gt; across accounts and regions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Backup integration&lt;/strong&gt; for cross-region backup sharing&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  RDS for PostgreSQL HA/DR
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-AZ deployment options:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;One standby (synchronous replication)&lt;/li&gt;
&lt;li&gt;Two readable standbys (semi-synchronous)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Local storage&lt;/strong&gt; for transaction logs (WAL logs)&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Write-then-flush pattern&lt;/strong&gt; for reduced failover time and faster commits&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Automated backups&lt;/strong&gt; from standby in classic Multi-AZ&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Cross-Region and same-Region replicas&lt;/strong&gt; (transaction log-based)&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Snapshot sharing&lt;/strong&gt; across accounts and regions&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  Additional Aurora Features
&lt;/h2&gt;

&lt;p&gt;Aurora PostgreSQL provides several value-add features:&lt;/p&gt;

&lt;h3&gt;
  
  
  Fast Database Cloning
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Quick cloning&lt;/strong&gt; of all databases in DB cluster&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Faster than RDS snapshot restore&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ideal for:&lt;/strong&gt; Testing schema/parameter changes, analytics on near-production data&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Query Plan Management (QPM)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Control query plan changes&lt;/strong&gt; to avoid performance degradation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintain optimal plans&lt;/strong&gt; despite table/index statistics changes&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cluster Cache Management
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Warm cache synchronization&lt;/strong&gt; between designated replica and writer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Immediate cache availability&lt;/strong&gt; after failover&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sustained performance&lt;/strong&gt; post-failover&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Aurora Serverless v2
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;On-demand autoscaling&lt;/strong&gt; configuration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full Aurora feature set&lt;/strong&gt; (cloning, global database, Multi-AZ, multiple readers)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic scaling:&lt;/strong&gt; Starts up, shuts down, scales capacity based on needs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instant scaling:&lt;/strong&gt; Scales to support hundreds of thousands of transactions in a fraction of a second&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fine-grained capacity adjustments&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Aurora Machine Learning
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ML-based predictions&lt;/strong&gt; via SQL&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrated with:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Amazon SageMaker&lt;/li&gt;
&lt;li&gt;Amazon Bedrock (generative AI)&lt;/li&gt;
&lt;li&gt;Amazon Comprehend (sentiment analysis)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;No custom integrations&lt;/strong&gt; or data movement required&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This comprehensive analysis has explored the architectural details and feature sets of Amazon Aurora PostgreSQL-Compatible Edition and Amazon RDS for PostgreSQL. Key takeaways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Aurora PostgreSQL&lt;/strong&gt; excels in massive throughput, highly concurrent workloads, and provides enterprise database capabilities with minimal operational overhead&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RDS for PostgreSQL&lt;/strong&gt; offers flexibility with storage types, extensive read replica options, and cost-effective solutions for standard workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both services provide robust solutions for managed PostgreSQL deployments on AWS, each with distinct strengths suited to different use cases.&lt;/p&gt;




&lt;h2&gt;
  
  
  Additional Resources
&lt;/h2&gt;

&lt;p&gt;For further guidance on migrating to Aurora PostgreSQL or RDS for PostgreSQL:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/migrate-an-on-premises-postgresql-database-to-amazon-rds-for-postgresql.html" rel="noopener noreferrer"&gt;Migrate an on-premises PostgreSQL database to Amazon RDS for PostgreSQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Migrating.html" rel="noopener noreferrer"&gt;Migrating data to Amazon Aurora with PostgreSQL compatibility&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Have questions or suggestions?&lt;/strong&gt; Please leave a comment below.&lt;/p&gt;

</description>
      <category>rds</category>
      <category>aurora</category>
      <category>aws</category>
    </item>
    <item>
      <title>Bedrock AgentCore: What 5 Real ANZ Enterprise Deploys Taught Us</title>
      <dc:creator>Raj Murugan</dc:creator>
      <pubDate>Sun, 30 Nov 2025 12:27:44 +0000</pubDate>
      <link>https://forem.com/rajmurugan/bedrock-agentcore-what-5-real-anz-enterprise-deploys-taught-us-1e28</link>
      <guid>https://forem.com/rajmurugan/bedrock-agentcore-what-5-real-anz-enterprise-deploys-taught-us-1e28</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;I've spent the last 9 months shipping Bedrock AgentCore into four different ANZ enterprises (plus one internal PoC that crashed and burned).&lt;br&gt;&lt;br&gt;
This isn't a hello-world tutorial – it's the bruises, the invoices, and the 3 a.m. CloudWatch alarms that finally made the thing stick.&lt;br&gt;&lt;br&gt;
If you're about to promote an agent past the "demo for the board" stage, steal this checklist – it will save you at least one rollback.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The numbers we actually saw
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Use-case&lt;/th&gt;
&lt;th&gt;Cost at 10 k queries/mo&lt;/th&gt;
&lt;th&gt;p95 latency&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Single agent&lt;/td&gt;
&lt;td&gt;Simple Q&amp;amp;A&lt;/td&gt;
&lt;td&gt;~AUD 180&lt;/td&gt;
&lt;td&gt;2.1 s&lt;/td&gt;
&lt;td&gt;Hallucinated once traffic &amp;gt; 2 k/day&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Supervisor + 3 subs&lt;/td&gt;
&lt;td&gt;HR triage&lt;/td&gt;
&lt;td&gt;~AUD 420&lt;/td&gt;
&lt;td&gt;4.3 s&lt;/td&gt;
&lt;td&gt;60 % less duplicate Lambda code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AgentCore Runtime&lt;/td&gt;
&lt;td&gt;SRE co-pilot&lt;/td&gt;
&lt;td&gt;~AUD 620&lt;/td&gt;
&lt;td&gt;3.8 s&lt;/td&gt;
&lt;td&gt;GitOps deploy, full traces&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Guardrail-wrapped&lt;/td&gt;
&lt;td&gt;Student chat&lt;/td&gt;
&lt;td&gt;~AUD 520&lt;/td&gt;
&lt;td&gt;4.9 s&lt;/td&gt;
&lt;td&gt;PII blocked, compliance happy&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The supervisor pattern is the only one that survived a production spike without a hot-fix.&lt;br&gt;&lt;br&gt;
Single agents are great for a sprint demo – and terrible for anything that hits the internet.&lt;/p&gt;


&lt;h2&gt;
  
  
  Managed agents vs. AgentCore Runtime – pick one before 10 k users
&lt;/h2&gt;

&lt;p&gt;I drew this on a whiteboard for our CFO after she saw the second invoice:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3jsxqsrwfvrpn199fqvz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3jsxqsrwfvrpn199fqvz.png" alt=" " width="800" height="335"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Rule we now write into every SoW:&lt;br&gt;&lt;br&gt;
&lt;strong&gt;PoC = managed. Day-1 prod = Runtime.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The moment you need a custom MCP tool or a side-car Lambda, the console becomes a drag.&lt;/p&gt;


&lt;h2&gt;
  
  
  Ground-truth data – skip it and you'll ship a liar
&lt;/h2&gt;

&lt;p&gt;Our first Kindo chatbot went live with 37 manually written examples.&lt;br&gt;&lt;br&gt;
Two weeks later a student asked "What grade do I need to pass?" and the agent calmly invented a 42 % cutoff (it's 50 %).&lt;br&gt;&lt;br&gt;
Cue 4 a.m. rollback.&lt;/p&gt;

&lt;p&gt;We fixed it the boring way:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Exported 18 k real (de-identified) chat logs.
&lt;/li&gt;
&lt;li&gt;LLM-expanded edge cases: "give me 50 ways to ask about vacation pay".
&lt;/li&gt;
&lt;li&gt;Human reviewed, 1 200 kept.
&lt;/li&gt;
&lt;li&gt;Added sessionAttributes (studentID, semester) so the agent could look up live data.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Accuracy jumped from 67 % → 92 % and the support ticket queue dropped by half.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# pytest harness we run in CI
&lt;/span&gt;&lt;span class="n"&gt;tests&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ground_truth.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tests&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;sessionAttributes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;attrs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
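&lt;p&gt;One hedge on that harness: exact string equality gets brittle once the model rephrases a numerically identical answer. A looser check we could swap in (a sketch using the stdlib &lt;code&gt;difflib&lt;/code&gt;; the 0.8 threshold is a guess you'd tune per dataset):&lt;/p&gt;

```python
from difflib import SequenceMatcher

def close_enough(answer: str, expected: str, threshold: float = 0.8) -> bool:
    """Fuzzy comparison for ground-truth checks: exact equality fails the
    moment the model rephrases an otherwise-correct answer."""
    ratio = SequenceMatcher(None, answer.lower(), expected.lower()).ratio()
    return ratio >= threshold

# A rephrased-but-correct answer still passes...
assert close_enough("The pass mark is 50%.", "the pass mark is 50 %")
# ...while an unrelated answer does not.
assert not close_enough("yes", "completely different text")
```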






&lt;h2&gt;
  
  
  Supervisor pattern that actually compiles
&lt;/h2&gt;

&lt;p&gt;Payroll bot rewrite: one supervisor + three specialised subs (policy, leave-balances, tickets).&lt;br&gt;&lt;br&gt;
60 % less copy-paste Lambda code, and we could unit-test each sub in isolation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agentcore&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;

&lt;span class="n"&gt;supervisor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic.claude-3-5-sonnet-20240620-v1:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a router. Never answer directly – always delegate to the correct sub-agent.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@app.entrypoint&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lambda_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;supervisor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Gateway MCP let us plug in ServiceNow REST APIs without rewriting the OpenAPI schema – the biggest time-saver of the sprint.&lt;/p&gt;




&lt;h2&gt;
  
  
  Guardrails – the checkbox that saved our audit
&lt;/h2&gt;

&lt;p&gt;Our first deploy shipped without guardrails.&lt;br&gt;&lt;br&gt;
Next day a student pasted their email + TFN into the chat and the agent happily parroted it back in the response.&lt;br&gt;&lt;br&gt;
Security team put a red sticker on my laptop.&lt;/p&gt;

&lt;p&gt;Now we enforce org-level guardrails before any agent alias hits prod:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Filter&lt;/th&gt;
&lt;th&gt;Block %&lt;/th&gt;
&lt;th&gt;Mask %&lt;/th&gt;
&lt;th&gt;AUD / mo&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;PII (email, TFN)&lt;/td&gt;
&lt;td&gt;2.1&lt;/td&gt;
&lt;td&gt;8.4&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom finance terms&lt;/td&gt;
&lt;td&gt;1.7&lt;/td&gt;
&lt;td&gt;3.2&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hate/violence&lt;/td&gt;
&lt;td&gt;0.3&lt;/td&gt;
&lt;td&gt;–&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;4.1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;11.6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;66&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Cheap insurance.&lt;/p&gt;
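&lt;p&gt;For reference, here's roughly what that guardrail looks like as code. This is a sketch of the request body for the Bedrock &lt;code&gt;create_guardrail&lt;/code&gt; API – the guardrail name and the TFN regex are illustrative, so check the boto3 docs for the exact field set before copying:&lt;/p&gt;

```python
import re

# Sketch of a CreateGuardrail request body for the filters in the table above.
# Field names follow boto3's bedrock create_guardrail API; the guardrail name
# and the TFN regex pattern are illustrative, not production values.
guardrail = {
    "name": "student-chat-prod",
    "blockedInputMessaging": "Sorry, I can't process that request.",
    "blockedOutputsMessaging": "Sorry, I can't share that information.",
    "sensitiveInformationPolicyConfig": {
        # Managed PII entity: block emails outright
        "piiEntitiesConfig": [{"type": "EMAIL", "action": "BLOCK"}],
        # Custom regex for Australian Tax File Numbers (pattern is illustrative)
        "regexesConfig": [{
            "name": "AU_TFN",
            "pattern": r"\b\d{3}\s?\d{3}\s?\d{2,3}\b",
            "action": "BLOCK",
        }],
    },
    "contentPolicyConfig": {
        "filtersConfig": [
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        ]
    },
}

# Sanity-check the custom regex before it ever reaches prod
assert re.search(
    guardrail["sensitiveInformationPolicyConfig"]["regexesConfig"][0]["pattern"],
    "my tfn is 123 456 789",
)
```

&lt;p&gt;In CI the dict is passed to &lt;code&gt;boto3.client("bedrock").create_guardrail(**guardrail)&lt;/code&gt; and the returned guardrail ID is attached to the agent alias.&lt;/p&gt;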




&lt;h2&gt;
  
  
  IaC + observability – or you'll debug in the console at 2 a.m.
&lt;/h2&gt;

&lt;p&gt;We template everything in CDK (Python). One &lt;code&gt;cdk deploy&lt;/code&gt; spins up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AgentCore Runtime container&lt;/li&gt;
&lt;li&gt;Lambda layers for Powertools &amp;amp; boto3 latest&lt;/li&gt;
&lt;li&gt;X-Ray traces, CloudWatch dashboards, alarms&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Target&lt;/th&gt;
&lt;th&gt;Alarm&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Task success&lt;/td&gt;
&lt;td&gt;≥ 95 %&lt;/td&gt;
&lt;td&gt;&amp;lt; 90 %&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p95 latency&lt;/td&gt;
&lt;td&gt;≤ 5 s&lt;/td&gt;
&lt;td&gt;&amp;gt; 10 s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Token spend&lt;/td&gt;
&lt;td&gt;≤ AUD 70/day&lt;/td&gt;
&lt;td&gt;&amp;gt; AUD 140&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PII leak count&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&amp;gt; 0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Routing loops show up as 30 s p99 spikes – impossible to spot without traces.&lt;/p&gt;
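&lt;p&gt;That last point is worth a tiny sanity check you can run offline. With nearest-rank percentiles (the numbers below are invented), p95 can sit comfortably inside target while a couple of looping requests blow out the tail:&lt;/p&gt;

```python
import math

def percentile(samples: list[float], q: float) -> float:
    """Nearest-rank percentile (q in (0, 1]) of latency samples, in seconds."""
    ranked = sorted(samples)
    idx = max(0, math.ceil(q * len(ranked)) - 1)
    return ranked[idx]

# 98 healthy requests at ~3.8 s, two stuck in a routing loop:
latencies = [3.8] * 98 + [31.0, 29.5]
assert percentile(latencies, 0.95) <= 5   # p95 target from the table still met
assert percentile(latencies, 0.99) > 10   # the tail spike is what gives the loop away
```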




&lt;h2&gt;
  
  
  10-line deploy checklist we paste into every PR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;[ ] 200+ ground-truth conversations in &lt;code&gt;tests/ground_truth.json&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;[ ] Supervisor agent uses Sonnet; subs pinned to Haiku for cost&lt;/li&gt;
&lt;li&gt;[ ] Guardrails alias attached (BLOCK PII, MASK custom)&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;agentcore deploy --stage prod --approve&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;[ ] Powertools tracer + metrics on every handler&lt;/li&gt;
&lt;li&gt;[ ] CloudWatch alarm for "PII leak &amp;gt; 0" – page the on-call&lt;/li&gt;
&lt;li&gt;[ ] A/B toggle for Haiku fallback if token spend &amp;gt; budget&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Starter repo we fork every time:&lt;br&gt;&lt;br&gt;
&lt;a href="https://github.com/awslabs/amazon-bedrock-agentcore-samples" rel="noopener noreferrer"&gt;https://github.com/awslabs/amazon-bedrock-agentcore-samples&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;80 % of the boilerplate is done in 90 min – the rest is ground-truth grunt work.&lt;/p&gt;




&lt;p&gt;If you're riding the agent hype-wave right now, remember: &lt;strong&gt;the demo is the easy 10 %&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;These notes are for the other 90 % – the invoices, the guardrails, the 3 a.m. pages.&lt;/p&gt;

&lt;p&gt;Steal what you need, add your own scars, and ship something that won't hallucinate when the CFO asks it a question.&lt;/p&gt;

&lt;p&gt;Happy building, and may your p95 always be under 5 s.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>bedrock</category>
      <category>agentcore</category>
    </item>
    <item>
      <title>From Windows/Corona to Linux V-Ray Standalone on AWS Deadline Cloud – Architecture That Actually Worked</title>
      <dc:creator>Raj Murugan</dc:creator>
      <pubDate>Fri, 31 Oct 2025 12:48:30 +0000</pubDate>
      <link>https://forem.com/rajmurugan/from-windowscorona-to-linux-v-ray-standalone-on-aws-deadline-cloud-architecture-that-actually-1kcb</link>
      <guid>https://forem.com/rajmurugan/from-windowscorona-to-linux-v-ray-standalone-on-aws-deadline-cloud-architecture-that-actually-1kcb</guid>
      <description>&lt;p&gt;Over the last few weeks I moved a real production scene from Windows/Corona to Linux V-Ray Standalone on AWS Deadline Cloud. This isn't a hello-world write-up—it's the practical path that got 400 frames moving reliably, with the guardrails that kept things from falling over.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why V-Ray Standalone on Linux
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost and scale&lt;/strong&gt;: spot capacity on Linux is abundant and significantly cheaper; per-frame costs dropped materially.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stability&lt;/strong&gt;: lean workers, faster boot, fewer moving parts than full DCC stacks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Portability&lt;/strong&gt;: a well-formed .vrscene plus clean paths travels across environments.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Architecture at a glance
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Submit&lt;/strong&gt;: 3ds Max exports a .vrscene and a tiny JSON sidecar with frame range and metadata.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Queue&lt;/strong&gt;: the render manager expands the frame list (e.g., 0-399x1) into 1 task per frame.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workers&lt;/strong&gt;: Linux images/containers with V-Ray Standalone and a small pre-task that rewrites Windows/UNC paths to Linux mounts and validates every reference.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage options&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Job Attachments&lt;/em&gt;: upload scene+assets once; content-hash dedupe; great for portability.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Sync/Mounts&lt;/em&gt;: Resilio/NAS to EFS/FSx mounts; great for giant libraries and rapid iteration.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Output&lt;/strong&gt;: frames land in S3 (or mounted storage) and mirror back on-prem if needed.&lt;/li&gt;

&lt;/ul&gt;
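&lt;p&gt;The frame-range expansion above is trivial but worth pinning down, because off-by-one frames are a classic render-farm bug. A sketch of the &lt;code&gt;start-endxstep&lt;/code&gt; parser (the function name is mine, and it deliberately ignores negative frame numbers):&lt;/p&gt;

```python
def expand_frames(spec: str) -> list[int]:
    """Expand a 'start-endxstep' frame spec (e.g. '0-399x1') into explicit
    frame numbers, so the queue can create exactly one task per frame."""
    rng, _, step = spec.partition("x")
    start, _, end = rng.partition("-")
    return list(range(int(start), int(end) + 1, int(step or 1)))

frames = expand_frames("0-399x1")
assert len(frames) == 400 and frames[0] == 0 and frames[-1] == 399
```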

&lt;h2&gt;
  
  
  What I shipped first (MVP)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;One .vrscene&lt;/li&gt;
&lt;li&gt;One asset bundle (attachments) or a mounted project share&lt;/li&gt;
&lt;li&gt;One pre-task: path mapping + sanity checks&lt;/li&gt;
&lt;li&gt;One queue with a clean frame list&lt;/li&gt;
&lt;li&gt;Checkpointing enabled (tunable interval) to survive interruptions&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Path mapping that kept me sane
&lt;/h2&gt;

&lt;p&gt;I avoid "parse everything" and instead declare path intents. A simple JSON map drives a conservative rewrite; anything not matching known roots is left untouched, then validated.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example path map (sidecar)
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "mappings": [
    { "win": "C:\\Projects\\", "linux": "/mnt/projects/" },
    { "win": "\\\\nas\\assets\\", "linux": "/mnt/assets/" }
  ],
  "fps": 25,
  "start": 0,
  "end": 399,
  "step": 1
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;h3&gt;
  
  
  Pre-task outline (Python)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Read scene.meta.json and path map.&lt;/li&gt;
&lt;li&gt;Scan the .vrscene for Windows/UNC paths, rewrite to Linux.&lt;/li&gt;
&lt;li&gt;Verify existence; if any missing, print a short remediation report and exit non-zero.&lt;/li&gt;
&lt;li&gt;Hand the cleaned .vrscene to V-Ray.&lt;/li&gt;
&lt;/ul&gt;
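&lt;p&gt;The rewrite step of that outline, sketched in Python (the mapping roots are illustrative, matching the path-map example above; anything outside the declared roots is deliberately left untouched for the validator to flag):&lt;/p&gt;

```python
import re

def rewrite_paths(text: str, mappings: list[dict]) -> str:
    """Rewrite declared Windows/UNC roots to their Linux mounts and normalise
    separators under them; paths outside the declared roots are left alone."""
    # Longest roots first, so a more specific root wins over its parent
    for m in sorted(mappings, key=lambda mp: len(mp["win"]), reverse=True):
        pattern = re.escape(m["win"]) + r'([^"\s]*)'
        text = re.sub(
            pattern,
            lambda match: m["linux"] + match.group(1).replace("\\", "/"),
            text,
        )
    return text

mappings = [
    {"win": "C:\\Projects\\", "linux": "/mnt/projects/"},
    {"win": "\\\\nas\\assets\\", "linux": "/mnt/assets/"},
]
line = r'file="C:\Projects\shot010\tex\wood.jpg" ref="D:\Elsewhere\x.png"'
# Known root is rewritten; the unknown D:\ root is left for the validator to flag.
assert rewrite_paths(line, mappings) == (
    'file="/mnt/projects/shot010/tex/wood.jpg" ref="D:\\Elsewhere\\x.png"'
)
```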

&lt;h2&gt;
  
  
  Attachments vs Sync (when I pick which)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Attachments
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pros&lt;/strong&gt;: portable, deduped, reproducible; perfect for contained shots (&amp;lt;20-30 GB).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cons&lt;/strong&gt;: pay upload cost; less ideal for very frequent micro-edits.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Sync/Mounts
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pros&lt;/strong&gt;: great for giant shared libraries, instant edits; familiar artist workflow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cons&lt;/strong&gt;: cold caches and path drift can bite; reproducibility depends on discipline.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Rule of thumb now
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Shot-specific data → Attachments&lt;/li&gt;
&lt;li&gt;Global/shared libraries → Mounts&lt;/li&gt;
&lt;li&gt;Hybrid is fine&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Licensing notes that saved time
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Bring-your-own licenses work well if the server is reachable with low latency—preflight a license ping and fail fast if checkout trips.&lt;/li&gt;
&lt;li&gt;Usage-based licensing is a clean burst option when seats run tight.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Checkpointing/resume (aka sticky rendering)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Keep it on, but measure the overhead. I start at 10 minutes; 5-15 minutes is the practical band depending on frame length and disk I/O.&lt;/li&gt;
&lt;li&gt;Store checkpoints on local NVMe; avoid remote writes in the hot loop.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How I made ~400 frames feel easy
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;1 task per frame → clean retries and metrics.&lt;/li&gt;
&lt;li&gt;Target workers ≈ frame count, respecting license ceilings.&lt;/li&gt;
&lt;li&gt;Guardrails: budget tags, per-queue caps, idle scale-in after the tail finishes.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Observability that actually mattered
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Per-frame&lt;/strong&gt;: queue wait, time-to-first-pixel, render time, upload time, cost per frame.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-fleet&lt;/strong&gt;: desired vs healthy, interruptions, cache hit rates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logs I read first&lt;/strong&gt;: pre-task "missing assets" summary, V-Ray headers/footers.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Things that failed (and fixes)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Mixed slashes/whitespace in texture paths → normalize separators and quote paths.&lt;/li&gt;
&lt;li&gt;"Works on my machine" exports → validation that lists all external refs by type before export.&lt;/li&gt;
&lt;li&gt;UNC vs drive letter roots → always include both in mapping.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Submission checklist (copy-paste)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Render Setup has the correct frame list (or the submitter enforces it).&lt;/li&gt;
&lt;li&gt;Export .vrscene + scene.meta.json + pathmap.json.&lt;/li&gt;
&lt;li&gt;Pick one storage mode per shot (Attachments vs Sync) and stick to it.&lt;/li&gt;
&lt;li&gt;Select the Linux V-Ray queue; set frames (e.g., 0-399x1).&lt;/li&gt;
&lt;li&gt;Enable checkpoints; pick interval.&lt;/li&gt;
&lt;li&gt;Tag the job (project, shot, budget).&lt;/li&gt;
&lt;li&gt;Submit; only open logs when a task fails—start with the pre-task.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  If you're starting today
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Ship the pre-task and one shot with attachments first; it removes 80% of unknowns.&lt;/li&gt;
&lt;li&gt;Add sync mounts later if iteration speed demands it.&lt;/li&gt;
&lt;li&gt;Keep exporters simple and submitters smart: the job owns the frame list; the .vrscene is the payload.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I'd love to hear
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Your path-mapping rules for mixed Windows/UNC environments.&lt;/li&gt;
&lt;li&gt;Checkpoint intervals that worked best for long frames.&lt;/li&gt;
&lt;li&gt;Any gotchas with VRMesh/proxy paths across platforms.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If there's interest, I'll post the pre-task template, a scene.meta.json schema, and a ready-to-use dashboard for frame-time and cost.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>aws</category>
      <category>linux</category>
    </item>
    <item>
      <title>Which scaling tool should be used for Kubernetes clusters on AWS?</title>
      <dc:creator>Raj Murugan</dc:creator>
      <pubDate>Tue, 30 Sep 2025 13:53:01 +0000</pubDate>
      <link>https://forem.com/rajmurugan/which-scaling-tool-should-be-used-for-kubernetes-clusters-on-aws-f11</link>
      <guid>https://forem.com/rajmurugan/which-scaling-tool-should-be-used-for-kubernetes-clusters-on-aws-f11</guid>
      <description>&lt;p&gt;Kubernetes autoscaling on AWS has evolved dramatically over the past few years. What started with the traditional Cluster Autoscaler has now expanded to include innovative solutions like Karpenter and the newest addition, EKS Auto Mode. As AWS solution architects, we're constantly evaluating which approach best fits our workloads and operational requirements.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three Scaling Solutions Explained
&lt;/h2&gt;

&lt;h3&gt;
  
  
  EKS Auto Mode: The Fully Managed Approach
&lt;/h3&gt;

&lt;p&gt;EKS Auto Mode represents AWS's vision of "Kubernetes without the operational overhead." Launched in 2024, it's designed for teams who want production-ready clusters with minimal configuration and maximum automation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Features:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fully managed infrastructure&lt;/strong&gt;: AWS handles everything from node provisioning to security patching&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-in Karpenter&lt;/strong&gt;: Leverages AWS-managed Karpenter for intelligent scaling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enhanced security&lt;/strong&gt;: Immutable AMIs with SELinux and read-only root filesystems&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;21-day node lifecycle&lt;/strong&gt;: Automatic node replacement for security and compliance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One-click setup&lt;/strong&gt;: Production-grade defaults out of the box&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key architectural difference is that EKS Auto Mode uses a Karpenter-based system that automatically provisions EC2 instances in response to pod requests, but with AWS managing the entire Karpenter lifecycle.&lt;/p&gt;

&lt;h3&gt;
  
  
  Karpenter: The Intelligent Optimizer
&lt;/h3&gt;

&lt;p&gt;Karpenter revolutionized Kubernetes node provisioning by eliminating the need for predefined node groups and providing just-in-time capacity provisioning. Developed by AWS as an open-source project, it addresses the limitations of traditional autoscaling approaches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Capabilities:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Direct EC2 integration&lt;/strong&gt;: Bypasses Auto Scaling Groups for faster provisioning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workload-aware scaling&lt;/strong&gt;: Analyzes pending pods to select optimal instance types&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advanced consolidation&lt;/strong&gt;: Automatically right-sizes and bin-packs workloads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spot instance intelligence&lt;/strong&gt;: First-class support for cost-effective Spot instances&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open-source flexibility&lt;/strong&gt;: Community-driven with extensive customization options&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unlike Cluster Autoscaler which works with Auto Scaling Groups, Karpenter works directly with EC2 instances. It employs application-aware scheduling, considering pod-specific requirements like taints, tolerations, and node affinity for additional customization and flexibility.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cluster Autoscaler: The Battle-Tested Veteran
&lt;/h3&gt;

&lt;p&gt;The Cluster Autoscaler remains the most widely deployed autoscaling solution, offering broad compatibility and proven reliability across multiple cloud providers. It scales based on pending pods and uses predefined scaling policies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core Features:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Node group based&lt;/strong&gt;: Works with predefined Auto Scaling Groups&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-cloud support&lt;/strong&gt;: Compatible with AWS, GCP, Azure, and other providers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mature ecosystem&lt;/strong&gt;: Extensive documentation and community support&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conservative scaling&lt;/strong&gt;: Predictable behavior with established patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simple integration&lt;/strong&gt;: Works with existing ASG configurations&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  At a Glance: Key Differentiators
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8z5xcms9yzepz2fvcstu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8z5xcms9yzepz2fvcstu.png" alt="Autoscaling solutions Comparision" width="800" height="778"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture Comparison
&lt;/h2&gt;




&lt;h2&gt;
  
  
  Performance and Cost Analysis
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scaling Performance
&lt;/h3&gt;

&lt;p&gt;The scaling speed varies significantly between solutions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;EKS Auto Mode &amp;amp; Karpenter&lt;/strong&gt;: Both provision new nodes in under 60 seconds due to direct EC2 API integration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cluster Autoscaler&lt;/strong&gt;: Typically takes 2-5 minutes due to ASG policy execution and health check delays.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cost Optimization
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;EKS Auto Mode&lt;/strong&gt; provides hands-off cost optimization through automatic capacity planning and dynamic scaling. It removes underutilized nodes, replaces expensive nodes with cheaper alternatives, and consolidates workloads onto more efficient compute resources.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Karpenter&lt;/strong&gt; offers excellent cost optimization by launching right-sized nodes and culling unused resources. It can automatically provision a mix of On-Demand and Spot instances, dynamically choosing the most cost-effective options.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cluster Autoscaler&lt;/strong&gt; provides basic cost optimization by scaling down underutilized nodes. However, it doesn't directly manage Spot instances or automatically optimize costs based on instance type selection.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Decision Framework
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi5on3ogljrbn0rj4bnot.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi5on3ogljrbn0rj4bnot.png" alt="Decision Flowchart" width="784" height="2100"&gt;&lt;/a&gt;&lt;br&gt;
Decision Flowchart: Choosing the Right AWS Kubernetes Scaling Solution&lt;/p&gt;




&lt;h2&gt;
  
  
  When to Choose Each Solution
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Choose EKS Auto Mode If:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You want minimal operational overhead&lt;/li&gt;
&lt;li&gt;Enhanced security and compliance are priorities&lt;/li&gt;
&lt;li&gt;You're starting a new Kubernetes journey&lt;/li&gt;
&lt;li&gt;AWS-managed infrastructure aligns with your strategy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;EKS Auto Mode is ideal for production-ready clusters with minimal operational overhead. It's particularly effective for teams that want to focus on application logic rather than cluster management.&lt;/p&gt;

&lt;h3&gt;
  
  
  Choose Karpenter If:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Cost optimization is your primary concern&lt;/li&gt;
&lt;li&gt;You need advanced customization capabilities&lt;/li&gt;
&lt;li&gt;Your team has Kubernetes expertise&lt;/li&gt;
&lt;li&gt;Workloads have diverse resource requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Karpenter is particularly well suited to larger, heterogeneous clusters with high workload churn.&lt;/p&gt;

&lt;h3&gt;
  
  
  Choose Cluster Autoscaler If:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You have existing Auto Scaling Group infrastructure&lt;/li&gt;
&lt;li&gt;Multi-cloud consistency is important&lt;/li&gt;
&lt;li&gt;Conservative, predictable scaling is preferred&lt;/li&gt;
&lt;li&gt;Team prefers battle-tested solutions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cluster Autoscaler is perfect if you have traditional workloads and don't need rapid scaling.&lt;/p&gt;




&lt;h2&gt;
  
  
  Migration Considerations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  From Cluster Autoscaler to Auto Mode
&lt;/h3&gt;

&lt;p&gt;When migrating to EKS Auto Mode from Cluster Autoscaler, uninstall any components now managed by Auto Mode, like Karpenter or the AWS Load Balancer Controller. Also ensure your installed add-ons are up-to-date.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-Cloud Strategy
&lt;/h3&gt;

&lt;p&gt;For organizations requiring multi-cloud deployment, Cluster Autoscaler provides the broadest compatibility. Karpenter currently works with AWS, Azure, and Alibaba Cloud, while EKS Auto Mode is AWS-specific.&lt;/p&gt;




&lt;h2&gt;
  
  
  Security and Compliance
&lt;/h2&gt;

&lt;p&gt;EKS Auto Mode provides enhanced security features including immutable AMIs, SELinux mandatory access controls, and read-only root file systems. Nodes have a maximum lifetime of 21 days, after which they are automatically replaced with new nodes.&lt;/p&gt;

&lt;p&gt;Both Karpenter and Cluster Autoscaler rely on standard Kubernetes security models, with customers responsible for managing security configurations and patching.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The evolution from Cluster Autoscaler to Karpenter to EKS Auto Mode represents AWS's journey toward simplified, intelligent, and fully managed Kubernetes operations. Each solution serves different organizational needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;EKS Auto Mode&lt;/strong&gt; offers the simplest operational model with AWS handling all infrastructure management.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Karpenter&lt;/strong&gt; provides the best balance of cost optimization and flexibility for teams comfortable with open-source management.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cluster Autoscaler&lt;/strong&gt; remains the most mature and broadly compatible option for traditional deployment patterns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As Kubernetes continues to mature, the trend clearly moves toward managed services that reduce operational complexity while maintaining the flexibility that makes Kubernetes powerful. EKS Auto Mode represents this future, but the choice ultimately depends on your specific requirements, team capabilities, and organizational constraints.&lt;/p&gt;

&lt;p&gt;Whether you choose the fully managed simplicity of EKS Auto Mode, the intelligent optimization of Karpenter, or the battle-tested reliability of Cluster Autoscaler, the key is understanding how each solution aligns with your operational goals and scaling requirements.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What's your experience with these autoscaling solutions? Share your insights and questions in the comments below!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Tags: #aws #kubernetes #eks #autoscaling #karpenter #devops #cloudnative&lt;/em&gt;&lt;/p&gt;

</description>
      <category>eks</category>
      <category>karpenter</category>
      <category>aws</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>AWS DMS Fails After Azure SQL Managed Instance Failover? Here’s How to Fix It</title>
      <dc:creator>Raj Murugan</dc:creator>
      <pubDate>Thu, 13 Feb 2025 16:35:53 +0000</pubDate>
      <link>https://forem.com/rajmurugan/aws-dms-fails-after-azure-sql-managed-instance-failover-heres-how-to-fix-it-119k</link>
      <guid>https://forem.com/rajmurugan/aws-dms-fails-after-azure-sql-managed-instance-failover-heres-how-to-fix-it-119k</guid>
      <description>&lt;h3&gt;
  
  
  &lt;strong&gt;Problem Statement&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A customer is using &lt;strong&gt;AWS Database Migration Service (DMS)&lt;/strong&gt; to migrate data from &lt;strong&gt;Azure SQL Managed Instance&lt;/strong&gt; to &lt;strong&gt;AWS RDS PostgreSQL&lt;/strong&gt; over a &lt;strong&gt;VPN&lt;/strong&gt; in &lt;strong&gt;Full Load + Change Data Capture (CDC)&lt;/strong&gt; mode. The migration runs smoothly until &lt;strong&gt;Azure performs maintenance&lt;/strong&gt;, triggering a failover. When this happens, DMS fails to locate the &lt;strong&gt;last tracked LSN (Log Sequence Number)&lt;/strong&gt;, causing a &lt;strong&gt;fatal error&lt;/strong&gt; that stops the task.&lt;br&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg0k6cy05nzcmwuph9oj6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg0k6cy05nzcmwuph9oj6.png" alt="Error" width="800" height="381"&gt;&lt;/a&gt;&lt;br&gt;
Currently, the only workaround is to &lt;strong&gt;manually restart&lt;/strong&gt; the DMS task and perform a &lt;strong&gt;full load&lt;/strong&gt; again, which is inefficient and time-consuming.  &lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Understanding the Issue&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When an &lt;strong&gt;Azure SQL Managed Instance failover&lt;/strong&gt; occurs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;old primary&lt;/strong&gt; is demoted, and a &lt;strong&gt;new primary&lt;/strong&gt; takes over.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;transaction logs on the new primary&lt;/strong&gt; may not have the same &lt;strong&gt;LSN sequence&lt;/strong&gt; as the old primary.&lt;/li&gt;
&lt;li&gt;AWS DMS attempts to &lt;strong&gt;resume CDC from the last tracked LSN&lt;/strong&gt;, but if that LSN &lt;strong&gt;doesn’t exist on the new primary&lt;/strong&gt;, DMS throws an &lt;strong&gt;error and stops the task&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;This issue arises because &lt;strong&gt;SQL Server Availability Groups do not synchronize backup history across replicas&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Potential Solutions &amp;amp; Trade-offs&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Since this scenario is &lt;strong&gt;easy to replicate&lt;/strong&gt;, testing these solutions in a &lt;strong&gt;non-production environment&lt;/strong&gt; first is recommended.  &lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;1️⃣ Temporary Fix – Manual Action Required on Every Failover&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Find the latest LSN on the new primary:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fn_cdc_get_max_lsn&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;CurrentMaxLSN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Restart DMS using the &lt;strong&gt;new LSN&lt;/strong&gt; via &lt;strong&gt;AWS CLI&lt;/strong&gt; or &lt;strong&gt;console&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
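&lt;p&gt;A minimal sketch of that restart with the AWS CLI (the task ARN and LSN below are placeholders, and the right &lt;code&gt;--start-replication-task-type&lt;/code&gt; for a Full Load + CDC task is worth confirming against the DMS documentation for your configuration):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Placeholder values -- substitute your task ARN and the LSN from the query above
TASK_ARN="arn:aws:dms:us-east-1:123456789012:task:EXAMPLE"
NEW_LSN="0000ABCD:00001234:0001"

# Stop the failed task, then resume CDC from the new native start point
aws dms stop-replication-task --replication-task-arn "$TASK_ARN"
aws dms start-replication-task \
  --replication-task-arn "$TASK_ARN" \
  --start-replication-task-type resume-processing \
  --cdc-start-position "$NEW_LSN"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;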

&lt;p&gt;&lt;strong&gt;Trade-offs:&lt;/strong&gt;&lt;br&gt;
✅ Simple and requires no architectural changes.&lt;br&gt;&lt;br&gt;
❌ Requires manual intervention every time a failover occurs.&lt;br&gt;&lt;br&gt;
❌ Increases downtime and operational overhead.  &lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;2️⃣ Permanent Fix – Adjust DMS Settings (May Not Work for Azure SQL MI)&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Set:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;"AlwaysOnSharedSynchedBackupIsEnabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;This allows DMS to &lt;strong&gt;poll all nodes in the Always On cluster&lt;/strong&gt; for transaction backups.&lt;/li&gt;
&lt;li&gt;Works for &lt;strong&gt;on-prem SQL Server&lt;/strong&gt;, but may &lt;strong&gt;prevent reading from the old primary&lt;/strong&gt; in &lt;strong&gt;Azure SQL MI&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Trade-offs:&lt;/strong&gt;&lt;br&gt;
✅ Fully automated once configured.&lt;br&gt;&lt;br&gt;
✅ No need for manual intervention.&lt;br&gt;&lt;br&gt;
❌ May not be supported in &lt;strong&gt;Azure SQL MI&lt;/strong&gt;, requiring additional testing.&lt;br&gt;&lt;br&gt;
❌ Potential data consistency risks if backups are not fully synchronized.  &lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;3️⃣ Permanent Fix – Use Transaction Log Backups Instead of Live Logs&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Azure SQL Managed Instance&lt;/strong&gt; creates &lt;strong&gt;transaction log backups&lt;/strong&gt; every ~10 minutes.&lt;/li&gt;
&lt;li&gt;These backups are &lt;strong&gt;consistent across failovers&lt;/strong&gt;, avoiding &lt;strong&gt;LSN loss&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DMS v3.5.3+&lt;/strong&gt; supports reading from log backups for &lt;strong&gt;Amazon RDS for SQL Server&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grant necessary permissions&lt;/strong&gt; for DMS to read log backups:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;  &lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;EXEC&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;msdb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dbo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rds_dms_tlog_download&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;rds_user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;EXEC&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;msdb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dbo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rds_dms_tlog_read&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;rds_user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;EXEC&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;msdb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dbo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rds_dms_tlog_list_current_lsn&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;rds_user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;EXEC&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;msdb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dbo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rds_task_status&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;rds_user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;More details: &lt;a href="https://learn.microsoft.com/en-us/azure/azure-sql/managed-instance/automated-backups-overview?view=azuresql" rel="noopener noreferrer"&gt;Azure SQL Managed Instance automated backups overview&lt;/a&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;📌 &lt;em&gt;Supported for Amazon RDS for SQL Server from DMS v3.5.3+ but may not work for Azure SQL MI—needs testing.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trade-offs:&lt;/strong&gt;&lt;br&gt;
✅ Provides a seamless failover experience without breaking CDC.&lt;br&gt;&lt;br&gt;
✅ No manual intervention required after failovers.&lt;br&gt;&lt;br&gt;
❌ Might not be supported in Azure SQL MI, requiring thorough testing.&lt;br&gt;&lt;br&gt;
❌ Backup frequency (~10 min) could introduce minor delays in CDC.  &lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Alternative SQL Managed Instance Solution&lt;/strong&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;4️⃣ Synchronize Backup History Across Replicas&lt;/strong&gt; (If Solution #2 Fails)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;If the &lt;strong&gt;secondary replica&lt;/strong&gt; is &lt;strong&gt;not configured for read access&lt;/strong&gt;, set:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;"AlwaysOnSharedSynchedBackupIsEnabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;This forces DMS to use the &lt;strong&gt;primary replica&lt;/strong&gt; as a standalone SQL Server.&lt;/li&gt;
&lt;li&gt;However, SQL Server does &lt;strong&gt;not automatically synchronize backup metadata across replicas&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Workaround: Run a &lt;strong&gt;script on the replica where backups occur&lt;/strong&gt; to register backup history on &lt;strong&gt;other replicas&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Example script &amp;amp; details: &lt;a href="https://community.qlik.com/t5/Official-Support-Articles/Qlik-Replicate-and-SQL-Server-AG-HA-Handling/ta-p/2156786" rel="noopener noreferrer"&gt;Qlik Replicate &amp;amp; SQL Server AG HA Handling&lt;/a&gt;.
&lt;/li&gt;
&lt;/ul&gt;
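&lt;p&gt;As a rough sketch of what that script works with, the query below (the database name is a placeholder) lists the backup-history rows on the replica where backups occur -- the metadata the workaround must register on the other replicas; the full registration script is in the Qlik article linked above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Run on the replica where backups are taken; 'YourDatabase' is a placeholder
SELECT database_name,
       type,              -- 'D' = full backup, 'L' = log backup
       backup_start_date,
       first_lsn,
       last_lsn
FROM msdb.dbo.backupset
WHERE database_name = 'YourDatabase'
ORDER BY backup_start_date DESC;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;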

&lt;p&gt;&lt;strong&gt;Trade-offs:&lt;/strong&gt;&lt;br&gt;
✅ Ensures backup history is consistent across replicas.&lt;br&gt;&lt;br&gt;
✅ Works well with Always On Availability Groups.&lt;br&gt;&lt;br&gt;
❌ Requires additional scripting and maintenance overhead.&lt;br&gt;&lt;br&gt;
❌ May not work in all configurations, depending on permissions and replication settings.  &lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Failovers in &lt;strong&gt;Azure SQL Managed Instance&lt;/strong&gt; disrupt AWS DMS &lt;strong&gt;CDC&lt;/strong&gt; due to LSN mismatches. Using &lt;strong&gt;transaction log backups&lt;/strong&gt; or &lt;strong&gt;modifying DMS settings&lt;/strong&gt; can help mitigate the issue. Testing these solutions in a &lt;strong&gt;staging environment&lt;/strong&gt; before deploying to production is essential.&lt;/p&gt;

&lt;p&gt;Let me know if you’ve encountered similar issues or tested any of these solutions! 🚀&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;References&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://learn.microsoft.com/en-us/azure/azure-sql/managed-instance/automated-backups-overview?view=azuresql" rel="noopener noreferrer"&gt;AWS DMS SQL Server Permissions&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://community.qlik.com/t5/Official-Support-Articles/Qlik-Replicate-and-SQL-Server-AG-HA-Handling/ta-p/2156786" rel="noopener noreferrer"&gt;Qlik Replicate &amp;amp; SQL Server AG HA Handling&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>sqlserver</category>
      <category>dms</category>
    </item>
  </channel>
</rss>
