<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Darryl Ruggles</title>
    <description>The latest articles on Forem by Darryl Ruggles (@rdarrylr).</description>
    <link>https://forem.com/rdarrylr</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F18399%2F60dbb262-11c7-447e-aefd-420e4188f443.jpg</url>
      <title>Forem: Darryl Ruggles</title>
      <link>https://forem.com/rdarrylr</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/rdarrylr"/>
    <language>en</language>
    <item>
      <title>EKS and The Cross-AZ Tax: How to Stop Paying AWS $0.02/GB for Traffic That Should Never Leave Your Availability Zone</title>
      <dc:creator>Darryl Ruggles</dc:creator>
      <pubDate>Fri, 03 Apr 2026 14:02:49 +0000</pubDate>
      <link>https://forem.com/aws-builders/eks-and-the-cross-az-tax-how-to-stop-paying-aws-002gb-for-traffic-that-should-never-leave-your-1i45</link>
      <guid>https://forem.com/aws-builders/eks-and-the-cross-az-tax-how-to-stop-paying-aws-002gb-for-traffic-that-should-never-leave-your-1i45</guid>
      <description>&lt;p&gt;I run EKS clusters across three Availability Zones because that's what production resilience typically demands. You probably do too. But here's something that many teams overlook: in a standard 3-AZ cluster, roughly two-thirds of east-west traffic crosses AZ boundaries for no reason. AWS charges $0.01/GB in each direction for that, $0.02/GB round trip, and it adds up fast.&lt;/p&gt;

&lt;p&gt;This isn't a theoretical problem. A cluster pushing 10 TB/month of inter-service traffic with 50-67% of it crossing AZ boundaries racks up $100-134/month in cross-AZ charges at that effective $0.02/GB. Scale that to 50 TB/month and you're looking at $500-670/month. For a cost that's entirely avoidable with a one-line YAML change on your Services.&lt;/p&gt;

&lt;p&gt;I built a test environment to measure the actual impact of the major cross-AZ optimization strategies that AWS and Kubernetes provide. This article walks through what I found, what actually works, and what the documentation doesn't tell you. All the code is Terraform and Kubernetes 1.35, using EKS Auto Mode.&lt;/p&gt;

&lt;p&gt;If you have not read my earlier article, &lt;a href="https://darryl-ruggles.cloud/a-complete-terraform-setup-for-eks-auto-mode-is-it-right-for-you/" rel="noopener noreferrer"&gt;A Complete Terraform Setup for EKS Auto Mode&lt;/a&gt;, that one covers the compute side of EKS cost optimization (Auto Mode, Graviton NodePools, Spot fallback chains, and the Terraform module structure I use). This article picks up where that one left off: you have a well-provisioned cluster, but your networking costs are quietly eating into the savings you've worked hard to achieve on the compute side.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the Money Goes
&lt;/h2&gt;

&lt;p&gt;Before optimizing anything, I wanted to understand exactly where cross-AZ traffic happens. There are three places, and the first one is by far the biggest.&lt;/p&gt;

&lt;h3&gt;
  
  
  East-West Traffic: The Silent Majority
&lt;/h3&gt;

&lt;p&gt;When Pod A calls Pod B through a Service, kube-proxy picks a backend pod using iptables rules. By default, it distributes randomly across every endpoint regardless of zone. The math is straightforward: with pods evenly distributed across N AZs, any request has a (N-1)/N chance of landing in a different AZ. For N = 3, that's 67%.&lt;/p&gt;

&lt;p&gt;That 67% is the number that matters. It means two-thirds of your entire service mesh is paying cross-AZ charges by default, and nothing in the Kubernetes or EKS setup warns you about it.&lt;/p&gt;
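&lt;p&gt;To make that concrete, here is a back-of-the-envelope model of the default behavior. The function names and traffic figures are mine, purely for illustration; the model charges each crossed gigabyte on both sides of the boundary ($0.02/GB all-in):&lt;/p&gt;

```python
# Sketch: expected cross-AZ cost of zone-unaware Service routing.
# Assumes pods are evenly spread across AZs and kube-proxy selects
# endpoints uniformly at random (the iptables-mode default).

def cross_az_fraction(num_azs: int) -> float:
    """Chance a request lands in a different AZ: (N-1)/N."""
    return (num_azs - 1) / num_azs

def monthly_cross_az_cost(tb_per_month: float, num_azs: int = 3,
                          usd_per_gb: float = 0.02) -> float:
    """Expected monthly charge, at $0.01/GB billed on each side."""
    return tb_per_month * 1000 * cross_az_fraction(num_azs) * usd_per_gb

print(round(cross_az_fraction(3), 2))      # 0.67
print(round(monthly_cross_az_cost(10.0)))  # 133
```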

&lt;h3&gt;
  
  
  North-South Traffic: The Extra Hop
&lt;/h3&gt;

&lt;p&gt;The AWS Load Balancer Controller supports "instance" mode and "ip" mode for targeting. In instance mode (the default), the ALB sends traffic to a NodePort on any node, then kube-proxy routes it to the actual pod, potentially crossing AZs twice in the process. In ip mode, the ALB targets pod IPs directly. Switching to IP target mode eliminates that extra kube-proxy hop entirely.&lt;/p&gt;
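&lt;p&gt;The switch is a single annotation on the Ingress. A minimal sketch using the same &lt;code&gt;kubectl&lt;/code&gt; provider pattern as the rest of the repo - the resource, service, and namespace names here are placeholders, but &lt;code&gt;alb.ingress.kubernetes.io/target-type&lt;/code&gt; is the controller's documented knob:&lt;/p&gt;

```hcl
resource "kubectl_manifest" "app_ingress" {
  yaml_body = yamlencode({
    apiVersion = "networking.k8s.io/v1"
    kind       = "Ingress"
    metadata = {
      name      = "app"
      namespace = "default"
      annotations = {
        "alb.ingress.kubernetes.io/scheme" = "internet-facing"
        # "ip" registers pod IPs as ALB targets directly, skipping the
        # NodePort/kube-proxy hop that can cross AZs. Default is "instance".
        "alb.ingress.kubernetes.io/target-type" = "ip"
      }
    }
    spec = {
      ingressClassName = "alb"
      rules = [{
        http = {
          paths = [{
            path     = "/"
            pathType = "Prefix"
            backend  = { service = { name = "app", port = { number = 80 } } }
          }]
        }
      }]
    }
  })
}
```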

&lt;h3&gt;
  
  
  NAT Gateway Traffic: Death by a Thousand Pulls
&lt;/h3&gt;

&lt;p&gt;Pods in private subnets reach the internet through NAT Gateways. If you have a single NAT in one AZ (which is what most "getting started" guides recommend), pods in the other AZs cross boundaries just to reach the NAT, paying $0.045/GB processing plus the cross-AZ charge. Container image pulls from ECR are the worst offender here - even modest images pulled several times a day add up.&lt;/p&gt;

&lt;p&gt;Here is what a typical unoptimized cluster looks like. The red lines are cross-AZ traffic you are paying for:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgyuon8bso5khnvhctr1q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgyuon8bso5khnvhctr1q.png" alt="Cross-AZ traffic before optimization" width="800" height="820"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Test Setup
&lt;/h2&gt;

&lt;p&gt;I built this with Terraform using the same modular approach from my &lt;a href="https://darryl-ruggles.cloud/a-complete-terraform-setup-for-eks-auto-mode-is-it-right-for-you/" rel="noopener noreferrer"&gt;EKS Auto Mode article&lt;/a&gt;: community modules for VPC and EKS, the &lt;code&gt;alekc/kubectl&lt;/code&gt; provider for Kubernetes resources. I use &lt;code&gt;kubectl&lt;/code&gt; rather than the &lt;code&gt;kubernetes&lt;/code&gt; or &lt;code&gt;helm&lt;/code&gt; providers because it handles server-side apply of CRDs (like Karpenter NodePools) cleanly, which the standard &lt;code&gt;kubernetes&lt;/code&gt; provider struggles with. If you followed that article, the structure here will look familiar. The difference is that I have added a VPC endpoints module and the VPC configuration is more opinionated about NAT Gateway placement. Three modules: VPC, EKS, and VPC endpoints.&lt;/p&gt;

&lt;h3&gt;
  
  
  The VPC: One NAT Per AZ
&lt;/h3&gt;

&lt;p&gt;The first infrastructure decision that matters is NAT Gateway placement. Here is what I used:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"vpc"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"terraform-aws-modules/vpc/aws"&lt;/span&gt;
  &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"~&amp;gt; 5.0"&lt;/span&gt;

  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.cluster_name}-vpc"&lt;/span&gt;
  &lt;span class="nx"&gt;cidr&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_cidr&lt;/span&gt;
  &lt;span class="nx"&gt;azs&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;azs&lt;/span&gt;

  &lt;span class="nx"&gt;private_subnets&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="nx"&gt;cidrsubnet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_cidr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nx"&gt;cidrsubnet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_cidr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nx"&gt;cidrsubnet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_cidr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="nx"&gt;public_subnets&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="nx"&gt;cidrsubnet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_cidr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nx"&gt;cidrsubnet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_cidr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nx"&gt;cidrsubnet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_cidr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="c1"&gt;# This is the setting that matters&lt;/span&gt;
  &lt;span class="nx"&gt;enable_nat_gateway&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;single_nat_gateway&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="nx"&gt;one_nat_gateway_per_az&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

  &lt;span class="nx"&gt;enable_dns_hostnames&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;enable_dns_support&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

  &lt;span class="nx"&gt;public_subnet_tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s2"&gt;"kubernetes.io/role/elb"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;private_subnet_tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s2"&gt;"kubernetes.io/role/internal-elb"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tags&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each additional NAT Gateway costs about $32/month. You save $0.02/GB ($0.01 billed on each side of the boundary) on traffic that no longer crosses AZ boundaries to reach the NAT. That means the extra NAT pays for itself at just 1.6 TB/month of internet-bound traffic per AZ. The NAT processing charge ($0.045/GB) applies regardless of which AZ the NAT is in, so that cost doesn't change with this optimization. For most production clusters, 1.6 TB/month of internet traffic per AZ is a low bar to clear.&lt;/p&gt;

&lt;p&gt;I see many Terraform examples online that set &lt;code&gt;single_nat_gateway = true&lt;/code&gt; to save money in dev environments. That's fine for dev, but if you copy that pattern into production you are silently adding cross-AZ charges to every internet-bound packet from two of your three AZs.&lt;/p&gt;
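&lt;p&gt;The break-even arithmetic is easy to sanity-check yourself. A throwaway sketch with the prices assumed above (the constants are mine, not pulled from an AWS price list):&lt;/p&gt;

```python
# Break-even for one NAT Gateway per AZ versus a shared single NAT.
NAT_USD_PER_MONTH = 32.0     # approximate hourly charge per gateway
CROSS_AZ_USD_PER_GB = 0.02   # $0.01 billed on each side of the boundary

def nat_break_even_gb() -> float:
    """Internet-bound GB/month per AZ at which the extra NAT pays off."""
    return NAT_USD_PER_MONTH / CROSS_AZ_USD_PER_GB

print(round(nat_break_even_gb()))  # 1600, i.e. 1.6 TB/month per AZ
```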

&lt;h3&gt;
  
  
  VPC Endpoints: The Optimization Most Teams Skip
&lt;/h3&gt;

&lt;p&gt;This is the optimization that surprises most teams. Without VPC endpoints, every ECR image pull and CloudWatch log shipment routes through your NAT Gateway at $0.045/GB. I always add endpoints for the services that generate the most traffic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# S3 Gateway - FREE. No hourly cost, no data cost. No reason not to have this.&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_vpc_endpoint"&lt;/span&gt; &lt;span class="s2"&gt;"s3"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_id&lt;/span&gt;
  &lt;span class="nx"&gt;service_name&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"com.amazonaws.${data.aws_region.current.name}.s3"&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_endpoint_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Gateway"&lt;/span&gt;
  &lt;span class="nx"&gt;route_table_ids&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;route_table_ids&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# ECR API - for docker login and manifest operations&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_vpc_endpoint"&lt;/span&gt; &lt;span class="s2"&gt;"ecr_api"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_id&lt;/span&gt;
  &lt;span class="nx"&gt;service_name&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"com.amazonaws.${data.aws_region.current.name}.ecr.api"&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_endpoint_type&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Interface"&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_ids&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_subnet_ids&lt;/span&gt;
  &lt;span class="nx"&gt;security_group_ids&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_endpoints&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;private_dns_enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# ECR Docker - for image layer pulls (this is the high-volume one)&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_vpc_endpoint"&lt;/span&gt; &lt;span class="s2"&gt;"ecr_dkr"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_id&lt;/span&gt;
  &lt;span class="nx"&gt;service_name&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"com.amazonaws.${data.aws_region.current.name}.ecr.dkr"&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_endpoint_type&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Interface"&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_ids&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_subnet_ids&lt;/span&gt;
  &lt;span class="nx"&gt;security_group_ids&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_endpoints&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;private_dns_enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# STS - for IRSA and Pod Identity. Low volume but breaks auth if missing.&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_vpc_endpoint"&lt;/span&gt; &lt;span class="s2"&gt;"sts"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_id&lt;/span&gt;
  &lt;span class="nx"&gt;service_name&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"com.amazonaws.${data.aws_region.current.name}.sts"&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_endpoint_type&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Interface"&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_ids&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_subnet_ids&lt;/span&gt;
  &lt;span class="nx"&gt;security_group_ids&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_endpoints&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;private_dns_enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The interface endpoints cost about $7.20/month per AZ ($21.60 total across 3 AZs, per endpoint). The S3 and DynamoDB gateway endpoints are free. At $0.045/GB NAT processing, the interface endpoints pay for themselves once you push more than about 480 GB/month through each service. ECR image pulls alone blow past that in most clusters.&lt;/p&gt;
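&lt;p&gt;Same exercise for the interface endpoints, with the prices assumed above. One hedge worth noting: interface endpoints also bill a small per-GB processing fee of their own, which pushes the true break-even somewhat higher than the naive figure:&lt;/p&gt;

```python
# Break-even for a 3-AZ interface endpoint versus NAT processing.
ENDPOINT_USD_PER_AZ_HOUR = 0.01
HOURS_PER_MONTH = 720
NAT_PROCESSING_USD_PER_GB = 0.045

def endpoint_usd_per_month(num_azs: int = 3) -> float:
    return ENDPOINT_USD_PER_AZ_HOUR * HOURS_PER_MONTH * num_azs

def endpoint_break_even_gb(num_azs: int = 3) -> float:
    # Ignores the endpoint's own ~$0.01/GB processing fee; including
    # it raises the break-even to roughly 620 GB/month.
    return endpoint_usd_per_month(num_azs) / NAT_PROCESSING_USD_PER_GB

print(round(endpoint_usd_per_month(), 2))  # 21.6
print(round(endpoint_break_even_gb()))     # 480
```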

&lt;p&gt;&lt;strong&gt;A nuance worth understanding:&lt;/strong&gt; ECR uses two separate service endpoints. The &lt;code&gt;ecr.api&lt;/code&gt; endpoint handles authentication and manifest resolution (low-volume API calls). The &lt;code&gt;ecr.dkr&lt;/code&gt; endpoint handles the actual image layer downloads, which is where the bulk of the data transfer happens. If you still have a NAT Gateway (and you likely do for other internet-bound traffic), you don't strictly need both. The &lt;code&gt;ecr.dkr&lt;/code&gt; endpoint alone captures the majority of the cost savings because that is where the gigabytes flow. The &lt;code&gt;ecr.api&lt;/code&gt; calls are small enough that routing them through NAT is negligible. I include both in the repo for completeness, but if you're watching your interface endpoint spend, &lt;code&gt;ecr.dkr&lt;/code&gt; plus the free S3 gateway endpoint are the ones that move the needle. ECR stores image layers in S3, which is why the S3 gateway endpoint is critical for ECR cost savings. The STS endpoint is shown in the code above. I also add a CloudWatch Logs endpoint in the full module for container log shipping.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Another gotcha:&lt;/strong&gt; the security group on your interface endpoints must allow inbound HTTPS (443) from your VPC CIDR. Skip that and you get timeouts that look like DNS failures. Here is what the security group looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_security_group"&lt;/span&gt; &lt;span class="s2"&gt;"vpc_endpoints"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name_prefix&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.cluster_name}-vpce-"&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Security group for VPC interface endpoints"&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_id&lt;/span&gt;

  &lt;span class="nx"&gt;ingress&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"HTTPS from VPC"&lt;/span&gt;
    &lt;span class="nx"&gt;from_port&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;443&lt;/span&gt;
    &lt;span class="nx"&gt;to_port&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;443&lt;/span&gt;
    &lt;span class="nx"&gt;protocol&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"tcp"&lt;/span&gt;
    &lt;span class="nx"&gt;cidr_blocks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_cidr&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This allows the entire VPC CIDR, which is fine for most setups. In a tighter security posture, scope the source to your private subnet CIDRs or reference the cluster's security group ID directly.&lt;/p&gt;

&lt;h3&gt;
  
  
  EKS Auto Mode on Kubernetes 1.35
&lt;/h3&gt;

&lt;p&gt;I used Kubernetes 1.35, the latest version available on EKS as of March 2026. Traffic Distribution was introduced as alpha in 1.30 with &lt;code&gt;PreferClose&lt;/code&gt;, which graduated to GA in 1.33. In 1.34, &lt;code&gt;PreferSameZone&lt;/code&gt; was introduced as a clearer replacement and &lt;code&gt;PreferClose&lt;/code&gt; became a deprecated alias. Both &lt;code&gt;PreferSameZone&lt;/code&gt; and &lt;code&gt;PreferSameNode&lt;/code&gt; graduated to stable in 1.35. More on both below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"eks"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"terraform-aws-modules/eks/aws"&lt;/span&gt;
  &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"~&amp;gt; 21.0"&lt;/span&gt;

  &lt;span class="nx"&gt;name&lt;/span&gt;               &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cluster_name&lt;/span&gt;
  &lt;span class="nx"&gt;kubernetes_version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"1.35"&lt;/span&gt;

  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_id&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_ids&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_subnet_ids&lt;/span&gt;

  &lt;span class="c1"&gt;# When node_pools are specified, the module automatically creates&lt;/span&gt;
  &lt;span class="c1"&gt;# the required Auto Mode IAM resources.&lt;/span&gt;
  &lt;span class="nx"&gt;compute_config&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;enabled&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="nx"&gt;node_pools&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"general-purpose"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"system"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;# Private access for in-VPC communication, public for kubectl access.&lt;/span&gt;
  &lt;span class="c1"&gt;# In production, restrict public access to specific CIDRs or disable it.&lt;/span&gt;
  &lt;span class="nx"&gt;endpoint_private_access&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;endpoint_public_access&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

  &lt;span class="c1"&gt;# Control plane logging for observability&lt;/span&gt;
  &lt;span class="nx"&gt;enabled_log_types&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"api"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"audit"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"authenticator"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="c1"&gt;# Cross-AZ traffic visibility in the EKS console&lt;/span&gt;
  &lt;span class="nx"&gt;addons&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;amazon-cloudwatch-observability&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;most_recent&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;# Fine for demos. For production, use scoped access entries with narrower policies.&lt;/span&gt;
  &lt;span class="nx"&gt;enable_cluster_creator_admin_permissions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tags&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I also added a Graviton NodePool at weight 10 for better price-performance, using the same weighted NodePool pattern I described in the &lt;a href="https://darryl-ruggles.cloud/a-complete-terraform-setup-for-eks-auto-mode-is-it-right-for-you/" rel="noopener noreferrer"&gt;Auto Mode article&lt;/a&gt;. That isn't strictly related to cross-AZ optimization, but when you're already optimizing costs, the extra 20% from Graviton is worth grabbing. The full NodePool manifest is in &lt;code&gt;terraform/modules/eks/main.tf&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important Auto Mode detail:&lt;/strong&gt; in Auto Mode the NodePool keeps the &lt;code&gt;karpenter.sh&lt;/code&gt; API group, but the NodeClass it references lives under &lt;code&gt;eks.amazonaws.com&lt;/code&gt; (kind &lt;code&gt;NodeClass&lt;/code&gt;), not &lt;code&gt;karpenter.k8s.aws&lt;/code&gt; (kind &lt;code&gt;EC2NodeClass&lt;/code&gt;) as in self-managed Karpenter installations. I got bitten by this when I first tried to reuse manifests from a self-managed cluster. The resource silently fails to apply with the wrong API group.&lt;/p&gt;

&lt;h2&gt;
  
  
  Measuring Cross-AZ Traffic: Build a Tool, Don't Guess
&lt;/h2&gt;

&lt;p&gt;The AWS docs describe these routing strategies well enough, but they don't show you how to verify that they are actually working. I built two things for this:&lt;/p&gt;

&lt;h3&gt;
  
  
  A Zone-Aware Echo Server
&lt;/h3&gt;

&lt;p&gt;Instead of using a generic HTTP echo server, I wrote a small FastAPI app that returns its own AZ, node, and pod name in every response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/zone-echo/server.py
&lt;/span&gt;&lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;echo&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zone&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;_get_zone&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;node&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NODE_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pod&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POD_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pod_ip&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POD_IP&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The zone is resolved at startup by querying the Kubernetes API for the node's &lt;code&gt;topology.kubernetes.io/zone&lt;/code&gt; label (IMDS is blocked on Auto Mode's Bottlerocket nodes). If you adapt this for non-Auto-Mode clusters, ensure IMDSv2 is enforced (&lt;code&gt;http_tokens = "required"&lt;/code&gt;) and set the hop limit appropriately. Each response tells you exactly where it came from with zero ambiguity.&lt;/p&gt;
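&lt;p&gt;For reference, here is a minimal sketch of that startup lookup, assuming the official &lt;code&gt;kubernetes&lt;/code&gt; Python client, a &lt;code&gt;NODE_NAME&lt;/code&gt; env var injected via the downward API, and RBAC permission to read Node objects (the helper names are mine, not from the article's repo):&lt;/p&gt;

```python
# Sketch: resolve this pod's AZ from its node's labels at startup.
# Helper names are mine; assumes the "kubernetes" client package.
ZONE_LABEL = "topology.kubernetes.io/zone"

def zone_from_labels(labels: dict) -> str:
    """Pull the AZ out of a node's label map, defaulting to 'unknown'."""
    return (labels or {}).get(ZONE_LABEL, "unknown")

def resolve_my_zone(node_name: str) -> str:
    """Ask the API server for this pod's node and read its zone label."""
    from kubernetes import client, config  # imported lazily: in-cluster only
    config.load_incluster_config()
    node = client.CoreV1Api().read_node(node_name)
    return zone_from_labels(node.metadata.labels)
```

&lt;p&gt;&lt;code&gt;NODE_NAME&lt;/code&gt; comes from the downward API (&lt;code&gt;fieldRef: spec.nodeName&lt;/code&gt;), the same mechanism that feeds the response fields above.&lt;/p&gt;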

&lt;h3&gt;
  
  
  A Traffic Measurement Tool
&lt;/h3&gt;

&lt;p&gt;The second tool sends thousands of requests to each service variant and counts how many stayed in-zone versus crossed AZ boundaries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/traffic-test/measure.py (abbreviated)
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_requests&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;my_zone&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;my_node&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;RoutingStats&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;stats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RoutingStats&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_requests&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;latency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;send_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;errors&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;

        &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;latencies_ms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;latency&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;resp_zone&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zone&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;resp_zone&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;my_zone&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;same_zone&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cross_zone&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it as a Kubernetes Job and check the logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; k8s/monitoring/traffic-test-job.yaml
kubectl logs &lt;span class="nt"&gt;-n&lt;/span&gt; cross-az-demo job/traffic-test
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;==============================================================================
CROSS-AZ TRAFFIC MEASUREMENT REPORT
==============================================================================
Source zone: us-east-1a
Source node: i-07812b1c2b6ed6405

Service                        Reqs  Same-Zone  Cross-AZ  Errors  Avg(ms)  P99(ms)
--------------------------------------------------------------------------------
backend-baseline               5000     49.9%     50.1%       0     2.4      4.2
backend-prefer-same-zone       5000    100.0%      0.0%       0     1.8      3.1
backend-prefer-same-node       5000    100.0%      0.0%       0     1.6      2.5
backend-topology-aware         5000    100.0%      0.0%       0     1.5      2.5

--- Zone Distribution per Service ---
  backend-baseline: us-east-1a: 2497, us-east-1b: 827, us-east-1c: 1676
  backend-prefer-same-zone: us-east-1a: 5000
  backend-prefer-same-node: us-east-1a: 5000
  backend-topology-aware: us-east-1a: 5000

--- Cost Estimate (at 10 TB/month east-west traffic) ---
  backend-baseline: $100.12/month (5,006 GB cross-AZ)
  backend-prefer-same-zone: $0.00/month (0 GB cross-AZ)
  backend-prefer-same-node: $0.00/month (0 GB cross-AZ)
  backend-topology-aware: $0.00/month (0 GB cross-AZ)

  Best improvement: backend-prefer-same-zone
  Cross-AZ reduction: 100.0%
  Estimated monthly savings: $100.12
==============================================================================
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The baseline came in at 50% cross-AZ, not the theoretical 67%. That is because I had 3 of my 6 backend pods in us-east-1a (where the test ran), so random distribution gives 3/6 = 50% same-zone. The exact number depends on how many pods are in the caller's zone relative to the total. In a real cluster with dozens of services and uneven pod counts, the cross-AZ percentage will vary per service.&lt;/p&gt;
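&lt;p&gt;That arithmetic is easy to reproduce. A small sketch (function names are mine) that predicts the same-zone fraction from pod placement and projects the measured cross-AZ ratio onto monthly traffic at $0.02/GB round trip:&lt;/p&gt;

```python
def expected_same_zone_fraction(pods_in_my_zone: int, total_pods: int) -> float:
    """With random load balancing, same-zone probability = local pods / total."""
    return pods_in_my_zone / total_pods

def monthly_cross_az_cost(cross_az_requests: int, total_requests: int,
                          monthly_gb: float, rate_per_gb: float = 0.02) -> float:
    """Project the measured cross-AZ ratio onto monthly traffic volume."""
    cross_gb = (cross_az_requests / total_requests) * monthly_gb
    return round(cross_gb * rate_per_gb, 2)

# 3 of 6 pods in the caller's zone gives the 50% same-zone baseline.
print(expected_same_zone_fraction(3, 6))           # 0.5
# 2,503 of 5,000 requests crossed zones; at 10,000 GB/month that is $100.12.
print(monthly_cross_az_cost(2503, 5000, 10_000))   # 100.12
```

&lt;p&gt;Those two calls reproduce the baseline row and its cost estimate from the report above.&lt;/p&gt;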

&lt;p&gt;The important result: all three optimized strategies hit 100% same-zone across 5,000 requests. Zero cross-AZ leakage. With stable pods and no churn during the test, there was no endpoint propagation delay to cause even a single cross-AZ request.&lt;/p&gt;

&lt;p&gt;PreferSameNode edged out PreferSameZone on latency (1.6ms average versus 1.8ms) because it routes to pods on the same node, not just the same zone. The baseline's 2.4ms average includes the cross-AZ network hop penalty.&lt;/p&gt;

&lt;p&gt;This is why I say "build a tool, don't guess." The AWS documentation suggests 60-80% reduction from topology-aware routing. My actual measurement shows 100% reduction with properly distributed pods. That gap is significant enough to change how you think about the ROI.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Happens During Pod Churn?
&lt;/h3&gt;

&lt;p&gt;Stable pods are one thing. I wanted to know what happens when pods are being created and destroyed, which is the reality of any production cluster running rolling deployments. I ran the same 5,000-request test while simultaneously scaling the backend from 6 to 12 to 3 to 9 and back, repeatedly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=== RESULTS DURING POD CHURN ===

Service                        Reqs  Same-Zone  Cross-AZ  Errors  Avg(ms)  P99(ms)
--------------------------------------------------------------------------------
backend-baseline               5000     54.7%     42.3%     151     2.6      6.3
backend-prefer-same-zone       5000    100.0%      0.0%       0     1.7      3.5
backend-prefer-same-node       5000    100.0%      0.0%       0     1.7      5.4
backend-topology-aware         5000    100.0%      0.0%       0     1.8      5.3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All three optimized strategies held at 100% same-zone during churn. Not a single cross-AZ request leaked through while pods were being created and destroyed.&lt;/p&gt;

&lt;p&gt;The baseline told a different story: 151 errors (3%) from requests hitting endpoints that were mid-termination. The optimized strategies had zero errors because zone-local endpoints remained available throughout the churn even as remote ones cycled. This is an availability benefit I didn't expect on top of the cost savings.&lt;/p&gt;

&lt;p&gt;P99 latency did increase under churn. PreferSameNode went from 2.5ms to 5.4ms as the endpoint list updated. PreferSameZone stayed more stable at 3.5ms. That's expected and acceptable.&lt;/p&gt;

&lt;p&gt;One thing I got wrong in my initial prediction: I expected Topology Aware Routing (TAR) to drop hints during the churn and leak cross-AZ traffic. It did not. With on-demand instances and even pod distribution, TAR's proportional allocation held steady. The hint-dropping issue likely requires more extreme capacity changes, like Spot interruptions removing entire nodes from a zone, rather than the gradual scaling I tested here.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Routing Strategies: What Actually Works
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe56fjo7o6hs3uoj3kte9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe56fjo7o6hs3uoj3kte9.png" alt="Optimization strategies overview" width="800" height="1874"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I tested five approaches. Here is my honest ranking.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Traffic Distribution: PreferSameZone (Use This One)
&lt;/h3&gt;

&lt;p&gt;This is the winner. One field on your Service spec:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backend-prefer-same-zone&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cross-az-demo&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterIP&lt;/span&gt;
  &lt;span class="na"&gt;trafficDistribution&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PreferSameZone&lt;/span&gt;  &lt;span class="c1"&gt;# stable in K8s 1.35, replaces deprecated PreferClose&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backend&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
      &lt;span class="na"&gt;targetPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. One line. kube-proxy routes to same-zone endpoints first. If no local endpoints exist, it falls back to any available endpoint. &lt;code&gt;PreferSameZone&lt;/code&gt; was introduced in 1.34 and graduated to stable in 1.35. No feature gates, no annotation. It is a first-class spec field.&lt;/p&gt;

&lt;p&gt;I recommend this as the default for every ClusterIP Service in your cluster unless you have a specific reason not to use it. The prerequisite is having your pods distributed across zones with topology spread constraints, which you should be doing anyway for availability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What about AZ failures?&lt;/strong&gt; During an AZ outage, pods in the failed zone lose their local endpoints. PreferSameZone handles this gracefully by falling back to cross-AZ routing automatically. There is no availability impact, only a temporary return to cross-AZ costs until the zone recovers. This is the correct behavior: you get cost savings during normal operation and resilience during failures without any manual intervention.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Topology Aware Routing (It Works, But PreferSameZone is Better)
&lt;/h3&gt;

&lt;p&gt;The older approach, using an annotation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;service.kubernetes.io/topology-mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Auto&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;TAR uses a more complex algorithm: the EndpointSlice controller allocates endpoints proportionally across zones and sets "hints" for each one. The idea is to balance both locality and load distribution.&lt;/p&gt;
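&lt;p&gt;The hint mechanics can be pictured with a simplified sketch (data shapes and names are mine, not kube-proxy's actual code). Each endpoint carries the zones it is hinted for; kube-proxy keeps only the endpoints hinted for its own zone, and if the controller has dropped hints it silently reverts to all endpoints:&lt;/p&gt;

```python
def endpoints_for_zone(endpoints: list, my_zone: str) -> list:
    """Mimic kube-proxy's topology-hint filtering (simplified sketch).

    'hints' is the list of zone names an endpoint is allocated to. If any
    endpoint lacks hints, hints were dropped and every endpoint becomes
    eligible again; that is the silent return to random distribution.
    """
    if all(ep.get("hints") for ep in endpoints):
        return [ep for ep in endpoints if my_zone in ep["hints"]]
    return endpoints

eps = [
    {"pod": "backend-1", "zone": "us-east-1a", "hints": ["us-east-1a"]},
    {"pod": "backend-2", "zone": "us-east-1b", "hints": ["us-east-1b"]},
]
# With hints present, a consumer in us-east-1a only sees backend-1.
print([ep["pod"] for ep in endpoints_for_zone(eps, "us-east-1a")])  # ['backend-1']
```

&lt;p&gt;Note the last branch: one endpoint without hints and the filter returns everything, cross-AZ included.&lt;/p&gt;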

&lt;p&gt;In my testing, it works. The numbers are close to PreferSameZone. But I found two problems that make me recommend PreferSameZone instead:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem 1: Spot instances break it.&lt;/strong&gt; TAR recalculates proportional allocation based on zone capacity. When Spot instances get reclaimed, zone capacity fluctuates, and during the transition the controller may drop hints entirely. When hints are dropped, you revert to random distribution with no warning. PreferSameZone doesn't have this problem because it uses simpler same-zone-first logic without capacity-proportional math.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem 2: The annotation is deprecated.&lt;/strong&gt; The &lt;code&gt;service.kubernetes.io/topology-mode: Auto&lt;/code&gt; annotation won't graduate to GA. The underlying mechanism (EndpointSlice hints) is GA, but the recommended path forward is the &lt;code&gt;trafficDistribution&lt;/code&gt; field (&lt;code&gt;PreferSameZone&lt;/code&gt;), which is stable in 1.35. There's no reason to use the deprecated annotation when the replacement is available and simpler.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. PreferSameNode: The Best of Both Worlds (New in 1.35)
&lt;/h3&gt;

&lt;p&gt;Kubernetes 1.35 added a new Traffic Distribution option that solves a real gap:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;trafficDistribution&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PreferSameNode&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This routes to endpoints on the same &lt;em&gt;node&lt;/em&gt; first, then falls back to same-zone, then cluster-wide. It gives you the latency benefits of &lt;code&gt;internalTrafficPolicy: Local&lt;/code&gt; but with a graceful fallback instead of dropped traffic.&lt;/p&gt;
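&lt;p&gt;The fallback chain is simple to express. A sketch of the selection logic (simplified; not kube-proxy's real implementation):&lt;/p&gt;

```python
def candidate_endpoints(endpoints: list, my_node: str, my_zone: str) -> list:
    """PreferSameNode's fallback chain, simplified: same node, else same
    zone, else every endpoint. Unlike internalTrafficPolicy: Local, the
    result is never empty while any endpoint exists."""
    same_node = [ep for ep in endpoints if ep["node"] == my_node]
    if same_node:
        return same_node
    same_zone = [ep for ep in endpoints if ep["zone"] == my_zone]
    return same_zone or endpoints

eps = [
    {"node": "node-1", "zone": "us-east-1a"},
    {"node": "node-2", "zone": "us-east-1a"},
    {"node": "node-3", "zone": "us-east-1b"},
]
print(len(candidate_endpoints(eps, "node-1", "us-east-1a")))  # 1: node-local wins
print(len(candidate_endpoints(eps, "node-9", "us-east-1a")))  # 2: zone fallback
print(len(candidate_endpoints(eps, "node-9", "us-east-1c")))  # 3: cluster-wide
```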

&lt;p&gt;This is the option I wish had existed from the start. It is strictly better than &lt;code&gt;internalTrafficPolicy: Local&lt;/code&gt; for any use case where you want node-local preference but cannot tolerate dropped requests. Stable since 1.35.&lt;/p&gt;

&lt;p&gt;I recommend this for latency-sensitive service pairs that are often (but not always) co-located. For pure cost optimization, &lt;code&gt;PreferSameZone&lt;/code&gt; is sufficient. &lt;code&gt;PreferSameNode&lt;/code&gt; adds value when sub-millisecond latency between specific services matters.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Internal Traffic Policy: Local (The Nuclear Option)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;internalTrafficPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Local&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This restricts traffic to endpoints on the same &lt;em&gt;node&lt;/em&gt;, not just the same zone. It eliminates cross-AZ traffic entirely, and cross-node traffic along with it.&lt;/p&gt;

&lt;p&gt;I call this the nuclear option because if no endpoint exists on the calling pod's node, &lt;strong&gt;the traffic is dropped.&lt;/strong&gt; Not rerouted or retried - just dropped. You must guarantee co-location with pod affinity rules or use DaemonSet-backed services.&lt;/p&gt;

&lt;p&gt;I proved this during testing. I created per-AZ services with &lt;code&gt;internalTrafficPolicy: Local&lt;/code&gt; and ran a test from zone us-east-1a against the zone-b service. The test hung indefinitely. Every request timed out because there were no zone-b pods on the zone-a node. There were no errors and no fallbacks - just a silent failure.&lt;/p&gt;

&lt;p&gt;When it does work (when a backend pod exists on the same node), the latency is the best of any strategy: 1.6ms average in my testing, versus 1.8ms for PreferSameZone. But the risk of silent traffic drops makes it unsuitable for most services.&lt;/p&gt;

&lt;p&gt;I use this for exactly two situations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DaemonSet services (every node has a local endpoint by definition)&lt;/li&gt;
&lt;li&gt;Tightly coupled pod pairs with strict &lt;code&gt;requiredDuringSchedulingIgnoredDuringExecution&lt;/code&gt; affinity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For everything else, PreferSameZone is safer and almost as effective. And now with &lt;code&gt;PreferSameNode&lt;/code&gt; in 1.35, there is even less reason to use &lt;code&gt;Local&lt;/code&gt;. You get the same node-local preference with a graceful fallback.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Per-AZ Deployments (When PreferSameZone Is Not Enough)
&lt;/h3&gt;

&lt;p&gt;There is a subtle scaling problem that PreferSameZone doesn't solve. Say traffic to your backend spikes in us-east-1a. The HPA scales the Deployment from 6 to 12 replicas. Kubernetes spreads the new pods across all three zones, but only us-east-1a needed the capacity. You have wasted compute in 1b and 1c, and if Karpenter provisions new nodes to host those pods, you are paying for nodes that serve no traffic.&lt;/p&gt;

&lt;p&gt;The fix is per-AZ Deployments with independent HPAs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backend-az-a&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backend-zonal&lt;/span&gt;
      &lt;span class="na"&gt;zone&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;az-a&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;affinity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;nodeAffinity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;requiredDuringSchedulingIgnoredDuringExecution&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;nodeSelectorTerms&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;matchExpressions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;topology.kubernetes.io/zone&lt;/span&gt;
                    &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;In&lt;/span&gt;
                    &lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-east-1a"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;autoscaling/v2&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HorizontalPodAutoscaler&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backend-az-a&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;scaleTargetRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
    &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backend-az-a&lt;/span&gt;
  &lt;span class="na"&gt;minReplicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
  &lt;span class="na"&gt;maxReplicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
  &lt;span class="na"&gt;metrics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Resource&lt;/span&gt;
      &lt;span class="na"&gt;resource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cpu&lt;/span&gt;
        &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Utilization&lt;/span&gt;
          &lt;span class="na"&gt;averageUtilization&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;70&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Multiply by three AZs. You get triple the manifests and triple the HPA objects to monitor. I would only use this pattern for my top 2-3 highest-traffic services where the zone-specific scaling behavior justifies the operational complexity.&lt;/p&gt;

&lt;p&gt;The full 3-AZ manifest is in &lt;code&gt;k8s/cross-az-optimization/05-per-az-deployment.yaml&lt;/code&gt;.&lt;/p&gt;
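&lt;p&gt;If you do adopt this pattern, the three near-identical copies don't have to be maintained by hand. A sketch that stamps them out from one template with plain Python string formatting (file layout and names are mine; Kustomize overlays or a Helm range loop get you the same result):&lt;/p&gt;

```python
# Generate per-AZ Deployment manifests from a single template (sketch).
TEMPLATE = """\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend-{suffix}
spec:
  replicas: 2
  selector:
    matchLabels:
      app: backend-zonal
      zone: {suffix}
  template:
    metadata:
      labels:
        app: backend-zonal
        zone: {suffix}
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: topology.kubernetes.io/zone
                    operator: In
                    values: ["{zone}"]
"""

ZONES = {"az-a": "us-east-1a", "az-b": "us-east-1b", "az-c": "us-east-1c"}
manifests = [TEMPLATE.format(suffix=s, zone=z) for s, z in ZONES.items()]
print(len(manifests))                 # 3
print("us-east-1c" in manifests[-1])  # True
```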

&lt;h3&gt;
  
  
  Load Balancer: Switch to IP Mode
&lt;/h3&gt;

&lt;p&gt;For services fronted by an ALB, switch the target type from "instance" to "ip":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Ingress&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;alb.ingress.kubernetes.io/target-type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ip&lt;/span&gt;
    &lt;span class="na"&gt;alb.ingress.kubernetes.io/scheme&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;internet-facing&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In instance mode, traffic flows: ALB → NodePort (any node) → kube-proxy → Pod (maybe in a different AZ). That is up to two cross-AZ hops. In IP mode: ALB → Pod IP directly. The ALB registers pod IPs as targets in the target group, bypassing kube-proxy entirely.&lt;/p&gt;

&lt;p&gt;Combine this with &lt;code&gt;trafficDistribution: PreferSameZone&lt;/code&gt; on the backing Service for the best result.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxvarbr38ww94ur3ohsxx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxvarbr38ww94ur3ohsxx.png" alt="Cross-AZ traffic after optimization" width="800" height="857"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What the AWS Documentation Doesn't Tell You
&lt;/h2&gt;

&lt;p&gt;I ran into several things during testing that aren't obvious from the official docs:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Topology spread constraints need &lt;code&gt;minDomains&lt;/code&gt; with Auto Mode.&lt;/strong&gt; This one bit me during testing. I set &lt;code&gt;topologySpreadConstraints&lt;/code&gt; with &lt;code&gt;maxSkew: 1&lt;/code&gt; and &lt;code&gt;whenUnsatisfiable: DoNotSchedule&lt;/code&gt; on my Deployment - the standard recipe from every example out there. But with EKS Auto Mode, you start from zero nodes. Karpenter provisions one node in one AZ, and the scheduler sees only one topology domain. A &lt;code&gt;maxSkew&lt;/code&gt; of 1 across a single domain is always satisfied, so all 6 pods land on that one node in one AZ.&lt;/p&gt;

&lt;p&gt;The fix is &lt;code&gt;minDomains: 3&lt;/code&gt;. This tells the scheduler that at least 3 zones must exist before scheduling is satisfied, which forces Karpenter to provision nodes in other AZs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;topologySpreadConstraints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;maxSkew&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
    &lt;span class="na"&gt;minDomains&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
    &lt;span class="na"&gt;topologyKey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;topology.kubernetes.io/zone&lt;/span&gt;
    &lt;span class="na"&gt;whenUnsatisfiable&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DoNotSchedule&lt;/span&gt;
    &lt;span class="na"&gt;labelSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backend&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Most topology spread examples assume you already have nodes in multiple AZs. With Auto Mode and Karpenter, you do not. Without &lt;code&gt;minDomains&lt;/code&gt;, your pods will quietly consolidate into a single zone while the topology constraint reports no violations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The EKS console has built-in cross-AZ visibility now.&lt;/strong&gt; Enable the &lt;code&gt;amazon-cloudwatch-observability&lt;/code&gt; addon (included in the Terraform code above) and you get a service map in the EKS console showing cross-AZ flows with data volume. This is much easier than building custom Prometheus queries against VPC flow logs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;VPC endpoint interface charges don't include cross-AZ data transfer.&lt;/strong&gt; This is buried in a "What's New" announcement from 2022. Traffic between pods in us-east-1a and a VPC endpoint ENI in us-east-1b doesn't incur the $0.01/GB cross-AZ charge. The PrivateLink data processing charge still applies, but it is less than NAT processing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PreferSameZone occasionally sends cross-AZ during endpoint updates.&lt;/strong&gt; When pods are added or removed, there's a brief window where EndpointSlice hints haven't propagated. During this window, kube-proxy may route to any endpoint. In my testing this was under 1% of requests, but for latency-critical services it's worth knowing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CoreDNS is a cross-AZ blind spot.&lt;/strong&gt; DNS lookups via the &lt;code&gt;kube-dns&lt;/code&gt; Service are themselves subject to cross-AZ routing, and as of this writing you can't set &lt;code&gt;trafficDistribution&lt;/code&gt; on the CoreDNS addon service directly. There's an &lt;a href="https://github.com/aws/containers-roadmap/issues/2628" rel="noopener noreferrer"&gt;open EKS feature request&lt;/a&gt; for this. If DNS latency matters to you, this is a gap to watch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Finding cross-AZ charges in Cost Explorer.&lt;/strong&gt; Beyond the EKS console's network observability, you can identify cross-AZ charges in AWS Cost Explorer by filtering for the &lt;code&gt;DataTransfer-Regional-Bytes&lt;/code&gt; and &lt;code&gt;NatGateway-Bytes&lt;/code&gt; usage types. These are the line items that this article's optimizations target.&lt;/p&gt;

&lt;h2&gt;
  
  
  Production Considerations
&lt;/h2&gt;

&lt;p&gt;This article focuses on FinOps, but if you are taking these patterns into production, a few security points are worth noting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cluster access:&lt;/strong&gt; The demo uses &lt;code&gt;enable_cluster_creator_admin_permissions = true&lt;/code&gt; for simplicity. For production, define scoped access entries with narrower policies (e.g., &lt;code&gt;AmazonEKSViewPolicy&lt;/code&gt; for read-only users, &lt;code&gt;AmazonEKSClusterAdminPolicy&lt;/code&gt; only for platform teams).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workload identity:&lt;/strong&gt; EKS Auto Mode with module v21+ uses Pod Identity by default. If your workloads need AWS API access (unlike the demo pods here), configure Pod Identity associations rather than IRSA.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VPC endpoint security groups:&lt;/strong&gt; The demo allows all VPC CIDR traffic on port 443. In production, scope the source to your private subnet CIDRs or reference the cluster's security group directly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network ACLs:&lt;/strong&gt; The default NACL allows all traffic. Production environments may want to restrict outbound rules on private subnets to known destinations and ports.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IMDSv2:&lt;/strong&gt; Auto Mode uses Bottlerocket, which enforces IMDSv2. If you adapt this code for self-managed node groups with Amazon Linux, ensure IMDSv2 is enforced (&lt;code&gt;http_tokens = "required"&lt;/code&gt; in your launch template) and set &lt;code&gt;http_put_response_hop_limit = 2&lt;/code&gt; for containerized workloads.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Encryption at rest:&lt;/strong&gt; The demo doesn't configure a custom KMS key for EKS secrets encryption. For production, enable envelope encryption with &lt;code&gt;encryption_config&lt;/code&gt; in the EKS module to encrypt Kubernetes secrets at rest with a customer-managed key.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Encryption in transit:&lt;/strong&gt; Traffic between pods is unencrypted by default. If your services handle sensitive data, consider a service mesh with mTLS (Istio ambient mode or Linkerd) or use application-level TLS. Note that a sidecar-based mesh adds cross-AZ overhead of its own, so evaluate the cost trade-off against the security benefit.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What This Costs and What You Save
&lt;/h2&gt;

&lt;p&gt;Here is my cost analysis for a cluster with 10 TB/month east-west traffic and 2 TB/month AWS service traffic (ECR, S3, CloudWatch):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;th&gt;Delta&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;East-west cross-AZ&lt;/td&gt;
&lt;td&gt;$100/month&lt;/td&gt;
&lt;td&gt;$0/month&lt;/td&gt;
&lt;td&gt;-$100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NAT processing (ECR/S3/logs)&lt;/td&gt;
&lt;td&gt;$90/month&lt;/td&gt;
&lt;td&gt;$0/month&lt;/td&gt;
&lt;td&gt;-$90&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NAT cross-AZ (single NAT)&lt;/td&gt;
&lt;td&gt;$13/month&lt;/td&gt;
&lt;td&gt;$0/month&lt;/td&gt;
&lt;td&gt;-$13&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VPC interface endpoints (4x3 AZs)&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;$86/month&lt;/td&gt;
&lt;td&gt;+$86&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Additional NAT Gateways (2 extra)&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;$65/month&lt;/td&gt;
&lt;td&gt;+$65&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Existing NAT Gateway (hourly, unchanged)&lt;/td&gt;
&lt;td&gt;$33/month&lt;/td&gt;
&lt;td&gt;$33/month&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$236/month&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$184/month&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-$52/month&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;At 10 TB/month, the net savings are modest at about $52/month. But cross-AZ costs scale linearly with traffic while the infrastructure costs (VPC endpoints, extra NATs) stay flat. At 50 TB/month east-west with the same crossing fraction, the east-west savings alone reach $500/month; at 100 TB/month, $1,000/month.&lt;/p&gt;
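&lt;p&gt;The scaling is easy to model yourself (a back-of-envelope sketch; the 50% crossing fraction here reproduces the $100/month east-west figure in the table at 10 TB):&lt;/p&gt;

```python
# Cross-AZ east-west charges: $0.01/GB in each direction, so $0.02/GB
# total for every GB that crosses an AZ boundary.
CROSS_AZ_PER_GB_ROUND_TRIP = 0.02

def cross_az_cost(total_gb_per_month, crossing_fraction):
    """Monthly cross-AZ charge for a given east-west traffic volume."""
    return total_gb_per_month * crossing_fraction * CROSS_AZ_PER_GB_ROUND_TRIP

for tb in (10, 50, 100):
    monthly = cross_az_cost(tb * 1_000, 0.5)  # assume half the traffic crosses
    print(f"{tb} TB/month east-west: ${monthly:,.0f}/month in cross-AZ charges")
```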

&lt;p&gt;The real point is that the traffic routing changes (PreferSameZone on Services and IP mode on Ingress) are free. There is zero infrastructure cost and only one line of YAML per Service. Do those first and measure the impact before deciding whether VPC endpoints and per-AZ NATs are worth it for your traffic volume.&lt;/p&gt;
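&lt;p&gt;Rolling that one-line change out across many Services can be scripted. A sketch with the official Kubernetes Python client (the service and namespace names are placeholders):&lt;/p&gt;

```python
# The entire change is one field on the Service spec. Sketch of applying
# it programmatically; "my-service" and "default" are placeholder names.
def prefer_same_zone_patch():
    return {"spec": {"trafficDistribution": "PreferSameZone"}}

# from kubernetes import client, config   # official Kubernetes Python client
# config.load_kube_config()
# client.CoreV1Api().patch_namespaced_service(
#     "my-service", "default", prefer_same_zone_patch()
# )
```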

&lt;h2&gt;
  
  
  My Recommended Order of Operations
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Add VPC endpoints for S3 (free), ECR, STS, and CloudWatch Logs.&lt;/strong&gt; This is the biggest win for most teams. The S3 and DynamoDB gateway endpoints are free, and the ECR interface endpoint eliminates NAT processing charges ($0.045/GB) on every container image pull. If your cluster pulls images regularly, this alone can save more than all the other optimizations combined.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add &lt;code&gt;trafficDistribution: PreferSameZone&lt;/code&gt; to all ClusterIP Services.&lt;/strong&gt; The highest-impact change for east-west traffic. It's a single field (stable in K8s 1.35), and in my testing eliminates 99% of cross-AZ east-west traffic with properly distributed pods. For latency-sensitive pairs, consider &lt;code&gt;PreferSameNode&lt;/code&gt; (stable in 1.35).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add topology spread constraints to all Deployments.&lt;/strong&gt; PreferSameZone only works if pods are actually distributed across zones. If they aren't, this is your real first step.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Switch ALB Ingress resources to IP target mode.&lt;/strong&gt; Another free change. One annotation eliminates the NodePort-to-pod cross-AZ hop.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy one NAT Gateway per AZ.&lt;/strong&gt; Worth it when your cross-AZ NAT traffic exceeds ~1.6 TB/month per AZ.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consider per-AZ Deployments for your top 2-3 highest-traffic services.&lt;/strong&gt; Only if you have zone-specific scaling problems that PreferSameZone doesn't solve.&lt;/li&gt;
&lt;/ol&gt;
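&lt;p&gt;The ~1.6 TB break-even in step 5 falls straight out of the list prices (a sketch; us-east-1 rates at the time of writing):&lt;/p&gt;

```python
# Break-even for adding a per-AZ NAT Gateway: the extra NAT's hourly
# cost vs. the cross-AZ charge on traffic that currently hairpins to a
# NAT in another zone. us-east-1 list prices.
NAT_HOURLY = 0.045
HOURS_PER_MONTH = 730
CROSS_AZ_PER_GB = 0.02  # $0.01/GB in each direction

def nat_break_even_gb():
    """GB/month of cross-AZ NAT traffic where an extra NAT pays for itself."""
    return NAT_HOURLY * HOURS_PER_MONTH / CROSS_AZ_PER_GB

print(f"Break-even: about {nat_break_even_gb() / 1000:.1f} TB/month per AZ")
```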

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;The complete code is in the &lt;a href="https://github.com/RDarrylR/eks-cross-az-exploration" rel="noopener noreferrer"&gt;companion repository&lt;/a&gt;. Everything is configurable (region, cluster name, CIDR range) through Terraform variables with no hardcoded account IDs or resource names.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt; AWS CLI configured, Terraform 1.10+, Docker, kubectl, Python 3.13+.&lt;/p&gt;

&lt;p&gt;The fastest path is the one-command setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/darryl-ruggles/eks-cross-az-exploration
&lt;span class="nb"&gt;cd &lt;/span&gt;eks-cross-az-exploration

&lt;span class="c"&gt;# Deploy everything and run the measurement (takes ~15 minutes)&lt;/span&gt;
make setup
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That runs Terraform, builds and pushes container images to ECR, deploys the demo app with all service variants, and executes the traffic measurement job. At the end you get the comparison report.&lt;/p&gt;

&lt;p&gt;If you prefer to step through it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Deploy infrastructure (VPC, EKS, VPC endpoints, ECR repos)&lt;/span&gt;
make infra

&lt;span class="c"&gt;# 2. Configure kubectl&lt;/span&gt;
make kubeconfig

&lt;span class="c"&gt;# 3. Build and push container images&lt;/span&gt;
make images

&lt;span class="c"&gt;# 4. Deploy the demo app and all routing strategies&lt;/span&gt;
make deploy

&lt;span class="c"&gt;# 5. Run the cross-AZ measurement&lt;/span&gt;
make &lt;span class="nb"&gt;test&lt;/span&gt;

&lt;span class="c"&gt;# 6. See pod distribution across AZs&lt;/span&gt;
make show-distribution

&lt;span class="c"&gt;# 7. Run again with PreferSameNode included&lt;/span&gt;
make test-all

&lt;span class="c"&gt;# 8. Clean up when done&lt;/span&gt;
make teardown
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All configuration is overridable: for example, &lt;code&gt;make test NUM_REQUESTS=1000&lt;/code&gt;. &lt;code&gt;make infra&lt;/code&gt; uses the &lt;code&gt;variables.tf&lt;/code&gt; defaults; edit &lt;code&gt;region&lt;/code&gt;, &lt;code&gt;cluster_name&lt;/code&gt;, and &lt;code&gt;vpc_cidr&lt;/code&gt; as needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IMPORTANT:&lt;/strong&gt; This deploys real AWS resources that cost money. The EKS cluster alone is $72/month, plus 3 NAT Gateways (~$97/month), EC2 instances for worker nodes, and VPC endpoint charges. Run &lt;code&gt;make teardown&lt;/code&gt; when you're done testing. If &lt;code&gt;terraform destroy&lt;/code&gt; fails with dangling ENIs (which can happen if VPC endpoints are destroyed before EKS releases its network interfaces), wait a few minutes and retry.&lt;/p&gt;

&lt;p&gt;The measurement report gives you actual numbers for your cluster, which matters more than any estimate in a blog post.&lt;/p&gt;

&lt;p&gt;Cross-AZ data transfer is one of those costs that seems small on any individual request but compounds into a meaningful line item at scale. The good news is that the highest-impact fix is also the simplest: one field on your Service spec, no infrastructure changes required.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/eks/latest/best-practices/cost-opt-networking.html" rel="noopener noreferrer"&gt;EKS Best Practices - Cost Optimization Networking&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/docs/reference/networking/virtual-ips/#traffic-distribution" rel="noopener noreferrer"&gt;Kubernetes Traffic Distribution&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/docs/concepts/services-networking/topology-aware-routing/" rel="noopener noreferrer"&gt;Kubernetes Topology Aware Routing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/docs/concepts/services-networking/service-traffic-policy/" rel="noopener noreferrer"&gt;Kubernetes Service Internal Traffic Policy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/containers/getting-visibility-into-your-amazon-eks-cross-az-pod-to-pod-network-bytes/" rel="noopener noreferrer"&gt;Getting Visibility into Amazon EKS Cross-AZ Pod to Pod Network Bytes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/network-observability.html" rel="noopener noreferrer"&gt;Container Network Observability on EKS&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AmazonECR/latest/userguide/vpc-endpoints.html" rel="noopener noreferrer"&gt;Amazon ECR Interface VPC Endpoints&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints-s3.html" rel="noopener noreferrer"&gt;Gateway Endpoints for Amazon S3&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/privatelink/pricing/" rel="noopener noreferrer"&gt;AWS PrivateLink Pricing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/ec2/pricing/on-demand/#Data_Transfer" rel="noopener noreferrer"&gt;AWS Data Transfer Pricing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions-standard.html" rel="noopener noreferrer"&gt;Kubernetes 1.35 Release Notes - PreferSameNode Stable&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/4444-service-traffic-distribution" rel="noopener noreferrer"&gt;KEP-3015 - Traffic Distribution for Services&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/automode.html" rel="noopener noreferrer"&gt;EKS Auto Mode Overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest/examples/eks-auto-mode" rel="noopener noreferrer"&gt;terraform-aws-modules/eks v21 Auto Mode Example&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/eks/pricing/" rel="noopener noreferrer"&gt;Amazon EKS Pricing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/RDarrylR/eks-cross-az-exploration" rel="noopener noreferrer"&gt;Full Companion Code - GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://darryl-ruggles.cloud/a-complete-terraform-setup-for-eks-auto-mode-is-it-right-for-you/" rel="noopener noreferrer"&gt;A Complete Terraform Setup for EKS Auto Mode&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Connect with me on &lt;a href="https://x.com/darrylruggles" rel="noopener noreferrer"&gt;X&lt;/a&gt;, &lt;a href="https://bsky.app/profile/darrylruggles.bsky.social" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt;, &lt;a href="https://linkedin.com/in/darrylruggles" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://github.com/darryl-ruggles" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;, &lt;a href="https://medium.com/@darrylruggles" rel="noopener noreferrer"&gt;Medium&lt;/a&gt;, &lt;a href="https://dev.to/darrylruggles"&gt;Dev.to&lt;/a&gt;, or the &lt;a href="https://community.aws/@darrylruggles" rel="noopener noreferrer"&gt;AWS Community&lt;/a&gt;. Check out more of my projects at &lt;a href="https://darryl-ruggles.cloud" rel="noopener noreferrer"&gt;darryl-ruggles.cloud&lt;/a&gt; and join the &lt;a href="https://believeinserverless.com" rel="noopener noreferrer"&gt;Believe In Serverless&lt;/a&gt; community.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>eks</category>
      <category>finops</category>
      <category>vpc</category>
    </item>
    <item>
      <title>AWS Lambda Now Knows Where It Lives: Exploring the New AZ Metadata</title>
      <dc:creator>Darryl Ruggles</dc:creator>
      <pubDate>Fri, 27 Mar 2026 01:30:21 +0000</pubDate>
      <link>https://forem.com/aws-builders/aws-lambda-now-knows-where-it-lives-exploring-the-new-az-metadata-4be0</link>
      <guid>https://forem.com/aws-builders/aws-lambda-now-knows-where-it-lives-exploring-the-new-az-metadata-4be0</guid>
      <description>&lt;p&gt;Lambda was the last holdout. EC2 has had its Instance Metadata Service since the beginning. ECS and Fargate have their task metadata endpoints. But Lambda? If your function needed to know which Availability Zone it was running in, you were out of luck - unless you enjoyed building elaborate workarounds with private API Gateway endpoints and &lt;code&gt;DescribeNetworkInterfaces&lt;/code&gt; calls.&lt;/p&gt;

&lt;p&gt;On March 19, 2026, AWS quietly fixed this with a &lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/03/lambda-availability-zone-metadata/" rel="noopener noreferrer"&gt;new metadata endpoint&lt;/a&gt; that exposes the AZ ID of the execution environment. No VPC is required, no IAM permissions are needed, and there are no API calls to external services. It's just a localhost HTTP endpoint that returns your AZ in under a millisecond.&lt;/p&gt;

&lt;p&gt;I built a small demo app to explore three practical uses for this new metadata, using Powertools for AWS Lambda for observability throughout (I wrote about &lt;a href="https://darryl-ruggles.cloud/powertools-for-aws-lambda-best-practices-by-default/" rel="noopener noreferrer"&gt;Powertools best practices&lt;/a&gt; previously). The code is &lt;a href="https://github.com/RDarrylR/lambda-az-metadata" rel="noopener noreferrer"&gt;on GitHub&lt;/a&gt; if you want to deploy it yourself.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's in the Box
&lt;/h2&gt;

&lt;p&gt;Lambda now sets two new reserved environment variables in every execution environment:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Variable&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;AWS_LAMBDA_METADATA_API&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Address of the metadata server (e.g., &lt;code&gt;169.254.100.1:9001&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;AWS_LAMBDA_METADATA_TOKEN&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Auth token unique to this execution environment&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You make a GET request to the endpoint with the token as a Bearer header, and you get back your AZ ID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET http://${AWS_LAMBDA_METADATA_API}/2026-01-15/metadata/execution-environment
Authorization: Bearer ${AWS_LAMBDA_METADATA_TOKEN}

{
  "AvailabilityZoneID": "use1-az1"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. The response is cached for 12 hours and marked &lt;code&gt;immutable&lt;/code&gt; - it never changes within an execution environment.&lt;/p&gt;
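&lt;p&gt;Without any SDK, the raw call needs nothing beyond the standard library. A sketch based on the request shape above (&lt;code&gt;build_metadata_request&lt;/code&gt; is my helper name, not part of any API):&lt;/p&gt;

```python
import os
import urllib.request

# Raw metadata call with only the stdlib. The path and Authorization
# header come from the endpoint description above.
def build_metadata_request():
    api = os.environ["AWS_LAMBDA_METADATA_API"]      # e.g. 169.254.100.1:9001
    token = os.environ["AWS_LAMBDA_METADATA_TOKEN"]
    url = f"http://{api}/2026-01-15/metadata/execution-environment"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})

# Inside a handler:
# import json
# with urllib.request.urlopen(build_metadata_request(), timeout=1) as resp:
#     az_id = json.load(resp)["AvailabilityZoneID"]
```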

&lt;p&gt;Powertools for AWS Lambda makes it even simpler:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_lambda_powertools.utilities.metadata&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;get_lambda_metadata&lt;/span&gt;

&lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_lambda_metadata&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;az_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;availability_zone_id&lt;/span&gt;  &lt;span class="c1"&gt;# "use1-az1"
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Powertools handles the caching and SnapStart invalidation automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  AZ IDs vs AZ Names
&lt;/h3&gt;

&lt;p&gt;One important detail: the metadata returns &lt;strong&gt;AZ IDs&lt;/strong&gt; (&lt;code&gt;use1-az1&lt;/code&gt;), not AZ names (&lt;code&gt;us-east-1a&lt;/code&gt;). This matters because AZ IDs are consistent across all AWS accounts - &lt;code&gt;use1-az1&lt;/code&gt; is always the same physical data center regardless of which account you're in. AZ names are mapped differently per account to distribute load, so my &lt;code&gt;us-east-1a&lt;/code&gt; might be a completely different physical zone than yours.&lt;/p&gt;

&lt;p&gt;If you need the AZ name for your account, you can resolve it with a single EC2 API call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;ec2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ec2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ec2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;describe_availability_zones&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Filters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zone-id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Values&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;use1-az1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;az_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AvailabilityZones&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ZoneName&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# "us-east-1a"
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cache the result - the ID-to-name mapping is fixed for a given account, so one API call per execution environment is enough.&lt;/p&gt;
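&lt;p&gt;A minimal cached wrapper around that lookup might look like this (a sketch; the injected &lt;code&gt;ec2_client&lt;/code&gt; is a boto3 EC2 client, passed in so the function is easy to unit test):&lt;/p&gt;

```python
# One describe_availability_zones call per AZ ID for the life of the
# execution environment; the mapping never changes within an account.
_az_name_cache = {}

def resolve_az_name(ec2_client, az_id):
    """Resolve an AZ ID like use1-az1 to this account's AZ name."""
    if az_id not in _az_name_cache:
        resp = ec2_client.describe_availability_zones(
            Filters=[{"Name": "zone-id", "Values": [az_id]}]
        )
        _az_name_cache[az_id] = resp["AvailabilityZones"][0]["ZoneName"]
    return _az_name_cache[az_id]
```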

&lt;h2&gt;
  
  
  The Demo App
&lt;/h2&gt;

&lt;p&gt;I built a single Lambda function with three API routes using &lt;a href="https://docs.powertools.aws.dev/lambda/python/latest/core/event_handler/api_gateway/" rel="noopener noreferrer"&gt;Powertools Event Handler&lt;/a&gt;, deployed with SAM. Each route demonstrates a different use case for the new AZ metadata.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzb6fggrt3wvrpmknqj8y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzb6fggrt3wvrpmknqj8y.png" alt="Architecture Diagram" width="800" height="1072"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;API Gateway&lt;/td&gt;
&lt;td&gt;REST API with X-Ray tracing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lambda (Python 3.13, ARM64)&lt;/td&gt;
&lt;td&gt;Single function, 3 routes via Powertools Event Handler&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Powertools Layer&lt;/td&gt;
&lt;td&gt;Logger, Tracer, Metrics, Event Handler, Lambda Metadata&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CloudWatch Dashboard&lt;/td&gt;
&lt;td&gt;Invocations, cold starts, routing, and chaos metrics by AZ&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  The Foundation: AZ-Enriched Observability
&lt;/h3&gt;

&lt;p&gt;Before getting to the three routes, the most important pattern is in the module initialization and handler. I resolve the AZ ID once during cold start and set it as a default metric dimension:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_lambda_powertools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Logger&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Metrics&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Tracer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_lambda_powertools.event_handler&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;APIGatewayRestResolver&lt;/span&gt;

&lt;span class="n"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Logger&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;tracer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Tracer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;metrics&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Metrics&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;APIGatewayRestResolver&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Resolve AZ ID at module level - metadata endpoint is localhost, sub-millisecond
&lt;/span&gt;&lt;span class="n"&gt;AZ_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_init_az_id&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Every custom metric gets the AZ dimension automatically
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;AZ_ID&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_default_dimensions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AvailabilityZoneID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AZ_ID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then in the handler, following the &lt;a href="https://darryl-ruggles.cloud/powertools-for-aws-lambda-best-practices-by-default/" rel="noopener noreferrer"&gt;Powertools three-decorator pattern&lt;/a&gt;, I inject the AZ ID into every log line and X-Ray trace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@logger.inject_lambda_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;log_event&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@tracer.capture_lambda_handler&lt;/span&gt;
&lt;span class="nd"&gt;@metrics.log_metrics&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;capture_cold_start_metric&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lambda_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;LambdaContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append_keys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;availability_zone_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AZ_ID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;tracer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_annotation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;availability_zone_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AZ_ID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now every log line from this function includes the AZ:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"INFO"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Routing decision"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"availability_zone_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"use1-az4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"endpoint"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cache-az4.internal:6379"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"same_az"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"service"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"lambda-metadata-explorer"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;During an AZ incident, you can filter to the affected zone instantly in CloudWatch Logs Insights:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fields @timestamp, availability_zone_id, message, level
| filter availability_zone_id = "use1-az4"
| filter level = "ERROR"
| sort @timestamp desc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;tracer.put_annotation&lt;/code&gt; call adds the AZ ID as an X-Ray annotation on every trace. You can filter traces by AZ in the X-Ray console using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;annotation.availability_zone_id = "use1-az4"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This lets you isolate all traces from a specific AZ during an incident. Click any trace segment and check the Annotations tab to see the AZ ID.&lt;/p&gt;

&lt;p&gt;This alone is worth the feature. Before March 19, 2026, you simply could not do this with Lambda.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use Case 1: Exploring the Metadata (GET /metadata)
&lt;/h2&gt;

&lt;p&gt;The first route is straightforward - it returns the AZ metadata from both the Powertools utility and the raw HTTP endpoint, plus the resolved AZ name. This is mostly educational, showing developers both access methods side by side:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/metadata&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@tracer.capture_method&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;explore_metadata&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;az_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_resolve_az_name&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# EC2 API call, cached after first use
&lt;/span&gt;    &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MetadataRequests&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;unit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MetricUnit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;availability_zone&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;az_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AZ_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;az_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;az_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;execution_context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AWS_LAMBDA_FUNCTION_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;local&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;region&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AWS_REGION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memory_mb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AWS_LAMBDA_FUNCTION_MEMORY_SIZE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Seeing the Multi-AZ Distribution
&lt;/h3&gt;

&lt;p&gt;There's a subtlety here. If you call the endpoint 20 times sequentially, you'll likely see the same AZ every time. That's because Lambda reuses warm execution environments, and a warm instance stays in the same AZ for its lifetime.&lt;/p&gt;

&lt;p&gt;To actually see Lambda's multi-AZ distribution, you need to force it to spin up multiple execution environments simultaneously. The included test script fires 60 concurrent requests using &lt;code&gt;xargs -P&lt;/code&gt;, which overwhelms a single warm instance and forces Lambda to create new environments across different AZs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 60 total requests, 20 concurrent at a time&lt;/span&gt;
./scripts/test_az_distribution.sh &lt;span class="nv"&gt;$API_URL&lt;/span&gt; 20 60
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;A sample run shows the 60 responses spread across three AZs:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;--- AZ Distribution ---

Count  AZ ID          AZ Name
-----  -------------- -------
22     use1-az4       us-east-1d
20     use1-az1       us-east-1a
18     use1-az6       us-east-1f

Unique AZs observed: 3
Total responses:     60
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lambda spreads execution environments across multiple AZs for resilience. Now you can actually see it happening, and with the AZ-enriched CloudWatch dashboard, you can monitor this distribution in real time.&lt;/p&gt;
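
&lt;p&gt;The counting the script does can be sketched in a few lines of Python. Here &lt;code&gt;fetch_az&lt;/code&gt; is a hypothetical callable that hits &lt;code&gt;GET /metadata&lt;/code&gt; and returns the &lt;code&gt;az_id&lt;/code&gt; field; the thread pool is what forces Lambda to fan out across execution environments:&lt;/p&gt;

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_az_distribution(fetch_az, total=60, concurrency=20):
    """Fire `total` requests with `concurrency` in flight at once.

    Sequential calls reuse one warm environment (one AZ); only
    concurrent load spins up environments in multiple AZs.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        responses = list(pool.map(lambda _: fetch_az(), range(total)))
    return Counter(responses)
```

Against a deployed stack, the resulting `Counter` mirrors the table above.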

&lt;p&gt;This is also why the chaos engineering use case (Use Case 3) needs concurrent traffic. If you only have one warm instance, targeting a specific AZ is hit-or-miss. With concurrent load, you'll have instances spread across AZs, and the ones in the target AZ will return 503 while the others continue serving normally - exactly how a real AZ outage would behave.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use Case 2: Same-AZ Routing (POST /route)
&lt;/h2&gt;

&lt;p&gt;This is the use case with the most immediate dollar impact. Cross-AZ data transfer costs $0.01/GB in each direction. If you have a Lambda function making thousands of calls per second to an ElastiCache cluster or RDS read replica, and it's routing to whatever endpoint it gets, you're paying for cross-AZ traffic on roughly two-thirds of those calls (assuming 3 AZs).&lt;/p&gt;

&lt;p&gt;With the AZ metadata, you can route to the same-AZ endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/route&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@tracer.capture_method&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;route_request&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;current_event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;json_body&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="c1"&gt;# In production, load from SSM Parameter Store or service discovery
&lt;/span&gt;    &lt;span class="n"&gt;endpoint_map&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AZ_ENDPOINT_MAP&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;use1-az1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache-az1.internal:6379&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;use1-az2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache-az2.internal:6379&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;use1-az4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache-az4.internal:6379&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;use1-az6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache-az6.internal:6379&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;fallback&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FALLBACK_ENDPOINT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache-primary.internal:6379&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;selected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;endpoint_map&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AZ_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fallback&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;same_az&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AZ_ID&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;endpoint_map&lt;/span&gt;

    &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RoutingDecisions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;unit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MetricUnit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;tracer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_annotation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;routing_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;same-az&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;same_az&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cross-az&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;az_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AZ_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;routing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;selected_endpoint&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;selected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;same_az&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;same_az&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cost_insight&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cross_az_cost_per_gb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$0.01 per direction ($0.02 round-trip)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;same_az_cost_per_gb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$0.00&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The endpoint map is a simple AZ-ID-to-endpoint mapping. In production, you'd load this from SSM Parameter Store using the &lt;a href="https://docs.powertools.aws.dev/lambda/python/latest/utilities/parameters/" rel="noopener noreferrer"&gt;Powertools Parameters&lt;/a&gt; utility with caching:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_lambda_powertools.utilities&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;parameters&lt;/span&gt;

&lt;span class="n"&gt;endpoint_map&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_parameter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/myapp/az-endpoint-map&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_age&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# cache the lookup for 5 minutes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For an ElastiCache cluster, you'd use the node endpoints for each AZ. For RDS with read replicas, you'd map to the replica in each AZ. The key insight is that you now know which AZ you're in and can make an informed routing decision.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost math:&lt;/strong&gt; At 1 TB/month of data transfer with 3 AZs, roughly 67% crosses AZ boundaries. AWS charges $0.01/GB in each direction, so cross-AZ round-trips cost $0.02/GB effectively. That's ~$13.40/month in cross-AZ charges for this example. With same-AZ routing, it drops to near zero. The savings scale linearly with traffic volume.&lt;/p&gt;
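
&lt;p&gt;That estimate is easy to reproduce (decimal TB, and the ~67% cross-AZ fraction from above):&lt;/p&gt;

```python
# Back-of-the-envelope: 1 TB/month of inter-service traffic, 3 AZs.
monthly_gb = 1_000             # 1 TB, decimal
cross_az_fraction = 0.67       # ~2/3 of calls leave the AZ with 3 AZs
round_trip_cost_per_gb = 0.02  # $0.01 out + $0.01 back

monthly_cost = monthly_gb * cross_az_fraction * round_trip_cost_per_gb
print(f"${monthly_cost:.2f}/month")  # $13.40/month
```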

&lt;h2&gt;
  
  
  Use Case 3: AZ-Aware Chaos Engineering (POST /chaos)
&lt;/h2&gt;

&lt;p&gt;The third route is about resilience testing. You specify a target AZ, and if the function happens to be running there, it injects a failure. This lets you test how your upstream systems - load balancers, retry logic, circuit breakers - handle an AZ going unhealthy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/chaos&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@tracer.capture_method&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chaos_test&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;current_event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;json_body&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="n"&gt;target_az&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;target_az&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;failure_mode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;failure_mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;target_az&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;AZ_ID&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;target_az&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ChaosFailuresInjected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;unit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MetricUnit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Chaos: injecting failure&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;extra&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;target_az&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;target_az&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;failure_mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;failure_mode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;failure_mode&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latency&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;delay_seconds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;az_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AZ_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chaos&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;triggered&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latency&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;

        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ServiceError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;503&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Simulated AZ failure in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;AZ_ID&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;az_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AZ_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chaos&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;triggered&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;target_az&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;target_az&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two failure modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;error&lt;/code&gt;&lt;/strong&gt; - Returns HTTP 503. Tests whether callers retry and eventually hit a healthy AZ.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;latency&lt;/code&gt;&lt;/strong&gt; - Injects a delay (capped at 10 seconds). Tests whether callers have appropriate timeouts and circuit breakers.&lt;/li&gt;
&lt;/ul&gt;
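
&lt;p&gt;On the caller side, the behavior the &lt;code&gt;error&lt;/code&gt; mode exercises is a plain retry loop - each retry may land on a different execution environment, and therefore possibly a healthy AZ. A minimal sketch, with a hypothetical zero-arg &lt;code&gt;call&lt;/code&gt; that raises on a 503:&lt;/p&gt;

```python
import time

def call_with_retry(call, attempts=3, backoff=0.2):
    """Retry transient failures with exponential backoff.

    `call` raises on a 503-style failure (e.g. the simulated AZ
    outage) and returns normally on success.
    """
    for attempt in range(attempts):
        try:
            return call()
        except RuntimeError:  # stand-in for an HTTP 503 response
            if attempt == attempts - 1:
                raise
            time.sleep(backoff * 2 ** attempt)
```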

&lt;p&gt;In a real project, you'd combine this with an ALB or API Gateway with health checks. Target a specific AZ and observe whether traffic shifts to healthy AZs. The CloudWatch dashboard shows failures concentrated in the target AZ while other AZs continue serving normally - exactly what you'd see during a real AZ impairment.&lt;/p&gt;

&lt;p&gt;One important detail for testing: &lt;code&gt;make test-chaos&lt;/code&gt; auto-detects your current warm instance's AZ and targets it, guaranteeing a triggered failure. But for a more realistic simulation, fire concurrent traffic while targeting a specific AZ - some requests will fail (those in the target AZ) while others succeed (those in healthy AZs).&lt;/p&gt;

&lt;p&gt;Before this feature, you'd need to deploy different code per AZ or use external feature flags combined with some mechanism to map invocations to AZs. Now it's a simple string comparison.&lt;/p&gt;

&lt;h2&gt;
  
  
  The SAM Template
&lt;/h2&gt;

&lt;p&gt;The deployment is a straightforward SAM template - one function, one API, one CloudWatch dashboard:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Globals&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;Function&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Runtime&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;python3.13&lt;/span&gt;
    &lt;span class="na"&gt;Architectures&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;arm64&lt;/span&gt;
    &lt;span class="na"&gt;Timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
    &lt;span class="na"&gt;MemorySize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;256&lt;/span&gt;
    &lt;span class="na"&gt;Tracing&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Active&lt;/span&gt;
    &lt;span class="na"&gt;LoggingConfig&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;LogFormat&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;JSON&lt;/span&gt;

&lt;span class="na"&gt;Resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;AZMetadataFunction&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::Serverless::Function&lt;/span&gt;
    &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;FunctionName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Sub&lt;/span&gt; &lt;span class="s"&gt;lambda-metadata-explorer-${Environment}&lt;/span&gt;
      &lt;span class="na"&gt;CodeUri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;src/&lt;/span&gt;
      &lt;span class="na"&gt;Handler&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app.lambda_handler&lt;/span&gt;
      &lt;span class="na"&gt;Layers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="kt"&gt;!Sub&lt;/span&gt; &lt;span class="s"&gt;arn:aws:lambda:${AWS::Region}:017000801446:layer:AWSLambdaPowertoolsPythonV3-python313-arm64:${PowertoolsLayerVersion}&lt;/span&gt;
      &lt;span class="na"&gt;Environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;Variables&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;POWERTOOLS_SERVICE_NAME&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;lambda-metadata-explorer&lt;/span&gt;
          &lt;span class="na"&gt;POWERTOOLS_METRICS_NAMESPACE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AZMetadataExplorer&lt;/span&gt;
          &lt;span class="na"&gt;POWERTOOLS_LOG_LEVEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;LogLevel&lt;/span&gt;
          &lt;span class="na"&gt;POWERTOOLS_TRACER_CAPTURE_RESPONSE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;true'&lt;/span&gt;
          &lt;span class="na"&gt;POWERTOOLS_TRACER_CAPTURE_ERROR&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;true'&lt;/span&gt;
      &lt;span class="na"&gt;Policies&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Statement&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Allow&lt;/span&gt;
              &lt;span class="na"&gt;Action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ec2:DescribeAvailabilityZones&lt;/span&gt;
              &lt;span class="na"&gt;Resource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;*'&lt;/span&gt;
      &lt;span class="na"&gt;Events&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;GetMetadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Api&lt;/span&gt;
          &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;Path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/metadata&lt;/span&gt;
            &lt;span class="na"&gt;Method&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;get&lt;/span&gt;
        &lt;span class="na"&gt;PostRoute&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Api&lt;/span&gt;
          &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;Path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/route&lt;/span&gt;
            &lt;span class="na"&gt;Method&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;post&lt;/span&gt;
        &lt;span class="na"&gt;PostChaos&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Api&lt;/span&gt;
          &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;Path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/chaos&lt;/span&gt;
            &lt;span class="na"&gt;Method&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;post&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The function runs on ARM64 (Graviton) for better price-performance, pulls Powertools from the AWS-managed Lambda Layer (plus pip bundling from &lt;code&gt;requirements.txt&lt;/code&gt; to ensure v3.26.0+ with the metadata utility is available even if the Layer hasn't been updated yet), and has X-Ray tracing active with JSON log format. The only extra IAM permission is &lt;code&gt;ec2:DescribeAvailabilityZones&lt;/code&gt; for AZ name resolution - the metadata endpoint itself requires no permissions.&lt;/p&gt;

&lt;p&gt;The template also creates a CloudWatch dashboard with AZ distribution widgets. The AZ dimension comes from &lt;code&gt;metrics.set_default_dimensions(AvailabilityZoneID=AZ_ID)&lt;/code&gt; which tags every custom metric with the availability zone. The dashboard references explicit metric definitions for each AZ:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"AZMetadataExplorer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"MetadataRequests"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AvailabilityZoneID"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"use1-az1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"service"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"lambda-metadata-explorer"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One gotcha: the ColdStart metric uses Powertools' built-in dimensions (&lt;code&gt;function_name&lt;/code&gt; + &lt;code&gt;service&lt;/code&gt;) rather than our custom AZ dimension, because Powertools captures it before the handler runs. So the Cold Starts widget shows a total count rather than a per-AZ breakdown.&lt;/p&gt;
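&lt;p&gt;To make the dimension mechanics concrete, here is a dependency-free sketch of what "tag every metric with a default dimension" amounts to. The &lt;code&gt;MetricBuffer&lt;/code&gt; class is illustrative only - it is not the Powertools API, just the shape of the behavior:&lt;/p&gt;

```python
# Illustrative sketch (not the Powertools API): once a default dimension
# is set, every metric datum added afterwards carries it, which is how
# set_default_dimensions(AvailabilityZoneID=AZ_ID) puts the AZ on every
# custom metric in the demo.

class MetricBuffer:
    def __init__(self, namespace, service):
        self.namespace = namespace
        self.default_dimensions = {"service": service}
        self.data = []

    def set_default_dimensions(self, **dims):
        self.default_dimensions.update(dims)

    def add_metric(self, name, value):
        # Each datum snapshots the defaults at emission time.
        self.data.append({
            "name": name,
            "value": value,
            "dimensions": dict(self.default_dimensions),
        })

metrics = MetricBuffer(namespace="AZMetadataExplorer",
                       service="lambda-metadata-explorer")
metrics.set_default_dimensions(AvailabilityZoneID="use1-az1")
metrics.add_metric("MetadataRequests", 1)
```

&lt;p&gt;Every entry in &lt;code&gt;metrics.data&lt;/code&gt; now carries both the &lt;code&gt;service&lt;/code&gt; and &lt;code&gt;AvailabilityZoneID&lt;/code&gt; dimensions, matching the explicit metric definitions the dashboard references.&lt;/p&gt;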

&lt;h2&gt;
  
  
  Deploy and Try It
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt; AWS CLI, SAM CLI, Python 3.13+, an AWS account.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/RDarrylR/lambda-az-metadata.git
&lt;span class="nb"&gt;cd &lt;/span&gt;lambda-az-metadata

&lt;span class="c"&gt;# Build and deploy&lt;/span&gt;
sam build &lt;span class="nt"&gt;--profile&lt;/span&gt; your-profile &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1
sam deploy &lt;span class="nt"&gt;--guided&lt;/span&gt; &lt;span class="nt"&gt;--profile&lt;/span&gt; your-profile &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1

&lt;span class="c"&gt;# Or use the Makefile&lt;/span&gt;
make deploy &lt;span class="nv"&gt;PROFILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-profile
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After deploying, grab the API URL from the stack outputs and explore:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# See your AZ metadata&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nv"&gt;$API_URL&lt;/span&gt;/metadata | python3 &lt;span class="nt"&gt;-m&lt;/span&gt; json.tool

&lt;span class="c"&gt;# Test same-AZ routing&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="nv"&gt;$API_URL&lt;/span&gt;/route &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"message": "hello"}'&lt;/span&gt; | python3 &lt;span class="nt"&gt;-m&lt;/span&gt; json.tool

&lt;span class="c"&gt;# Test chaos engineering (replace use1-az1 with an AZ from your /metadata response)&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="nv"&gt;$API_URL&lt;/span&gt;/chaos &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"target_az": "use1-az1", "failure_mode": "error"}'&lt;/span&gt; | python3 &lt;span class="nt"&gt;-m&lt;/span&gt; json.tool

&lt;span class="c"&gt;# See the AZ distribution across 60 concurrent invocations&lt;/span&gt;
make test-distribution
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cleanup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sam delete &lt;span class="nt"&gt;--stack-name&lt;/span&gt; lambda-az-metadata &lt;span class="nt"&gt;--profile&lt;/span&gt; your-profile &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What It Costs
&lt;/h2&gt;

&lt;p&gt;This demo runs well within the AWS free tier. Lambda's free tier includes 1M requests and 400K GB-seconds of compute per month, and API Gateway includes 1M calls per month for the first 12 months. The metadata endpoint itself is free - no additional charges.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gotchas
&lt;/h2&gt;

&lt;p&gt;A few things to watch for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Region-specific AZ IDs.&lt;/strong&gt; The demo code and dashboard use &lt;code&gt;us-east-1&lt;/code&gt; AZ IDs (&lt;code&gt;use1-az1&lt;/code&gt; through &lt;code&gt;use1-az6&lt;/code&gt;). If you deploy to a different region, update the default endpoint map in &lt;code&gt;app.py&lt;/code&gt; and the dashboard metrics in &lt;code&gt;template.yaml&lt;/code&gt; with your region's AZ IDs (e.g., &lt;code&gt;usw2-az1&lt;/code&gt; for &lt;code&gt;us-west-2&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Caching is your friend.&lt;/strong&gt; The metadata response is &lt;code&gt;immutable&lt;/code&gt; within an execution environment. Resolve it once at module level and forget about it. There is no reason to call the endpoint on every invocation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VPC doesn't matter.&lt;/strong&gt; The metadata endpoint is localhost inside the execution environment - it works identically with or without a VPC. No VPC endpoints, NAT gateways, or security group rules needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SnapStart changes things.&lt;/strong&gt; If you use SnapStart, the execution environment might move to a different AZ after restore. The metadata endpoint cooperates with SnapStart by reducing the cache TTL during initialization, so Powertools can detect that a re-fetch is needed after restore and handle it automatically. But this demo resolves &lt;code&gt;AZ_ID&lt;/code&gt; at module level during init, and that code runs during the snapshot phase - after restore, the cached module-level variable could be wrong. For SnapStart, move the AZ resolution into the handler or use a &lt;code&gt;@register_checkpoint_restore&lt;/code&gt; hook to re-resolve after restore.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The response may grow.&lt;/strong&gt; The documentation says additional fields may be added in future updates. Don't fail on unknown fields.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;log_event=True&lt;/code&gt; and &lt;code&gt;CAPTURE_RESPONSE&lt;/code&gt; in production.&lt;/strong&gt; I use both here for debugging visibility, but in production, disable them if your payloads contain sensitive data. &lt;code&gt;log_event=True&lt;/code&gt; logs full request bodies to CloudWatch, and &lt;code&gt;POWERTOOLS_TRACER_CAPTURE_RESPONSE&lt;/code&gt; stores full response bodies as X-Ray metadata. Control both via environment variables without code changes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layer version.&lt;/strong&gt; The &lt;code&gt;get_lambda_metadata()&lt;/code&gt; utility requires Powertools v3.26.0+ (Lambda Layer version 30+, released March 20, 2026). The SAM template defaults to version 30. Check the &lt;a href="https://docs.aws.amazon.com/powertools/python/latest/getting-started/install/" rel="noopener noreferrer"&gt;layer version history&lt;/a&gt; for the latest.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Import path.&lt;/strong&gt; As of v3.26.0, the working import path is &lt;code&gt;from aws_lambda_powertools.utilities.metadata import get_lambda_metadata&lt;/code&gt;. The &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-metadata-endpoint.html" rel="noopener noreferrer"&gt;Lambda documentation&lt;/a&gt; currently shows &lt;code&gt;utilities.lambda_metadata&lt;/code&gt;, which may be corrected in a future docs update. Verify against your installed version if in doubt.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The metadata endpoint currently returns only &lt;code&gt;AvailabilityZoneID&lt;/code&gt;, but the versioned API path (&lt;code&gt;/2026-01-15/&lt;/code&gt;) and the documentation's note about future fields suggest more is coming. Execution environment ID? Host-level metadata? We'll see.&lt;/p&gt;

&lt;p&gt;For now, the AZ ID alone unlocks meaningful improvements in observability, cost optimization, and resilience testing. It's one of those features that's so obviously useful, you wonder why it took this long.&lt;/p&gt;

&lt;p&gt;The code is on &lt;a href="https://github.com/RDarrylR/lambda-az-metadata" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. Clone it, deploy it, and check your CloudWatch dashboard to see which AZs your Lambda functions are actually running in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/03/lambda-availability-zone-metadata/" rel="noopener noreferrer"&gt;AWS Announcement: Lambda Availability Zone Metadata&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-metadata-endpoint.html" rel="noopener noreferrer"&gt;Lambda Metadata Endpoint Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.powertools.aws.dev/lambda/python/latest/" rel="noopener noreferrer"&gt;Powertools for AWS Lambda (Python)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/powertools/python/latest/utilities/metadata/" rel="noopener noreferrer"&gt;Powertools Lambda Metadata Utility&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/powertools/python/latest/getting-started/install/" rel="noopener noreferrer"&gt;Powertools Layer Versions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/whitepapers/latest/advanced-multi-az-resilience-patterns/" rel="noopener noreferrer"&gt;Advanced Multi-AZ Resilience Patterns (Whitepaper)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/aws-lambda-resilience-under-the-hood/" rel="noopener noreferrer"&gt;AWS Lambda Resilience Under the Hood&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Connect with me on&lt;/em&gt; &lt;a href="https://x.com/RDarrylR" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://bsky.app/profile/darrylruggles.bsky.social" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://www.linkedin.com/in/darryl-ruggles/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://github.com/RDarrylR" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://medium.com/@RDarrylR" rel="noopener noreferrer"&gt;Medium&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://dev.to/rdarrylr"&gt;Dev.to&lt;/a&gt;&lt;em&gt;, or the&lt;/em&gt; &lt;a href="https://community.aws/@darrylr" rel="noopener noreferrer"&gt;AWS Community&lt;/a&gt;&lt;em&gt;. Check out more of my projects at&lt;/em&gt; &lt;a href="https://darryl-ruggles.cloud" rel="noopener noreferrer"&gt;darryl-ruggles.cloud&lt;/a&gt; &lt;em&gt;and join the&lt;/em&gt; &lt;a href="https://www.believeinserverless.com/" rel="noopener noreferrer"&gt;Believe In Serverless&lt;/a&gt; &lt;em&gt;community.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>lambda</category>
      <category>serverless</category>
    </item>
    <item>
      <title>A Complete Terraform Setup for EKS Auto Mode: Is it right for you?</title>
      <dc:creator>Darryl Ruggles</dc:creator>
      <pubDate>Tue, 24 Mar 2026 16:41:12 +0000</pubDate>
      <link>https://forem.com/aws-builders/a-complete-terraform-setup-for-eks-auto-mode-is-it-right-for-you-2mol</link>
      <guid>https://forem.com/aws-builders/a-complete-terraform-setup-for-eks-auto-mode-is-it-right-for-you-2mol</guid>
      <description>&lt;p&gt;One of the things I appreciate about AWS is the range of container options available. I recently wrote about why &lt;a href="https://darryl-ruggles.cloud/elastic-container-service-ecs-my-default-choice-for-containers-on-aws/" rel="noopener noreferrer"&gt;ECS is my default choice for containers on AWS&lt;/a&gt; - it's simple, powerful, and handles most workloads well. But when you need Kubernetes specifically - for portability, ecosystem tooling, or team expertise - EKS Auto Mode is an interesting approach worth exploring.&lt;/p&gt;

&lt;p&gt;Instead of managing node groups, installing Karpenter, configuring the VPC CNI plugin, deploying the AWS Load Balancer Controller, setting up the EBS CSI driver, and keeping all of those components updated and compatible with each other - you enable a single flag and AWS handles all of it. The data plane becomes a managed service. Auto Mode's limitations are not insignificant, but depending on the use case, it can still be the right choice for many teams.&lt;/p&gt;

&lt;p&gt;I built a complete production-ready EKS Auto Mode cluster with Terraform to see how it works in practice. All the code is on GitHub: &lt;a href="https://github.com/RDarrylR/eks-auto-mode-terraform" rel="noopener noreferrer"&gt;github.com/RDarrylR/eks-auto-mode-terraform&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This post walks through everything: what Auto Mode actually manages, how the Terraform setup works, deploying a real application, autoscaling from pods to nodes, observability with Container Insights, the security posture you get out of the box, custom NodePools for cost optimization, and the honest trade-offs you should know about before adopting it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why EKS Auto Mode?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa8oswkczz0e3g6rwxbid.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa8oswkczz0e3g6rwxbid.png" alt="EKS Auto Mode Architecture - showing how Auto Mode manages the data plane as a service" width="800" height="687"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you've ever managed an EKS cluster, you know the operational burden. The control plane is managed by AWS, but everything else is on you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Node management&lt;/strong&gt; - Choosing instance types, managing AMIs, patching OS, handling node drains and replacements&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Networking&lt;/strong&gt; - Installing and configuring the VPC CNI plugin, managing pod IP allocation, setting up network policies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage&lt;/strong&gt; - Deploying the EBS CSI driver, managing StorageClasses, handling volume lifecycle&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Load balancing&lt;/strong&gt; - Installing the AWS Load Balancer Controller, configuring IngressClasses, managing target group bindings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scaling&lt;/strong&gt; - Deploying Karpenter or Cluster Autoscaler, configuring NodePools, tuning consolidation policies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add-on compatibility&lt;/strong&gt; - Making sure all these components work together after every Kubernetes version upgrade&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Multiply this across multiple clusters and environments, and you are spending more time on infrastructure plumbing than on the applications running on top of it.&lt;/p&gt;

&lt;p&gt;EKS Auto Mode eliminates all of this. AWS takes over the entire data plane - compute, networking, storage, load balancing, and health monitoring. The only pods left running in your cluster belong to the Kubernetes Metrics Server. Everything else runs off-cluster as managed services.&lt;/p&gt;




&lt;h2&gt;
  
  
  What AWS Manages For You
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffg7mngbppktaxjefv2xc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffg7mngbppktaxjefv2xc.png" alt="EKS Auto Mode Managed Components - compute, networking, storage, load balancing, identity, and health monitoring" width="800" height="301"&gt;&lt;/a&gt;&lt;br&gt;
When you enable Auto Mode, AWS takes responsibility for six major areas:&lt;/p&gt;
&lt;h3&gt;
  
  
  Compute
&lt;/h3&gt;

&lt;p&gt;Auto Mode uses Karpenter under the hood, running it as a managed service outside your cluster. You never see Karpenter pods, never upgrade them, never troubleshoot their OOM kills. In my experience, Karpenter is a significant upgrade over the standard Cluster Autoscaler. With Auto Mode, AWS handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Node provisioning, scaling, and termination&lt;/li&gt;
&lt;li&gt;AMI selection - custom Bottlerocket variants with SELinux enforcing and read-only root filesystem&lt;/li&gt;
&lt;li&gt;OS patching via a continuous deployment pipeline with CVE scanning and staged rollout&lt;/li&gt;
&lt;li&gt;14-day default node expiration (configurable up to 21 days) with automatic replacement&lt;/li&gt;
&lt;li&gt;Spot Instance interruption handling&lt;/li&gt;
&lt;li&gt;GPU support with drivers for NVIDIA, Neuron (Trainium/Inferentia), and EFA&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The nodes are &lt;strong&gt;EC2 managed instances&lt;/strong&gt; - a new EC2 primitive where operational control is delegated to AWS. They appear in your account, use your Reserved Instances and Savings Plans, but you cannot SSH into them or modify their configuration directly.&lt;/p&gt;
&lt;h3&gt;
  
  
  Networking
&lt;/h3&gt;

&lt;p&gt;The VPC CNI runs as a managed component:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pod networking with IPv4 and IPv6 support&lt;/li&gt;
&lt;li&gt;Network Policy enforcement&lt;/li&gt;
&lt;li&gt;Node-local DNS - CoreDNS runs as a system service directly on each node rather than as a traditional cluster deployment. This eliminates cross-node DNS latency and removes CoreDNS pods from your workload scheduling&lt;/li&gt;
&lt;li&gt;Pod Identity Agent (built-in, no add-on installation)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Storage
&lt;/h3&gt;

&lt;p&gt;The EBS CSI driver is managed with provisioner &lt;code&gt;ebs.csi.eks.amazonaws.com&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Persistent volume provisioning with configurable IOPS, throughput, and encryption&lt;/li&gt;
&lt;li&gt;Ephemeral storage configuration&lt;/li&gt;
&lt;li&gt;Automatic NVMe setup for supported instance types&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Load Balancing
&lt;/h3&gt;

&lt;p&gt;The AWS Load Balancer Controller runs as a managed service:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ALB and NLB provisioning from Kubernetes Ingress/Service resources&lt;/li&gt;
&lt;li&gt;Uses IngressClass &lt;code&gt;alb&lt;/code&gt; (controller: &lt;code&gt;eks.amazonaws.com/alb&lt;/code&gt;) and loadBalancerClass &lt;code&gt;eks.amazonaws.com/nlb&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Supports the Ingress API and Service annotations only&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gateway API is not supported.&lt;/strong&gt; This could be a deal breaker for teams standardizing on the Gateway API.&lt;/li&gt;
&lt;/ul&gt;
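&lt;p&gt;To make that concrete, here is a minimal sketch of an Ingress wired to the managed controller. The IngressClass uses the controller name noted above; the &lt;code&gt;demo&lt;/code&gt; name and backend Service are hypothetical, and the annotations are standard AWS Load Balancer Controller annotations:&lt;/p&gt;

```yaml
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: alb
spec:
  controller: eks.amazonaws.com/alb
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: demo
                port:
                  number: 80
```

&lt;p&gt;Applying this has the managed controller provision an internet-facing ALB in the public subnets tagged with &lt;code&gt;kubernetes.io/role/elb&lt;/code&gt; - no controller installation required.&lt;/p&gt;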
&lt;h3&gt;
  
  
  Health Monitoring
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Automatic detection and repair of unhealthy nodes&lt;/li&gt;
&lt;li&gt;Replaces unresponsive kubelets and PID-exhausted nodes&lt;/li&gt;
&lt;li&gt;Reports through Kubernetes events and conditions&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  The Terraform Setup
&lt;/h2&gt;

&lt;p&gt;I used the &lt;code&gt;terraform-aws-modules/eks/aws&lt;/code&gt; community module (v21.x) with the AWS provider ~&amp;gt; 6.0. There is no separate official AWS Terraform module for EKS Auto Mode - the community module maintained by Anton Babenko is the de facto standard for EKS on Terraform, and it has first-class Auto Mode support. When you set &lt;code&gt;compute_config.enabled = true&lt;/code&gt;, the module automatically handles the storage and load balancing configuration, hardcodes the right bootstrap settings, creates least-privilege IAM roles, and attaches the five managed policies the cluster role needs. You could build this with raw &lt;code&gt;aws_eks_cluster&lt;/code&gt; resources, but the module eliminates several hundred lines of boilerplate and is referenced in AWS's own documentation.&lt;/p&gt;

&lt;p&gt;The complete setup creates a production-ready cluster with VPC, Auto Mode enabled, custom NodePools, Container Insights observability, and least-privilege IAM configuration. Here is what the full infrastructure looks like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhykk6dyzcztf4zo3x234.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhykk6dyzcztf4zo3x234.png" alt="EKS Auto Mode Complete Infrastructure - VPC, subnets, EKS cluster, IAM roles, K8s resources, and managed data plane components" width="800" height="1269"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Provider Configuration
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;terraform&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;required_version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&amp;gt;= 1.10"&lt;/span&gt;

  &lt;span class="nx"&gt;required_providers&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;aws&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;source&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"hashicorp/aws"&lt;/span&gt;
      &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"~&amp;gt; 6.0"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;kubernetes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;source&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"hashicorp/kubernetes"&lt;/span&gt;
      &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"~&amp;gt; 2.38"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;kubectl&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;source&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"alekc/kubectl"&lt;/span&gt;
      &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"~&amp;gt; 2.1"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;provider&lt;/span&gt; &lt;span class="s2"&gt;"aws"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_region&lt;/span&gt;

  &lt;span class="nx"&gt;default_tags&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;Project&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;project_name&lt;/span&gt;
      &lt;span class="nx"&gt;Environment&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;environment&lt;/span&gt;
      &lt;span class="nx"&gt;ManagedBy&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"terraform"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;A note on the &lt;code&gt;kubectl&lt;/code&gt; provider: I use the &lt;code&gt;alekc/kubectl&lt;/code&gt; fork rather than &lt;code&gt;gavinbunney/kubectl&lt;/code&gt;. The &lt;code&gt;alekc&lt;/code&gt; fork is actively maintained and handles CRD resources (like Karpenter NodePools and NodeClasses) reliably.&lt;/p&gt;
&lt;h3&gt;
  
  
  VPC
&lt;/h3&gt;

&lt;p&gt;The VPC uses the community VPC module (v6.x) with both private and public subnets. The key detail is the subnet tags - EKS Auto Mode uses these to discover where to place nodes and load balancers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"vpc"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"terraform-aws-modules/vpc/aws"&lt;/span&gt;
  &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"~&amp;gt; 6.0"&lt;/span&gt;

  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.project_name}-vpc"&lt;/span&gt;
  &lt;span class="nx"&gt;cidr&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"10.0.0.0/16"&lt;/span&gt;

  &lt;span class="nx"&gt;azs&lt;/span&gt;             &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_availability_zones&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;available&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;names&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="nx"&gt;private_subnets&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"10.0.1.0/24"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"10.0.2.0/24"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"10.0.3.0/24"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;public_subnets&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"10.0.101.0/24"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"10.0.102.0/24"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"10.0.103.0/24"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="nx"&gt;enable_nat_gateway&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;single_nat_gateway&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;enable_dns_hostnames&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

  &lt;span class="nx"&gt;public_subnet_tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s2"&gt;"kubernetes.io/role/elb"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;private_subnet_tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s2"&gt;"kubernetes.io/role/internal-elb"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things worth noting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Single NAT gateway&lt;/strong&gt; for cost optimization in non-production. Use one per AZ for production HA.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No VPC endpoints&lt;/strong&gt; in this demo for simplicity. Production setups should add VPC endpoints for ECR (both &lt;code&gt;ecr.api&lt;/code&gt; and &lt;code&gt;ecr.dkr&lt;/code&gt;), S3, STS, and CloudWatch to reduce NAT gateway costs and keep traffic private. This is especially impactful at scale - NAT gateway data processing charges add up quickly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Public subnet tags&lt;/strong&gt; with &lt;code&gt;kubernetes.io/role/elb&lt;/code&gt; tell the managed load balancer controller where to create internet-facing ALBs/NLBs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Private subnet tags&lt;/strong&gt; with &lt;code&gt;kubernetes.io/role/internal-elb&lt;/code&gt; are for internal load balancers.&lt;/li&gt;
&lt;li&gt;Nodes always run in private subnets; internet-facing load balancers sit in the public subnets.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  EKS Cluster with Auto Mode
&lt;/h3&gt;

&lt;p&gt;This is where Auto Mode gets enabled:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"eks"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"terraform-aws-modules/eks/aws"&lt;/span&gt;
  &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"~&amp;gt; 21.0"&lt;/span&gt;

  &lt;span class="nx"&gt;name&lt;/span&gt;               &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;project_name&lt;/span&gt;
  &lt;span class="nx"&gt;kubernetes_version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"1.35"&lt;/span&gt;

  &lt;span class="nx"&gt;endpoint_public_access&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="c1"&gt;# UNCOMMENT for production: restrict to your IP ranges or use private-only access&lt;/span&gt;
  &lt;span class="c1"&gt;# cluster_endpoint_public_access_cidrs = ["YOUR_CIDR/32"]&lt;/span&gt;

  &lt;span class="c1"&gt;# Enable Auto Mode - this single block enables:&lt;/span&gt;
  &lt;span class="c1"&gt;#   - Managed Karpenter (compute)&lt;/span&gt;
  &lt;span class="c1"&gt;#   - Managed EBS CSI driver (storage_config derived from compute_config.enabled)&lt;/span&gt;
  &lt;span class="c1"&gt;#   - Managed ALB/NLB Controller (elastic_load_balancing derived from compute_config.enabled)&lt;/span&gt;
  &lt;span class="c1"&gt;#   - Managed VPC CNI, kube-proxy, CoreDNS (bootstrap_self_managed_addons hardcoded false)&lt;/span&gt;
  &lt;span class="nx"&gt;compute_config&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;enabled&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="nx"&gt;node_pools&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"general-purpose"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"system"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_id&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_ids&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_subnets&lt;/span&gt;

  &lt;span class="c1"&gt;# Auto Mode IAM resources - creates the node IAM role with required policies&lt;/span&gt;
  &lt;span class="nx"&gt;create_node_iam_role&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;create_auto_mode_iam_resources&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

  &lt;span class="c1"&gt;# Cluster access - grants the deploying identity cluster admin&lt;/span&gt;
  &lt;span class="nx"&gt;enable_cluster_creator_admin_permissions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

  &lt;span class="c1"&gt;# Addons - metrics-server for HPA + Container Insights for observability&lt;/span&gt;
  &lt;span class="nx"&gt;addons&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;metrics-server&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;most_recent&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;amazon-cloudwatch-observability&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;most_recent&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="nx"&gt;pod_identity_association&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
        &lt;span class="nx"&gt;role_arn&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cloudwatch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
        &lt;span class="nx"&gt;service_account&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"cloudwatch-agent"&lt;/span&gt;
      &lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Production notes:&lt;/strong&gt; This demo uses a public API endpoint without CIDR restrictions and local Terraform state. For production, restrict &lt;code&gt;cluster_endpoint_public_access_cidrs&lt;/code&gt; to your corporate IP ranges (or use private-only access with a VPN/bastion), and configure a remote backend (S3 + DynamoDB) for state locking and team collaboration.&lt;/p&gt;
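
&lt;p&gt;The remote backend is a few lines of standard Terraform configuration. The bucket and table names below are hypothetical placeholders:&lt;/p&gt;

```hcl
terraform {
  backend "s3" {
    bucket         = "my-tf-state-bucket"         # hypothetical bucket name
    key            = "eks-auto-mode/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"            # hypothetical lock table
    encrypt        = true
  }
}
```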

&lt;p&gt;The critical parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;compute_config.enabled = true&lt;/code&gt;&lt;/strong&gt; - This is the flag that enables Auto Mode. The module automatically derives &lt;code&gt;storage_config&lt;/code&gt; (block storage) and &lt;code&gt;kubernetes_network_config&lt;/code&gt; (elastic load balancing) from this single flag - you do not need to set them separately.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;compute_config.node_pools&lt;/code&gt;&lt;/strong&gt; - The built-in pools to activate. &lt;code&gt;general-purpose&lt;/code&gt; handles your application workloads (On-Demand, C/M/R families, gen 4+, AMD and ARM). &lt;code&gt;system&lt;/code&gt; handles EKS internal add-ons with a &lt;code&gt;CriticalAddonsOnly&lt;/code&gt; taint.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;bootstrap_self_managed_addons&lt;/code&gt;&lt;/strong&gt; - The module hardcodes this to &lt;code&gt;false&lt;/code&gt; when Auto Mode is enabled, so you do not need to set it. Auto Mode manages VPC CNI, kube-proxy, and CoreDNS off-cluster.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;create_auto_mode_iam_resources = true&lt;/code&gt;&lt;/strong&gt; - Creates the node IAM role with &lt;code&gt;AmazonEKSWorkerNodeMinimalPolicy&lt;/code&gt; and &lt;code&gt;AmazonEC2ContainerRegistryPullOnly&lt;/code&gt; - the least-privilege policies AWS recommends for Auto Mode (not the broader &lt;code&gt;AmazonEKSWorkerNodePolicy&lt;/code&gt; or &lt;code&gt;AmazonEC2ContainerRegistryReadOnly&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;create_node_iam_role = true&lt;/code&gt;&lt;/strong&gt; - Must be &lt;code&gt;true&lt;/code&gt; (default) so the module creates the Auto Mode node role with &lt;code&gt;nodeRoleArn&lt;/code&gt;. Setting this to &lt;code&gt;false&lt;/code&gt; without providing a custom &lt;code&gt;node_role_arn&lt;/code&gt; causes the EKS API to reject the cluster.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;addons&lt;/code&gt;&lt;/strong&gt; - Deploys the metrics-server (required for HPA to read CPU/memory metrics) and CloudWatch Container Insights with Pod Identity for observability (more on this below).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The module also automatically attaches five managed policies to the cluster IAM role: &lt;code&gt;AmazonEKSClusterPolicy&lt;/code&gt;, &lt;code&gt;AmazonEKSComputePolicy&lt;/code&gt;, &lt;code&gt;AmazonEKSBlockStoragePolicy&lt;/code&gt;, &lt;code&gt;AmazonEKSLoadBalancingPolicy&lt;/code&gt;, and &lt;code&gt;AmazonEKSNetworkingPolicy&lt;/code&gt;. Plus a custom tags policy for Auto Mode resource tagging.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important Terraform caveat:&lt;/strong&gt; When disabling Auto Mode later, first apply with &lt;code&gt;compute_config.enabled = false&lt;/code&gt;, then remove the block in a second apply. Removing the block in a single step can fail or leave Auto Mode still enabled, because the EKS API requires the capability to be explicitly disabled before its configuration is dropped.&lt;/p&gt;
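
&lt;p&gt;The teardown sequence looks like this:&lt;/p&gt;

```hcl
# Step 1: apply with Auto Mode explicitly disabled
compute_config = {
  enabled = false
}

# Step 2: in a separate, later apply, delete the
# compute_config block from the module entirely
```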

&lt;h3&gt;
  
  
  IAM Architecture
&lt;/h3&gt;

&lt;p&gt;The setup follows AWS recommended least-privilege principles with three distinct roles:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Policies&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cluster Role&lt;/td&gt;
&lt;td&gt;AmazonEKSClusterPolicy, AmazonEKSComputePolicy, AmazonEKSBlockStoragePolicy, AmazonEKSLoadBalancingPolicy, AmazonEKSNetworkingPolicy&lt;/td&gt;
&lt;td&gt;Cluster control plane + Auto Mode capabilities&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Node Role&lt;/td&gt;
&lt;td&gt;AmazonEKSWorkerNodeMinimalPolicy, AmazonEC2ContainerRegistryPullOnly&lt;/td&gt;
&lt;td&gt;Minimal node permissions (not the broader WorkerNodePolicy)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CloudWatch Role&lt;/td&gt;
&lt;td&gt;CloudWatchAgentServerPolicy, AWSXrayWriteOnlyAccess&lt;/td&gt;
&lt;td&gt;Container Insights via Pod Identity&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The node role uses &lt;code&gt;AmazonEKSWorkerNodeMinimalPolicy&lt;/code&gt; instead of the broader &lt;code&gt;AmazonEKSWorkerNodePolicy&lt;/code&gt;, and &lt;code&gt;AmazonEC2ContainerRegistryPullOnly&lt;/code&gt; instead of &lt;code&gt;AmazonEC2ContainerRegistryReadOnly&lt;/code&gt;. These are the least-privilege policies AWS specifically recommends for Auto Mode.&lt;/p&gt;
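
&lt;p&gt;For reference, the &lt;code&gt;aws_iam_role.cloudwatch&lt;/code&gt; role referenced by the Pod Identity association earlier might look like the sketch below. The trust principal &lt;code&gt;pods.eks.amazonaws.com&lt;/code&gt; with &lt;code&gt;sts:AssumeRole&lt;/code&gt; and &lt;code&gt;sts:TagSession&lt;/code&gt; is what EKS Pod Identity requires; the role naming is an assumption:&lt;/p&gt;

```hcl
resource "aws_iam_role" "cloudwatch" {
  name = "${var.project_name}-cloudwatch"   # hypothetical naming convention

  # EKS Pod Identity trust policy
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "pods.eks.amazonaws.com" }
      Action    = ["sts:AssumeRole", "sts:TagSession"]
    }]
  })
}

resource "aws_iam_role_policy_attachment" "cloudwatch_agent" {
  role       = aws_iam_role.cloudwatch.name
  policy_arn = "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
}

resource "aws_iam_role_policy_attachment" "xray" {
  role       = aws_iam_role.cloudwatch.name
  policy_arn = "arn:aws:iam::aws:policy/AWSXrayWriteOnlyAccess"
}
```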

&lt;h3&gt;
  
  
  Custom NodePool for Spot Instances
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8k70ls3v82hbf6bvgnon.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8k70ls3v82hbf6bvgnon.png" alt="EKS Auto Mode NodePools - built-in general-purpose pool and custom Spot pool configuration" width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The built-in &lt;code&gt;general-purpose&lt;/code&gt; pool uses On-Demand instances only, with C, M, and R instance families (generation 4+, both AMD and ARM/Graviton). For cost optimization, you can create a custom NodePool that uses Spot:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"kubectl_manifest"&lt;/span&gt; &lt;span class="s2"&gt;"spot_nodeclass"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;yaml_body&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;yamlencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;apiVersion&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"eks.amazonaws.com/v1"&lt;/span&gt;
    &lt;span class="nx"&gt;kind&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"NodeClass"&lt;/span&gt;
    &lt;span class="nx"&gt;metadata&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"spot-class"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;spec&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;eks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;node_iam_role_name&lt;/span&gt;
      &lt;span class="nx"&gt;subnetSelectorTerms&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s2"&gt;"kubernetes.io/role/internal-elb"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"1"&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="nx"&gt;securityGroupSelectorTerms&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s2"&gt;"aws:eks:cluster-name"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;project_name&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;

  &lt;span class="nx"&gt;depends_on&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;eks&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"kubectl_manifest"&lt;/span&gt; &lt;span class="s2"&gt;"spot_nodepool"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;yaml_body&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;yamlencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;apiVersion&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"karpenter.sh/v1"&lt;/span&gt;
    &lt;span class="nx"&gt;kind&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"NodePool"&lt;/span&gt;
    &lt;span class="nx"&gt;metadata&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"spot-compute"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;spec&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;template&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;metadata&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;labels&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s2"&gt;"workload-type"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"spot-eligible"&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="nx"&gt;spec&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;nodeClassRef&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;group&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"eks.amazonaws.com"&lt;/span&gt;
            &lt;span class="nx"&gt;kind&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"NodeClass"&lt;/span&gt;
            &lt;span class="nx"&gt;name&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"spot-class"&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
          &lt;span class="nx"&gt;requirements&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
              &lt;span class="nx"&gt;key&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"karpenter.sh/capacity-type"&lt;/span&gt;
              &lt;span class="nx"&gt;operator&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"In"&lt;/span&gt;
              &lt;span class="nx"&gt;values&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"spot"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
              &lt;span class="nx"&gt;key&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"eks.amazonaws.com/instance-category"&lt;/span&gt;
              &lt;span class="nx"&gt;operator&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"In"&lt;/span&gt;
              &lt;span class="nx"&gt;values&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"c"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"m"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
              &lt;span class="nx"&gt;key&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"eks.amazonaws.com/instance-generation"&lt;/span&gt;
              &lt;span class="nx"&gt;operator&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Gte"&lt;/span&gt;
              &lt;span class="nx"&gt;values&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"5"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
          &lt;span class="p"&gt;]&lt;/span&gt;
          &lt;span class="nx"&gt;expireAfter&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"336h"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;disruption&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;consolidationPolicy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"WhenEmptyOrUnderutilized"&lt;/span&gt;
        &lt;span class="nx"&gt;consolidateAfter&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"1m"&lt;/span&gt;
        &lt;span class="nx"&gt;budgets&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
          &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;nodes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"20%"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;weight&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;

  &lt;span class="nx"&gt;depends_on&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;eks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;kubectl_manifest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;spot_nodeclass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key differences from self-managed Karpenter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;NodeClass uses &lt;code&gt;eks.amazonaws.com/v1&lt;/code&gt;&lt;/strong&gt; instead of &lt;code&gt;karpenter.k8s.aws/v1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NodeClass requires &lt;code&gt;role&lt;/code&gt;&lt;/strong&gt; - You must specify &lt;code&gt;role&lt;/code&gt; (IAM role name) or &lt;code&gt;instanceProfile&lt;/code&gt;. The EKS API rejects NodeClasses without one of these. Use &lt;code&gt;module.eks.node_iam_role_name&lt;/code&gt; from the Terraform module&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Labels use &lt;code&gt;eks.amazonaws.com/*&lt;/code&gt;&lt;/strong&gt; instead of &lt;code&gt;karpenter.k8s.aws/*&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NodePool API&lt;/strong&gt; is the same &lt;code&gt;karpenter.sh/v1&lt;/code&gt; in both cases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;consolidateAfter&lt;/code&gt; is required&lt;/strong&gt; in the &lt;code&gt;disruption&lt;/code&gt; block - omitting it causes a validation error&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;weight&lt;/strong&gt; field (80 here) determines scheduling preference - Karpenter tries higher-weight pools first, so the Spot pool is attempted before the built-in pools&lt;/li&gt;
&lt;li&gt;Instance families (C/M/R) and generation (5+) are specified via &lt;code&gt;requirements&lt;/code&gt; - Karpenter selects the specific instance size based on pod resource requests at scheduling time&lt;/li&gt;
&lt;/ul&gt;
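
&lt;p&gt;If you apply manifests outside Terraform, the &lt;code&gt;yamlencode&lt;/code&gt; call above corresponds to a plain manifest like the one below (with the role-derived NodeClass already created). Note that Karpenter's requirement operators are &lt;code&gt;Gt&lt;/code&gt; and &lt;code&gt;Lt&lt;/code&gt; - there is no &lt;code&gt;Gte&lt;/code&gt; - so "generation 5 or newer" is expressed as &lt;code&gt;Gt&lt;/code&gt; over &lt;code&gt;"4"&lt;/code&gt;:&lt;/p&gt;

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-compute
spec:
  template:
    metadata:
      labels:
        workload-type: spot-eligible
    spec:
      nodeClassRef:
        group: eks.amazonaws.com
        kind: NodeClass
        name: spot-class
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: eks.amazonaws.com/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: eks.amazonaws.com/instance-generation
          operator: Gt
          values: ["4"]
      expireAfter: 336h
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
    budgets:
      - nodes: "20%"
  weight: 80
```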

&lt;p&gt;To direct workloads to Spot nodes, add a nodeSelector:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;nodeSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;workload-type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;spot-eligible&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
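
&lt;p&gt;In a full Deployment this combines naturally with a topology spread constraint, so Spot capacity still lands across AZs. A sketch - the image reference and &lt;code&gt;app&lt;/code&gt; label are placeholders:&lt;/p&gt;

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo-api
  template:
    metadata:
      labels:
        app: demo-api
    spec:
      # Land on the custom Spot pool defined earlier
      nodeSelector:
        workload-type: spot-eligible
      # Spread replicas across AZs to limit blast radius of a Spot reclaim
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: demo-api
      containers:
        - name: demo-api
          image: YOUR_ECR_REPO/demo-api:latest   # placeholder image
          resources:
            requests:
              cpu: 250m      # requests drive Karpenter's instance sizing
              memory: 256Mi
```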






&lt;h2&gt;
  
  
  Deploying a Real Application
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm0z59g0d4ruu5kls8yqo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm0z59g0d4ruu5kls8yqo.png" alt="Demo Application Architecture - FastAPI app with ALB and autoscaling" width="800" height="586"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To demonstrate the full workflow, I deployed a Python FastAPI application with an ALB, autoscaling, and a load generator for scaling demos.&lt;/p&gt;

&lt;h3&gt;
  
  
  Application Code
&lt;/h3&gt;

&lt;p&gt;A demo API with four endpoints designed to showcase Auto Mode's capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/health&lt;/code&gt; - Health check for probes and ALB health checks&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/info&lt;/code&gt; - Returns pod and node details including the instance type Karpenter selected, whether the node is Spot or On-Demand, which NodePool it belongs to, and the AZ. Uses the Kubernetes API with a read-only RBAC binding to query node labels&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/compute/{iterations}&lt;/code&gt; - CPU-intensive endpoint for triggering HPA scaling&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/stress/{mb}?seconds=30&lt;/code&gt; - Allocates memory to demonstrate memory pressure. Holds the allocation then auto-releases&lt;/li&gt;
&lt;/ul&gt;
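
&lt;p&gt;The CPU endpoint just needs deterministic busy work so the HPA sees sustained CPU usage. A minimal sketch of that idea - the function name and hashing loop are assumptions, not the exact repo code:&lt;/p&gt;

```python
import hashlib

def burn_cpu(iterations: int) -> str:
    """CPU-bound work: repeatedly SHA-256 a digest so utilization stays high.

    In the FastAPI app this would back a route like
    @app.get("/compute/{iterations}").
    """
    digest = b"seed"
    for _ in range(iterations):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()
```

&lt;p&gt;Because the work is deterministic, the same iteration count always returns the same hex digest, which makes load tests reproducible.&lt;/p&gt;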

&lt;p&gt;The most interesting part is the &lt;code&gt;/info&lt;/code&gt; endpoint, which queries the Kubernetes API to read node labels set by Karpenter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_get_node_info&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Fetch node labels via Kubernetes API to show Auto Mode scheduling decisions.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;kubernetes&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_incluster_config&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;v1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CoreV1Api&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NODE_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;labels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;labels&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;node_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NODE_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;instance_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;node.kubernetes.io/instance-type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;capacity_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;karpenter.sh/capacity-type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nodepool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;karpenter.sh/nodepool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zone&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topology.kubernetes.io/zone&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arch&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kubernetes.io/arch&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;compute_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;eks.amazonaws.com/compute-type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The other endpoints: &lt;code&gt;/compute/{iterations}&lt;/code&gt; does CPU-intensive work for HPA scaling demos, and &lt;code&gt;/stress/{mb}&lt;/code&gt; allocates memory (auto-releases after a configurable hold period) for memory pressure demos. &lt;a href="https://github.com/RDarrylR/eks-auto-mode-terraform/blob/main/app/main.py" rel="noopener noreferrer"&gt;Full app code on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here is a real response from the running demo:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"pod"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"demo-api-847f8d6ff5-9gvsq"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"namespace"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"demo"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"ip"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"10.0.3.184"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"service_account"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"demo-api"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"node"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"node_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"i-01e881b330651bbc6"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"instance_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"c7g.large"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"capacity_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"spot"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"nodepool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"spot-compute"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"zone"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"us-east-1c"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"arch"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arm64"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"compute_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"auto"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice Karpenter selected a &lt;strong&gt;c7g.large&lt;/strong&gt; - a Graviton (ARM) Spot instance from the custom &lt;code&gt;spot-compute&lt;/code&gt; pool. This is a key insight: Karpenter optimizes for cost, and Graviton Spot instances are often the cheapest available option. Note that the built-in &lt;code&gt;general-purpose&lt;/code&gt; pool provisions amd64 On-Demand instances only - it is the custom Spot pool that opens up ARM and Spot capacity. Hit the endpoint multiple times and you may see responses from different pods on different nodes, potentially in different AZs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Building the Container Image
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://github.com/RDarrylR/eks-auto-mode-terraform/blob/main/app/Dockerfile" rel="noopener noreferrer"&gt;Dockerfile&lt;/a&gt; is a standard Python 3.13-slim image running uvicorn.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important: Build for the right architecture.&lt;/strong&gt; Auto Mode's NodePools include ARM (Graviton) instances by default - Karpenter often selects them because they are the cheapest option. You need to target the right platform:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Build for ARM (Graviton nodes - most common with Auto Mode Spot)&lt;/span&gt;
docker buildx build &lt;span class="nt"&gt;--platform&lt;/span&gt; linux/arm64 &lt;span class="nt"&gt;-t&lt;/span&gt; my-app:latest &lt;span class="nt"&gt;--load&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;

&lt;span class="c"&gt;# Or if your nodes are AMD64&lt;/span&gt;
docker buildx build &lt;span class="nt"&gt;--platform&lt;/span&gt; linux/amd64 &lt;span class="nt"&gt;-t&lt;/span&gt; my-app:latest &lt;span class="nt"&gt;--load&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using &lt;code&gt;:latest&lt;/code&gt; tags with Auto Mode requires &lt;code&gt;imagePullPolicy: Always&lt;/code&gt; in the deployment to ensure nodes pull the newest image on every pod creation.&lt;/p&gt;
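&lt;p&gt;In the deployment, that combination looks like the following fragment (the ECR image URI is a placeholder - substitute your own registry):&lt;/p&gt;

```yaml
# Fragment of the Deployment pod spec - image URI is a placeholder
containers:
  - name: demo-api
    image: YOUR_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/demo-api:latest
    imagePullPolicy: Always   # force a fresh pull on every pod start
```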

&lt;h3&gt;
  
  
  Kubernetes Manifests
&lt;/h3&gt;

&lt;p&gt;The key manifest design decisions (&lt;a href="https://github.com/RDarrylR/eks-auto-mode-terraform/tree/main/k8s" rel="noopener noreferrer"&gt;full manifests on GitHub&lt;/a&gt;):&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RBAC for /info&lt;/strong&gt; - The pod needs a ServiceAccount with a ClusterRole to read node labels. This is a minimal read-only &lt;code&gt;get&lt;/code&gt; on nodes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterRole&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;node-reader&lt;/span&gt;
&lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;apiGroups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nodes"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;verbs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Downward API env vars&lt;/strong&gt; - The deployment injects &lt;code&gt;NODE_NAME&lt;/code&gt;, &lt;code&gt;POD_NAME&lt;/code&gt;, &lt;code&gt;POD_NAMESPACE&lt;/code&gt;, &lt;code&gt;POD_IP&lt;/code&gt;, and &lt;code&gt;POD_SERVICE_ACCOUNT&lt;/code&gt; via the Kubernetes downward API. The &lt;code&gt;/info&lt;/code&gt; endpoint uses &lt;code&gt;NODE_NAME&lt;/code&gt; to query the Kubernetes API for that node's labels:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NODE_NAME&lt;/span&gt;
    &lt;span class="na"&gt;valueFrom&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;fieldRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;fieldPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;spec.nodeName&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Other important details:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;imagePullPolicy: Always&lt;/code&gt;&lt;/strong&gt; is required when using &lt;code&gt;:latest&lt;/code&gt; tags, since nodes cache images&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IngressClass&lt;/strong&gt; is named &lt;code&gt;alb&lt;/code&gt; - it cannot be named &lt;code&gt;eks.amazonaws.com/alb&lt;/code&gt; because &lt;code&gt;/&lt;/code&gt; is invalid in an RFC 1123 subdomain name; that string goes in the &lt;code&gt;spec.controller&lt;/code&gt; field instead. You must explicitly create the &lt;code&gt;IngressClassParams&lt;/code&gt; and &lt;code&gt;IngressClass&lt;/code&gt; resources - Auto Mode does not auto-register them&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource requests&lt;/strong&gt; are set on every container. This is critical with Auto Mode - Karpenter provisions nodes based on requested resources. If you don't set requests, Auto Mode has no signal for right-sizing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Health probes&lt;/strong&gt; are defined for proper rolling updates and load balancer health checks&lt;/li&gt;
&lt;/ul&gt;
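&lt;p&gt;For reference, the explicit IngressClass registration looks roughly like this sketch (the class name matches the demo; the &lt;code&gt;scheme&lt;/code&gt; value is an assumption - check the repo's manifests for the real definition):&lt;/p&gt;

```yaml
# Auto Mode does not auto-register these - create both explicitly
apiVersion: eks.amazonaws.com/v1
kind: IngressClassParams
metadata:
  name: alb
spec:
  scheme: internet-facing       # assumption - pick per your exposure needs
---
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: alb                     # object name cannot contain "/"
spec:
  controller: eks.amazonaws.com/alb
  parameters:
    apiGroup: eks.amazonaws.com
    kind: IngressClassParams
    name: alb
```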

&lt;h3&gt;
  
  
  Horizontal Pod Autoscaler
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;autoscaling/v2&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HorizontalPodAutoscaler&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo-api&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;scaleTargetRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
    &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo-api&lt;/span&gt;
  &lt;span class="na"&gt;minReplicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
  &lt;span class="na"&gt;maxReplicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
  &lt;span class="na"&gt;behavior&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;scaleDown&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;stabilizationWindowSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;120&lt;/span&gt;
  &lt;span class="na"&gt;metrics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Resource&lt;/span&gt;
      &lt;span class="na"&gt;resource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cpu&lt;/span&gt;
        &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Utilization&lt;/span&gt;
          &lt;span class="na"&gt;averageUtilization&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;60&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Resource&lt;/span&gt;
      &lt;span class="na"&gt;resource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;memory&lt;/span&gt;
        &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Utilization&lt;/span&gt;
          &lt;span class="na"&gt;averageUtilization&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;70&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The HPA scales on both CPU and memory - whichever metric exceeds its threshold first triggers a scale-up. The &lt;code&gt;/compute&lt;/code&gt; endpoint drives CPU scaling, while the &lt;code&gt;/stress&lt;/code&gt; endpoint drives memory scaling. Both trigger the same pod-to-node scaling chain through Auto Mode.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;stabilizationWindowSeconds: 120&lt;/code&gt; shortens the default 5-minute stabilization window for scale-down, allowing the HPA to reduce replicas faster after load drops - useful for demos and workloads where you want quicker consolidation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; The HPA requires the Kubernetes Metrics Server to read CPU/memory utilization. The metrics-server is deployed as an EKS addon in the Terraform configuration (see the EKS cluster configuration above). Without it, HPA reports &lt;code&gt;cpu: &amp;lt;unknown&amp;gt;&lt;/code&gt; and never scales.&lt;/p&gt;

&lt;h3&gt;
  
  
  PodDisruptionBudget
&lt;/h3&gt;

&lt;p&gt;Since Auto Mode consolidation is enabled by default, a PDB protects your application during node replacement:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;policy/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PodDisruptionBudget&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo-api&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;minAvailable&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;demo-api&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensures at least one pod is always available when Auto Mode consolidates or rotates nodes. Without a PDB, Karpenter could drain all pods from a node simultaneously during consolidation.&lt;/p&gt;

&lt;p&gt;When the HPA scales pods up and existing nodes can't fit them, Auto Mode's managed Karpenter automatically provisions new nodes. When pods scale down and nodes become underutilized, consolidation kicks in and nodes are terminated. You never interact with the scaling infrastructure directly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Autoscaling in Action
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ga9cuvz61dji2h5qlva.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ga9cuvz61dji2h5qlva.png" alt="EKS Auto Mode Scaling - pod-level HPA and node-level Karpenter scaling working together" width="800" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One of the most compelling features of Auto Mode is how seamlessly pod-level and node-level scaling work together. The repo includes everything needed to demonstrate this end-to-end.&lt;/p&gt;

&lt;h3&gt;
  
  
  Manual Scaling Demo
&lt;/h3&gt;

&lt;p&gt;The quickest way to see Auto Mode's node provisioning is to scale the deployment beyond what existing nodes can handle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Scale to 15 replicas - watch Auto Mode provision new nodes&lt;/span&gt;
kubectl scale deployment demo-api &lt;span class="nt"&gt;-n&lt;/span&gt; demo &lt;span class="nt"&gt;--replicas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;15

&lt;span class="c"&gt;# Watch the scaling in real time&lt;/span&gt;
kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; demo &lt;span class="nt"&gt;-o&lt;/span&gt; wide &lt;span class="nt"&gt;-w&lt;/span&gt;    &lt;span class="c"&gt;# Pods scheduling across nodes&lt;/span&gt;
kubectl get nodes &lt;span class="nt"&gt;-w&lt;/span&gt;                    &lt;span class="c"&gt;# New nodes appearing&lt;/span&gt;
kubectl get events &lt;span class="nt"&gt;-A&lt;/span&gt; &lt;span class="nt"&gt;--sort-by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'.lastTimestamp'&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s1"&gt;'Nominated|Launched'&lt;/span&gt;

&lt;span class="c"&gt;# Scale back down - watch consolidation&lt;/span&gt;
kubectl scale deployment demo-api &lt;span class="nt"&gt;-n&lt;/span&gt; demo &lt;span class="nt"&gt;--replicas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you scale to 15 replicas, each requesting 256m CPU and 512Mi memory, the existing nodes cannot fit all the pods. Auto Mode's managed Karpenter evaluates the pending pods, selects optimal instance types from the C/M/R families (generation 4+, including Graviton), and provisions new nodes. The instance type is determined by the aggregate resource requests and cost optimization - Karpenter might select a c7g.large (Graviton) over an m7i.large if it is cheaper and fits the workload. You can verify this by hitting the &lt;code&gt;/info&lt;/code&gt; endpoint to see exactly what Karpenter chose.&lt;/p&gt;

&lt;p&gt;When you scale back down, pods are removed and nodes become underutilized. Auto Mode's consolidation policy (&lt;code&gt;WhenEmptyOrUnderutilized&lt;/code&gt;) kicks in, draining and terminating nodes that are no longer needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Load-Driven Scaling Demo
&lt;/h3&gt;

&lt;p&gt;For a more realistic demonstration, the repo includes a &lt;a href="https://github.com/RDarrylR/eks-auto-mode-terraform/blob/main/k8s/load-generator.yaml" rel="noopener noreferrer"&gt;load generator&lt;/a&gt; - a busybox pod that continuously hits &lt;code&gt;/compute/500000&lt;/code&gt; to drive CPU utilization above the HPA threshold.&lt;/p&gt;
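&lt;p&gt;The manifest is essentially a busybox pod in a request loop - something like this sketch (the real manifest is in the repo; the Service name &lt;code&gt;demo-api&lt;/code&gt; and loop details here are assumptions):&lt;/p&gt;

```yaml
# Sketch of a busybox load generator - see the repo for the real manifest
apiVersion: v1
kind: Pod
metadata:
  name: load-generator
  namespace: demo
spec:
  containers:
    - name: load
      image: busybox:1.36
      command: ["/bin/sh", "-c"]
      args:
        - |
          # hammer the CPU endpoint to push utilization past the HPA target
          while true; do
            wget -q -O- http://demo-api/compute/500000 >/dev/null
          done
```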

&lt;p&gt;The full scaling chain works like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Load generator drives traffic to the API&lt;/li&gt;
&lt;li&gt;CPU utilization exceeds the HPA's 60% target&lt;/li&gt;
&lt;li&gt;HPA increases the replica count&lt;/li&gt;
&lt;li&gt;New pods enter &lt;code&gt;Pending&lt;/code&gt; state (no room on existing nodes)&lt;/li&gt;
&lt;li&gt;Auto Mode's Karpenter detects pending pods and provisions new nodes&lt;/li&gt;
&lt;li&gt;Pods schedule onto the new nodes and begin serving traffic
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start the load generator&lt;/span&gt;
kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; k8s/load-generator.yaml

&lt;span class="c"&gt;# Watch the full chain: HPA -&amp;gt; pods -&amp;gt; nodes&lt;/span&gt;
kubectl get hpa &lt;span class="nt"&gt;-n&lt;/span&gt; demo &lt;span class="nt"&gt;-w&lt;/span&gt;
kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; demo &lt;span class="nt"&gt;-o&lt;/span&gt; wide &lt;span class="nt"&gt;-w&lt;/span&gt;
kubectl get nodes &lt;span class="nt"&gt;-w&lt;/span&gt;

&lt;span class="c"&gt;# Stop the load&lt;/span&gt;
kubectl delete &lt;span class="nt"&gt;-f&lt;/span&gt; k8s/load-generator.yaml
&lt;span class="c"&gt;# HPA scales pods down, Auto Mode consolidates nodes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Makefile includes convenience commands for all of this: &lt;code&gt;make demo-scale-up&lt;/code&gt;, &lt;code&gt;make demo-scale-down&lt;/code&gt;, &lt;code&gt;make demo-load-start&lt;/code&gt;, &lt;code&gt;make demo-load-stop&lt;/code&gt;, and &lt;code&gt;make demo-watch&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Observability with Container Insights
&lt;/h2&gt;

&lt;p&gt;Since February 2026, Auto Mode supports Managed Capability Logging - you can deliver logs from compute autoscaling (Karpenter), block storage, load balancing, and pod networking to CloudWatch Logs, S3, or Kinesis Data Firehose using CloudWatch Vended Logs. This closed what was previously the biggest observability gap.&lt;/p&gt;

&lt;p&gt;CloudWatch Container Insights complements this by providing node, pod, and container-level metrics with pre-built dashboards.&lt;/p&gt;

&lt;h3&gt;
  
  
  Terraform Setup
&lt;/h3&gt;

&lt;p&gt;The CloudWatch observability addon is deployed as an EKS addon with Pod Identity - the recommended approach for Auto Mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# In the EKS module&lt;/span&gt;
&lt;span class="nx"&gt;addons&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;metrics-server&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;most_recent&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;amazon-cloudwatch-observability&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;most_recent&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="nx"&gt;pod_identity_association&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;role_arn&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cloudwatch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
      &lt;span class="nx"&gt;service_account&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"cloudwatch-agent"&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;metrics-server&lt;/code&gt; addon is required for HPA to read CPU and memory utilization. Without it, HPA cannot function.&lt;/p&gt;

&lt;p&gt;The IAM role uses Pod Identity (not IRSA) with least-privilege policies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_policy_document"&lt;/span&gt; &lt;span class="s2"&gt;"cloudwatch_assume"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;statement&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;actions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"sts:AssumeRole"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"sts:TagSession"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nx"&gt;principals&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;type&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Service"&lt;/span&gt;
      &lt;span class="nx"&gt;identifiers&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"pods.eks.amazonaws.com"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role"&lt;/span&gt; &lt;span class="s2"&gt;"cloudwatch"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;               &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.project_name}-cloudwatch-observability"&lt;/span&gt;
  &lt;span class="nx"&gt;assume_role_policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_iam_policy_document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cloudwatch_assume&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy_attachment"&lt;/span&gt; &lt;span class="s2"&gt;"cloudwatch_agent"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cloudwatch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
  &lt;span class="nx"&gt;policy_arn&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy_attachment"&lt;/span&gt; &lt;span class="s2"&gt;"xray_write"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cloudwatch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
  &lt;span class="nx"&gt;policy_arn&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"arn:aws:iam::aws:policy/AWSXrayWriteOnlyAccess"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pod Identity over IRSA&lt;/strong&gt; - Pod Identity is the recommended approach for EKS addons. It is built into Auto Mode (the Pod Identity Agent runs on every node automatically).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trust principal is &lt;code&gt;pods.eks.amazonaws.com&lt;/code&gt;&lt;/strong&gt; - This is the Pod Identity service, not &lt;code&gt;ec2.amazonaws.com&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;AWSXrayWriteOnlyAccess&lt;/code&gt;&lt;/strong&gt; - Required alongside &lt;code&gt;CloudWatchAgentServerPolicy&lt;/code&gt; for full observability including trace collection.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What You Get
&lt;/h3&gt;

&lt;p&gt;Once deployed, Container Insights provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Node metrics&lt;/strong&gt; - CPU, memory, network, and disk utilization per node&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pod metrics&lt;/strong&gt; - CPU and memory usage per pod, with requests vs actual&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Container-level performance&lt;/strong&gt; - Individual container metrics within pods&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cluster dashboards&lt;/strong&gt; - Pre-built CloudWatch dashboards for cluster health&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Log collection&lt;/strong&gt; - Application and system logs forwarded to CloudWatch Logs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This complements Managed Capability Logging by providing resource-level metrics that controller logs don't surface. Container Insights gives you the visibility to understand scaling behavior, identify resource bottlenecks, and verify that consolidation is working as expected.&lt;/p&gt;




&lt;h2&gt;
  
  
  Security Out of the Box
&lt;/h2&gt;

&lt;p&gt;Auto Mode provides a strong security posture with minimal configuration. Here is what you get by default:&lt;/p&gt;

&lt;h3&gt;
  
  
  Node Security
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bottlerocket OS&lt;/strong&gt; - Minimal, purpose-built container OS with SELinux enforcing and a read-only root filesystem. No package manager, no shell access.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IMDSv2 enforced&lt;/strong&gt; - Instance metadata hop limit hardcoded to 1, preventing containers from accessing node credentials via IMDS.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No SSH/SSM access&lt;/strong&gt; - Nodes are locked down. You cannot open a shell on an Auto Mode node.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;14-day node rotation&lt;/strong&gt; - Nodes expire after 14 days by default (configurable up to 21 days) and are automatically replaced with the latest patched Bottlerocket AMI, ensuring CVE fixes are applied continuously.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Drift detection&lt;/strong&gt; - When AWS releases a new AMI (approximately weekly), Auto Mode detects the drift and replaces nodes with the updated image.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Data Security
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Encrypted EBS volumes&lt;/strong&gt; - Auto Mode's managed EBS CSI driver supports encrypted volumes via StorageClass parameters (&lt;code&gt;encrypted: "true"&lt;/code&gt;), ensuring persistent volumes are encrypted at rest.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;allowedTopologies&lt;/code&gt;&lt;/strong&gt; - StorageClasses can restrict volume provisioning to Auto Mode compute nodes via &lt;code&gt;eks.amazonaws.com/compute-type: auto&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secrets encryption&lt;/strong&gt; - EKS supports envelope encryption for Kubernetes secrets using KMS. Add &lt;code&gt;encryption_config&lt;/code&gt; to the EKS module to encrypt etcd secrets at rest with your own KMS key. This is not enabled by default but is recommended for production clusters handling sensitive data.&lt;/li&gt;
&lt;/ul&gt;
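&lt;p&gt;A StorageClass combining the first two settings looks roughly like this (the class name and gp3 type are illustrative; &lt;code&gt;ebs.csi.eks.amazonaws.com&lt;/code&gt; is Auto Mode's managed EBS provisioner):&lt;/p&gt;

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: auto-ebs-encrypted              # illustrative name
provisioner: ebs.csi.eks.amazonaws.com  # Auto Mode's managed EBS CSI driver
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
  encrypted: "true"                     # encrypt volumes at rest
allowedTopologies:
  - matchLabelExpressions:
      - key: eks.amazonaws.com/compute-type
        values: ["auto"]                # restrict to Auto Mode nodes
```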

&lt;h3&gt;
  
  
  IAM Security
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Least-privilege node role&lt;/strong&gt; - Uses &lt;code&gt;AmazonEKSWorkerNodeMinimalPolicy&lt;/code&gt; (not the broader &lt;code&gt;AmazonEKSWorkerNodePolicy&lt;/code&gt;) and &lt;code&gt;AmazonEC2ContainerRegistryPullOnly&lt;/code&gt; (not &lt;code&gt;AmazonEC2ContainerRegistryReadOnly&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pod Identity&lt;/strong&gt; - Built-in, used for the CloudWatch addon. No need for IRSA or node-level IAM policies for application workloads.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Separate IAM roles&lt;/strong&gt; - Cluster role, node role, and addon roles are distinct with non-overlapping permissions.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Network Security
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Private subnets&lt;/strong&gt; - Nodes run exclusively in private subnets with no direct internet access.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Managed VPC CNI&lt;/strong&gt; - Network policies are enforced at the CNI level without additional tooling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NAT gateway&lt;/strong&gt; - Outbound traffic routes through NAT, providing a single egress point.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Resource Requests Matter More Than Ever
&lt;/h2&gt;

&lt;p&gt;This deserves its own section because it is the single most important operational consideration with Auto Mode.&lt;/p&gt;

&lt;p&gt;In a traditional EKS setup with Managed Node Groups, you might have fixed-size nodes (e.g., m5.xlarge) and the Kubernetes scheduler bins pods onto available capacity. Wasted capacity is absorbed by the fixed node pool.&lt;/p&gt;

&lt;p&gt;With Auto Mode (and Karpenter in general), nodes are provisioned to fit the requested resources. If your pod requests 256m CPU and 512Mi memory, Karpenter might provision a small instance just for that pod. If your pod actually needs 2 vCPU but only requests 256m, it lands on a node sized for 256m and is starved of CPU as soon as neighboring pods contend for it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best practices for resource requests:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Always set both CPU and memory requests&lt;/strong&gt; on every container&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use tools like Goldilocks or VPA in recommendation mode&lt;/strong&gt; to determine actual usage before setting requests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set requests close to actual usage&lt;/strong&gt; - not the theoretical maximum&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set limits higher than requests&lt;/strong&gt; for burst capacity, but not orders of magnitude higher&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor actual vs requested&lt;/strong&gt; using Container Insights or Prometheus&lt;/li&gt;
&lt;/ol&gt;
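&lt;p&gt;The practices above can be sketched in a Deployment snippet. This is a hypothetical example (the &lt;code&gt;demo-api&lt;/code&gt; name and numbers are illustrative, not from the repo) showing requests sized near observed usage with limits that allow burst without being orders of magnitude higher:&lt;/p&gt;

```yaml
# Hypothetical Deployment fragment: Auto Mode sizes nodes from these requests.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-api            # illustrative name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo-api
  template:
    metadata:
      labels:
        app: demo-api
    spec:
      containers:
      - name: api
        image: demo-api:latest
        resources:
          requests:
            cpu: 250m       # close to observed steady-state usage
            memory: 512Mi
          limits:
            cpu: "1"        # burst headroom, not 10x the request
            memory: 1Gi
```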




&lt;h2&gt;
  
  
  The Karpenter Connection
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fafza9y998b5a8w73hgav.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fafza9y998b5a8w73hgav.png" alt="EKS Auto Mode Karpenter - how managed Karpenter powers Auto Mode node provisioning" width="800" height="1137"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Understanding that Auto Mode is Karpenter helps explain its behavior:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Consolidation&lt;/strong&gt; is enabled by default. If a node is empty or underutilized, Auto Mode replaces it with a better-fitting instance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Node expiration&lt;/strong&gt; is 14 days by default (max 21 days). Nodes are automatically replaced, picking up the latest Bottlerocket AMI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Drift detection&lt;/strong&gt; triggers replacement when AWS releases a new AMI (approximately weekly)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Disruption budget&lt;/strong&gt; of 10% prevents too many nodes from being replaced simultaneously&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instance selection&lt;/strong&gt; is automatic - Karpenter evaluates the C/M/R families (generation 4+ for built-in, 5+ for our Spot pool) and picks the optimal size based on pending pod requests&lt;/li&gt;
&lt;/ul&gt;
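&lt;p&gt;For a custom pool, these behaviors map onto explicit NodePool fields. A minimal sketch, assuming the open-source Karpenter v1 schema that Auto Mode's CRDs follow (verify field names against the Auto Mode documentation before relying on them):&lt;/p&gt;

```yaml
# Hypothetical custom NodePool mirroring the Auto Mode defaults described above.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: custom-pool           # illustrative name
spec:
  template:
    spec:
      expireAfter: 336h       # 14 days, matching the default node expiration
      nodeClassRef:
        group: eks.amazonaws.com
        kind: NodeClass
        name: default
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    budgets:
    - nodes: "10%"            # cap on nodes disrupted simultaneously
```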

&lt;p&gt;Since February 2026, you can access Karpenter's logs through Managed Capability Logging. Configure CloudWatch Vended Logs to deliver compute autoscaling logs to CloudWatch Logs, S3, or Kinesis Data Firehose. For quick troubleshooting, Kubernetes events (from &lt;code&gt;eks-auto-mode/compute&lt;/code&gt;) remain the fastest signal.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# View Auto Mode compute events&lt;/span&gt;
kubectl get events &lt;span class="nt"&gt;-A&lt;/span&gt; &lt;span class="nt"&gt;--sort-by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'.lastTimestamp'&lt;/span&gt; | &lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-20&lt;/span&gt;

&lt;span class="c"&gt;# Check node provisioning&lt;/span&gt;
kubectl get nodes &lt;span class="nt"&gt;-o&lt;/span&gt; wide &lt;span class="nt"&gt;--show-labels&lt;/span&gt;

&lt;span class="c"&gt;# See NodePool status&lt;/span&gt;
kubectl get nodepools
kubectl describe nodepool general-purpose
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Cost Analysis
&lt;/h2&gt;

&lt;p&gt;EKS Auto Mode has three cost components:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;EKS cluster fee&lt;/td&gt;
&lt;td&gt;$0.10/hour ($72/month)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EC2 instances&lt;/td&gt;
&lt;td&gt;Standard pricing (On-Demand, RI, Savings Plans, Spot)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auto Mode management fee&lt;/td&gt;
&lt;td&gt;~12% on top of EC2 instance costs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Example Monthly Costs
&lt;/h3&gt;

&lt;p&gt;For a cluster running 3x m6i.xlarge (4 vCPU, 16 GiB each) on On-Demand:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Monthly Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;EKS cluster&lt;/td&gt;
&lt;td&gt;$72&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3x m6i.xlarge On-Demand&lt;/td&gt;
&lt;td&gt;~$432&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auto Mode fee (~12%)&lt;/td&gt;
&lt;td&gt;~$52&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NAT Gateway (single)&lt;/td&gt;
&lt;td&gt;~$32 + data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ALB (light traffic)&lt;/td&gt;
&lt;td&gt;~$22&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$610/month&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
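&lt;p&gt;The arithmetic behind that total is worth making explicit - the management fee is charged on EC2 instance cost only, not on the cluster fee or networking:&lt;/p&gt;

```python
# Recompute the example monthly bill from the table above.
EKS_CLUSTER = 72.0         # $0.10/hour flat cluster fee
EC2_INSTANCES = 432.0      # 3x m6i.xlarge On-Demand (table value)
AUTO_MODE_FEE_RATE = 0.12  # ~12% management fee, applied to EC2 cost only
NAT_GATEWAY = 32.0         # excludes data processing charges
ALB = 22.0                 # light traffic

auto_mode_fee = round(EC2_INSTANCES * AUTO_MODE_FEE_RATE)
total = EKS_CLUSTER + EC2_INSTANCES + auto_mode_fee + NAT_GATEWAY + ALB
print(f"Auto Mode fee: ${auto_mode_fee}, total: ${total:.0f}/month")
```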

&lt;p&gt;Compare to self-managed EKS with Karpenter:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Monthly Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;EKS cluster&lt;/td&gt;
&lt;td&gt;$72&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3x m6i.xlarge On-Demand&lt;/td&gt;
&lt;td&gt;~$432&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Karpenter pod overhead&lt;/td&gt;
&lt;td&gt;~$0 (runs on existing nodes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NAT Gateway&lt;/td&gt;
&lt;td&gt;~$32 + data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ALB&lt;/td&gt;
&lt;td&gt;~$22&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$558/month&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The ~$52/month premium (~12% of compute) buys you elimination of operational overhead for Karpenter, VPC CNI, EBS CSI driver, ALB controller, Pod Identity Agent, AMI lifecycle, OS patching, and node health monitoring. For most teams, that operational burden costs far more than $52/month in engineer time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost Optimization Strategies
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Custom Spot NodePools&lt;/strong&gt; - Create a Spot pool (shown above) for fault-tolerant workloads. Spot savings of 60-70% significantly offset the management fee&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Right-size resource requests&lt;/strong&gt; - Auto Mode provisions exactly what you request. Accurate requests mean less wasted capacity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consolidation&lt;/strong&gt; - Enabled by default. Auto Mode automatically bin-packs and replaces underutilized nodes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compute Savings Plans&lt;/strong&gt; - Apply to Auto Mode EC2 instances just like regular instances&lt;/li&gt;
&lt;/ul&gt;
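&lt;p&gt;A Spot pool for fault-tolerant workloads looks roughly like this - a sketch using the Auto Mode label keys (&lt;code&gt;eks.amazonaws.com/instance-category&lt;/code&gt;, &lt;code&gt;eks.amazonaws.com/instance-generation&lt;/code&gt;); check the keys against the current Auto Mode docs, as they differ from open-source Karpenter's:&lt;/p&gt;

```yaml
# Hypothetical Spot NodePool: C/M/R families, generation 5+, Spot capacity.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot
spec:
  template:
    spec:
      nodeClassRef:
        group: eks.amazonaws.com
        kind: NodeClass
        name: default
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot"]
      - key: eks.amazonaws.com/instance-category
        operator: In
        values: ["c", "m", "r"]
      - key: eks.amazonaws.com/instance-generation
        operator: Gt
        values: ["4"]       # i.e., generation 5 and newer
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
```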




&lt;h2&gt;
  
  
  Recent Feature Additions
&lt;/h2&gt;

&lt;p&gt;Auto Mode has been gaining features steadily since GA. Notable additions through early 2026:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Managed Capability Logging&lt;/strong&gt; (February 2026) - Deliver logs from compute autoscaling, block storage, load balancing, and pod networking to CloudWatch Logs, S3, or Kinesis Data Firehose via CloudWatch Vended Logs. This closed the biggest observability gap.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secondary IP mode&lt;/strong&gt; (December 2025) - IP conservation for large clusters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SOCI parallel pull&lt;/strong&gt; (November 2025) - Up to 60% faster container image pulls for GPU instances&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Static-capacity NodePools&lt;/strong&gt; (November 2025) - Fixed replica count for predictable workloads that need a guaranteed number of nodes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GovCloud availability for Auto Mode&lt;/strong&gt; (October 2025) - US-East and US-West GovCloud regions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FIPS-compatible AMIs&lt;/strong&gt; (October 2025) - For US compliance requirements&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local Zones support&lt;/strong&gt; (October 2025) - Deploy Auto Mode nodes to edge locations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;associatePublicIPAddress&lt;/strong&gt; (August 2025) - Control public IP assignment in NodeClass&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ODCR/Capacity Reservation support&lt;/strong&gt; (June 2025) - Use On-Demand Capacity Reservations with Auto Mode&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pod subnet and security group selectors&lt;/strong&gt; (June 2025) - &lt;code&gt;podSubnetSelectorTerms&lt;/code&gt; and &lt;code&gt;podSecurityGroupSelectorTerms&lt;/code&gt; in NodeClass for network isolation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Static-capacity NodePools and ODCR support are particularly relevant for enterprise readers who need guaranteed capacity for critical workloads.&lt;/p&gt;




&lt;h2&gt;
  
  
  Current Limitations
&lt;/h2&gt;

&lt;p&gt;Auto Mode is not the right choice for every workload. Here are the constraints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No custom AMIs&lt;/strong&gt; - Only AWS-provided Bottlerocket variants. If you need Amazon Linux, Ubuntu, or custom-built images with compliance tooling baked in, Auto Mode won't work. Workaround: deploy host-level tooling as DaemonSets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No SSH/SSM access to nodes&lt;/strong&gt; - Nodes are locked down. You can get node logs through Kubernetes APIs and EKS troubleshooting tools, but cannot open a shell on a node&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Controller logs require extra setup&lt;/strong&gt; - Karpenter, ALB Controller, and other managed components run off-cluster. Since February 2026, you can access their logs via Managed Capability Logging (CloudWatch Vended Logs), but this requires explicit configuration - it is not enabled by default&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No self-managed Karpenter&lt;/strong&gt; - Cannot coexist. Must uninstall before enabling Auto Mode&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No third-party CNI&lt;/strong&gt; - Only the managed VPC CNI. Calico and Cilium (as CNI) are not supported. Cilium for network policy only may work&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Linux only&lt;/strong&gt; - No Windows node support. This matters for .NET shops running Windows containers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IMDSv2 hop limit of 1&lt;/strong&gt; - Hardcoded, cannot be changed. This breaks some tools that need IMDS access from within pods (Datadog agent, some AWS SDK configurations). Workaround: pods that require IMDS access must run with &lt;code&gt;hostNetwork: true&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No DRA&lt;/strong&gt; - Dynamic Resource Allocation is not yet supported&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No Gateway API&lt;/strong&gt; - The managed ALB controller only supports the Ingress API and Service annotations (for NLB). If you need Gateway API (HTTPRoute, GatewayClass), you would need to disable Auto Mode's load balancing capability and self-manage the AWS Load Balancer Controller, which defeats much of the purpose of Auto Mode&lt;/li&gt;
&lt;/ul&gt;
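&lt;p&gt;The IMDSv2 workaround mentioned above is a one-line pod spec change. A minimal, hypothetical example (the pod name and image are illustrative):&lt;/p&gt;

```yaml
# Pod that must reach IMDS despite the hardcoded hop limit of 1.
apiVersion: v1
kind: Pod
metadata:
  name: imds-client      # illustrative name
spec:
  hostNetwork: true      # shares the node's network namespace, so IMDS is one hop away
  containers:
  - name: app
    image: amazonlinux:2023
    command: ["sleep", "infinity"]
```

&lt;p&gt;Note that &lt;code&gt;hostNetwork: true&lt;/code&gt; gives the pod the node's IP and ports, so it should be reserved for agents that genuinely need it.&lt;/p&gt;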

&lt;p&gt;The most common blocker I see discussed is the lack of custom AMIs and SSH access. If your security team requires specific hardening beyond what Bottlerocket provides, or your compliance framework mandates SSH-based auditing, Auto Mode is not ready for you yet.&lt;/p&gt;




&lt;h2&gt;
  
  
  Migration Considerations
&lt;/h2&gt;

&lt;p&gt;If you have an existing EKS cluster, migration to Auto Mode is possible but has significant caveats.&lt;/p&gt;

&lt;h3&gt;
  
  
  Storage Migration
&lt;/h3&gt;

&lt;p&gt;PVCs created with the self-managed EBS CSI driver (&lt;code&gt;ebs.csi.aws.com&lt;/code&gt;) cannot be directly mounted by Auto Mode's driver (&lt;code&gt;ebs.csi.eks.amazonaws.com&lt;/code&gt;). You need to recreate PV/PVC resources. AWS provides the &lt;code&gt;eks-auto-mode-ebs-migration-tool&lt;/code&gt; to help.&lt;/p&gt;
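&lt;p&gt;Recreated PVCs need to reference a StorageClass that uses the Auto Mode provisioner. A sketch (the class name and parameters are illustrative):&lt;/p&gt;

```yaml
# StorageClass for Auto Mode's managed EBS driver - note the provisioner,
# which is ebs.csi.eks.amazonaws.com, not ebs.csi.aws.com.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: auto-ebs         # illustrative name
provisioner: ebs.csi.eks.amazonaws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
  encrypted: "true"
```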

&lt;h3&gt;
  
  
  Load Balancer Migration
&lt;/h3&gt;

&lt;p&gt;Existing load balancers from the self-managed AWS Load Balancer Controller cannot be transferred to the managed controller. You need a blue-green approach - create new LBs under the managed controller and shift traffic via DNS.&lt;/p&gt;

&lt;h3&gt;
  
  
  From Self-Managed Karpenter
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Uninstall self-managed Karpenter&lt;/li&gt;
&lt;li&gt;Uninstall AWS Load Balancer Controller&lt;/li&gt;
&lt;li&gt;Enable Auto Mode&lt;/li&gt;
&lt;li&gt;Recreate NodePool/NodeClass resources with Auto Mode API versions&lt;/li&gt;
&lt;/ol&gt;
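&lt;p&gt;In command form, the cutover looks roughly like this. This is a sketch that assumes both controllers were installed via Helm under common release names, and the &lt;code&gt;update-cluster-config&lt;/code&gt; flags are abbreviated (the real call also needs node pool and node role settings - consult the AWS migration docs for the full invocation):&lt;/p&gt;

```shell
# 1-2. Remove the self-managed controllers (release names are assumptions)
helm uninstall karpenter -n karpenter
helm uninstall aws-load-balancer-controller -n kube-system

# 3. Enable Auto Mode on the existing cluster (flags abbreviated)
aws eks update-cluster-config --name my-cluster \
  --compute-config enabled=true \
  --kubernetes-network-config '{"elasticLoadBalancing":{"enabled":true}}' \
  --storage-config '{"blockStorage":{"enabled":true}}'
```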

&lt;h3&gt;
  
  
  Real-World Advice
&lt;/h3&gt;

&lt;p&gt;The Playtomic engineering team documented their migration experience and concluded: "If you're already running an EKS cluster with Karpenter and AWS Load Balancer Controller, we don't recommend migrating to EKS Auto Mode." Their recommendation is to adopt Auto Mode for new clusters.&lt;/p&gt;

&lt;p&gt;I agree with this assessment. The migration path has enough friction that the ROI for existing, well-running clusters is questionable. For greenfield deployments, Auto Mode is the obvious choice, provided the limitations don't block you and you want an easier approach to running full Kubernetes on AWS.&lt;/p&gt;




&lt;h2&gt;
  
  
  When to Use Auto Mode vs Standard EKS vs Fargate
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyqcwsdhrxnhd3zny36ke.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyqcwsdhrxnhd3zny36ke.png" alt="EKS Auto Mode Comparison - decision framework for Auto Mode vs Standard EKS vs Fargate" width="800" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criteria&lt;/th&gt;
&lt;th&gt;EKS Auto Mode&lt;/th&gt;
&lt;th&gt;Standard EKS&lt;/th&gt;
&lt;th&gt;EKS on Fargate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Node management&lt;/td&gt;
&lt;td&gt;AWS managed&lt;/td&gt;
&lt;td&gt;You manage&lt;/td&gt;
&lt;td&gt;Serverless (no visible nodes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom AMIs&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SSH to nodes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DaemonSets&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Persistent storage&lt;/td&gt;
&lt;td&gt;Yes (managed EBS)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;EFS only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPU support&lt;/td&gt;
&lt;td&gt;Yes (managed drivers)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Instance type control&lt;/td&gt;
&lt;td&gt;Via NodePool constraints&lt;/td&gt;
&lt;td&gt;Full control&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Third-party CNI&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Windows&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost premium&lt;/td&gt;
&lt;td&gt;~12% on compute&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Fargate pricing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operational burden&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Lowest&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Choose Auto Mode&lt;/strong&gt; when you want Kubernetes with minimal operational overhead, your workloads run on Linux, you don't need custom AMIs or SSH access, and you value AWS managing the full data plane.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose Standard EKS&lt;/strong&gt; when you need custom AMIs, third-party CNIs, want to use Gateway API, need SSH access, or your team already has mature Karpenter automation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose Fargate&lt;/strong&gt; when you have lightweight stateless workloads, need true serverless (no node concept), or want the absolute simplest deployment model.&lt;/p&gt;




&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;p&gt;After building and testing this setup, here are the practices I recommend:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Always set resource requests on every container.&lt;/strong&gt; Auto Mode provisions based on requests. Missing requests means unpredictable bin-packing and potentially oversized nodes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use custom Spot NodePools for non-critical workloads.&lt;/strong&gt; The built-in &lt;code&gt;general-purpose&lt;/code&gt; pool is On-Demand only. Creating a Spot pool can cut compute costs by 60-70% for batch jobs, dev environments, and other interruptible work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tag your subnets correctly.&lt;/strong&gt; Auto Mode discovers subnets via &lt;code&gt;kubernetes.io/role/elb&lt;/code&gt; and &lt;code&gt;kubernetes.io/role/internal-elb&lt;/code&gt; tags. Missing tags means load balancers and nodes won't land where you expect.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use the right resource class names.&lt;/strong&gt; StorageClass provisioner is &lt;code&gt;ebs.csi.eks.amazonaws.com&lt;/code&gt;, IngressClass is &lt;code&gt;alb&lt;/code&gt; (not &lt;code&gt;eks.amazonaws.com/alb&lt;/code&gt;), NLB loadBalancerClass is &lt;code&gt;eks.amazonaws.com/nlb&lt;/code&gt;. You must create &lt;code&gt;IngressClassParams&lt;/code&gt; and &lt;code&gt;IngressClass&lt;/code&gt; resources explicitly - Auto Mode does not auto-register them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configure PodDisruptionBudgets.&lt;/strong&gt; Auto Mode consolidates underutilized nodes by default. PDBs ensure your application maintains availability during node replacement and consolidation. The demo includes a PDB with &lt;code&gt;minAvailable: 1&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enable observability.&lt;/strong&gt; Configure Managed Capability Logging for controller logs (compute, storage, LB, networking) and deploy Container Insights with Pod Identity for node and pod metrics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Start with Auto Mode for new clusters.&lt;/strong&gt; Don't migrate existing clusters unless you have a compelling reason. The storage and load balancer migration paths add significant risk.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test the IMDSv2 hop limit.&lt;/strong&gt; If your pods use the AWS SDK, IMDS, or tools like the Datadog agent that rely on instance metadata, verify they work with the hardcoded hop limit of 1. Pods that need IMDS access must use &lt;code&gt;hostNetwork: true&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use least-privilege IAM.&lt;/strong&gt; Stick with &lt;code&gt;AmazonEKSWorkerNodeMinimalPolicy&lt;/code&gt; and &lt;code&gt;AmazonEC2ContainerRegistryPullOnly&lt;/code&gt; for the node role. Use Pod Identity for application IAM needs.&lt;/li&gt;
&lt;/ol&gt;
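&lt;p&gt;Point 4 trips people up most often, so here is what the explicit IngressClass wiring looks like. A sketch based on the Auto Mode resource names (the &lt;code&gt;alb-params&lt;/code&gt; name is illustrative; verify the &lt;code&gt;IngressClassParams&lt;/code&gt; API version against current docs):&lt;/p&gt;

```yaml
# Auto Mode does not auto-register these - create them explicitly.
apiVersion: eks.amazonaws.com/v1
kind: IngressClassParams
metadata:
  name: alb-params        # illustrative name
spec:
  scheme: internet-facing
---
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: alb
spec:
  controller: eks.amazonaws.com/alb
  parameters:
    apiGroup: eks.amazonaws.com
    kind: IngressClassParams
    name: alb-params
```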




&lt;h2&gt;
  
  
  The Complete Terraform
&lt;/h2&gt;

&lt;p&gt;The full Terraform code for this article is available on GitHub: &lt;a href="https://github.com/RDarrylR/eks-auto-mode-terraform" rel="noopener noreferrer"&gt;github.com/RDarrylR/eks-auto-mode-terraform&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The repository includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;VPC with public and private subnets across 3 AZs&lt;/li&gt;
&lt;li&gt;EKS cluster with Auto Mode enabled (Kubernetes 1.35)&lt;/li&gt;
&lt;li&gt;Custom Spot NodePool and NodeClass (Karpenter CRDs)&lt;/li&gt;
&lt;li&gt;Least-privilege IAM roles (cluster, node, CloudWatch)&lt;/li&gt;
&lt;li&gt;CloudWatch Container Insights with Pod Identity&lt;/li&gt;
&lt;li&gt;ECR repository for the demo application&lt;/li&gt;
&lt;li&gt;Demo FastAPI app with &lt;code&gt;/info&lt;/code&gt; (node scheduling details), &lt;code&gt;/compute&lt;/code&gt; (CPU stress), and &lt;code&gt;/stress&lt;/code&gt; (memory pressure) endpoints&lt;/li&gt;
&lt;li&gt;Kubernetes manifests with ALB, HPA, and IngressClass configuration&lt;/li&gt;
&lt;li&gt;Load generator for autoscaling demos&lt;/li&gt;
&lt;li&gt;Makefile with deployment, scaling demo, and inspection commands
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Deploy the infrastructure&lt;/span&gt;
make init
make plan
make apply

&lt;span class="c"&gt;# Build and push the demo app&lt;/span&gt;
make docker-build
make docker-push

&lt;span class="c"&gt;# Deploy the application&lt;/span&gt;
make deploy-app

&lt;span class="c"&gt;# Run the scaling demo&lt;/span&gt;
make demo-scale-up    &lt;span class="c"&gt;# Scale to 15 replicas, watch node provisioning&lt;/span&gt;
make demo-watch       &lt;span class="c"&gt;# Watch HPA, pods, nodes, and events&lt;/span&gt;
make demo-scale-down  &lt;span class="c"&gt;# Scale back, watch consolidation&lt;/span&gt;

&lt;span class="c"&gt;# Or trigger HPA-driven scaling&lt;/span&gt;
make demo-load-start  &lt;span class="c"&gt;# Deploy load generator&lt;/span&gt;
make demo-watch       &lt;span class="c"&gt;# Watch the full scaling chain&lt;/span&gt;
make demo-load-stop   &lt;span class="c"&gt;# Clean up&lt;/span&gt;

&lt;span class="c"&gt;# Clean up&lt;/span&gt;
make delete-app
make destroy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  CLEANUP (IMPORTANT!!)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;If you deploy this infrastructure, it will cost you real money (~$610/month for the default configuration). Please do not forget about it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Make sure to delete all resources when you are done:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;make delete-app   &lt;span class="c"&gt;# Remove K8s resources first (avoids orphaned LBs/ENIs)&lt;/span&gt;
make destroy      &lt;span class="c"&gt;# Destroy all Terraform resources&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The EKS cluster, EC2 instances, NAT gateway, and ALB all incur hourly charges. Even if you are not running any application workloads, the cluster and VPC infrastructure will continue to cost you money until it is destroyed.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;EKS Auto Mode represents a significant shift in how AWS approaches Kubernetes. Instead of handing you a control plane and saying "good luck with the rest," AWS now manages the entire stack - compute, networking, storage, and add-ons - while you focus on your applications. It's similar in philosophy to Fargate, but with full EC2 visibility and Kubernetes compatibility.&lt;/p&gt;

&lt;p&gt;The ~12% cost premium is real, but the operational simplification is substantial. No more Karpenter upgrades, no more VPC CNI compatibility issues, no more EBS CSI driver version mismatches, no more ALB Controller deployments. For teams that want Kubernetes without the infrastructure burden, Auto Mode delivers.&lt;/p&gt;

&lt;p&gt;The Terraform setup with the community module is straightforward - enable a flag, configure your NodePools, and deploy. The biggest adjustment is remembering the Auto Mode resource class names (&lt;code&gt;eks.amazonaws.com/*&lt;/code&gt;) and accepting that you are trading visibility and control for operational simplicity.&lt;/p&gt;

&lt;p&gt;The security posture is strong out of the box - Bottlerocket with SELinux, IMDSv2 enforced, encrypted storage, least-privilege IAM, and automatic node rotation. Observability is now comprehensive - Managed Capability Logging delivers controller logs to CloudWatch, and Container Insights provides pod and node metrics. And the autoscaling story is seamless - HPA drives pod scaling, Karpenter drives node scaling, consolidation cleans up the rest.&lt;/p&gt;

&lt;p&gt;If you are starting a new Kubernetes project on AWS in 2026, Auto Mode should be your default choice unless one of the limitations above blocks you or you have a specific reason to manage the data plane yourself.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/automode.html" rel="noopener noreferrer"&gt;EKS Auto Mode Overview - AWS Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/eks/latest/best-practices/automode.html" rel="noopener noreferrer"&gt;EKS Auto Mode Best Practices&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/containers/under-the-hood-amazon-eks-auto-mode/" rel="noopener noreferrer"&gt;Under the Hood: Amazon EKS Auto Mode&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/containers/getting-started-with-amazon-eks-auto-mode/" rel="noopener noreferrer"&gt;Getting Started with EKS Auto Mode&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/auto-change.html" rel="noopener noreferrer"&gt;EKS Auto Mode Release Notes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest/examples/eks-auto-mode" rel="noopener noreferrer"&gt;terraform-aws-modules/eks Auto Mode Example&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/auto-migrate-karpenter.html" rel="noopener noreferrer"&gt;Migrate from Karpenter to Auto Mode&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/whitepapers/latest/security-overview-amazon-eks-auto-mode/benefits.html" rel="noopener noreferrer"&gt;EKS Auto Mode Security Whitepaper&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/playtomic/migrating-your-cluster-to-eks-auto-mode-what-you-need-to-know-before-taking-the-leap-253m"&gt;Playtomic Migration Experience&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/eks/pricing/" rel="noopener noreferrer"&gt;Amazon EKS Pricing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/RDarrylR/eks-auto-mode-terraform" rel="noopener noreferrer"&gt;Full Terraform Code - GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.meetup.com/aws-london-on-user-group/" rel="noopener noreferrer"&gt;AWS London Ontario User Group&lt;/a&gt; - Meetups, talks, and community for AWS builders in the London, Ontario area&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.youtube.com/@AWSLondonONMeetup" rel="noopener noreferrer"&gt;AWS London Ontario User Group YouTube&lt;/a&gt; - Recorded talks and presentations&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Connect with me on&lt;/em&gt; &lt;a href="https://x.com/RDarrylR" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://bsky.app/profile/darrylruggles.bsky.social" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://www.linkedin.com/in/darryl-ruggles/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://github.com/RDarrylR" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://medium.com/@RDarrylR" rel="noopener noreferrer"&gt;Medium&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://dev.to/rdarrylr"&gt;Dev.to&lt;/a&gt;&lt;em&gt;, or the&lt;/em&gt; &lt;a href="https://community.aws/@darrylr" rel="noopener noreferrer"&gt;AWS Community&lt;/a&gt;&lt;em&gt;. Check out more of my projects at&lt;/em&gt; &lt;a href="https://darryl-ruggles.cloud" rel="noopener noreferrer"&gt;darryl-ruggles.cloud&lt;/a&gt; &lt;em&gt;and join the&lt;/em&gt; &lt;a href="https://www.believeinserverless.com/" rel="noopener noreferrer"&gt;Believe In Serverless&lt;/a&gt; &lt;em&gt;community.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>eks</category>
      <category>aws</category>
      <category>kubernetes</category>
      <category>terraform</category>
    </item>
    <item>
      <title>Elastic Container Service (ECS): My default choice for containers on AWS</title>
      <dc:creator>Darryl Ruggles</dc:creator>
      <pubDate>Wed, 18 Mar 2026 14:08:11 +0000</pubDate>
      <link>https://forem.com/aws-builders/elastic-container-service-ecs-my-default-choice-for-containers-on-aws-14pp</link>
      <guid>https://forem.com/aws-builders/elastic-container-service-ecs-my-default-choice-for-containers-on-aws-14pp</guid>
      <description>&lt;p&gt;Amazon Elastic Container Service is the default AWS service I reach for whenever I need to run containers. Whether it's a batch processing pipeline that fans out across hundreds of Fargate tasks or a FastAPI backend sitting behind an Application Load Balancer, ECS handles the orchestration without the operational complexity of Kubernetes. The control plane is free, the AWS integration is deep, and as of early 2026, the deployment capabilities rival anything in the container ecosystem.&lt;/p&gt;

&lt;p&gt;I recently presented on ECS and decided to write down the things I have learned from building real projects into one place. This blog post is the companion to that presentation - a deep dive into what ECS offers, how I use it, and how you can start building with it today.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Containers?
&lt;/h2&gt;

&lt;p&gt;Before we talk about ECS specifically, let's talk about why containers matter. Four core principles make containers compelling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Consistency&lt;/strong&gt; - The same image runs identically on your laptop, in CI, and in production. No more "works on my machine."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Isolation&lt;/strong&gt; - Each container gets its own filesystem, networking, and process space. Multiple services on the same host without conflicts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Efficiency&lt;/strong&gt; - Containers share the host OS kernel. Startup in seconds, not minutes. Far less overhead than virtual machines.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Portability&lt;/strong&gt; - A Docker image runs on ECS, EKS, Lambda, or your own servers. Your business logic stays runtime-agnostic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In my &lt;a href="https://darryl-ruggles.cloud/dsql-kabob-store/" rel="noopener noreferrer"&gt;Aurora DSQL Kabob Store&lt;/a&gt; project, I made this a deliberate design decision - keeping business logic runtime-agnostic so the same FastAPI application could deploy on Fargate, EC2, EKS, or Lambda with minimal adapter code.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Amazon ECS?
&lt;/h2&gt;

&lt;p&gt;ECS is a fully managed container orchestration service. You define what to run and how, and ECS handles placement, scaling, availability, and integration with the rest of AWS.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe3fvuvot442hhxuuwkkc.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe3fvuvot442hhxuuwkkc.webp" alt="ECS Overview" width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Four things make ECS stand out:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;No control plane cost&lt;/strong&gt; - Unlike EKS (~$75/month per cluster), the ECS orchestration layer is completely free. You only pay for the compute your containers use.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deep AWS integration&lt;/strong&gt; - IAM roles per task, CloudWatch Container Insights, native ALB target groups, Secrets Manager injection, and tight integration with every major AWS service.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexible compute&lt;/strong&gt; - Choose between Fargate (serverless), EC2 (self-managed), or the new Managed Instances (AWS-managed EC2).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment sophistication&lt;/strong&gt; - Rolling updates, native blue/green, canary, and linear deployments all built in.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Core Concepts
&lt;/h2&gt;

&lt;p&gt;Five building blocks make up ECS:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cluster&lt;/strong&gt; - A logical grouping of tasks and services. Think of it as your namespace. A cluster can span Fargate, EC2, and Managed Instances simultaneously.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Task Definition&lt;/strong&gt; - The blueprint. A JSON document that specifies container images, CPU, memory, networking mode, volumes, IAM roles, and logging configuration. Versioned - each registration creates a new revision (e.g., &lt;code&gt;my-app:3&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Task&lt;/strong&gt; - A running instance of a task definition. One or more containers working together. On Fargate, each task gets its own elastic network interface and private IP.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Service&lt;/strong&gt; - Maintains a desired count of tasks. Handles replacement of failed tasks, load balancer registration, auto scaling, and deployments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Container Instance&lt;/strong&gt; - An EC2 instance running the ECS agent, registered to a cluster. Only relevant if you're using the EC2 launch type.&lt;/p&gt;




&lt;h2&gt;
  
  
  Compute Options
&lt;/h2&gt;

&lt;p&gt;ECS gives you four ways to provide compute for your containers. Choosing the right one depends on your workload characteristics, cost sensitivity, and operational preferences.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0zlt237ferzf5bw54tir.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0zlt237ferzf5bw54tir.webp" alt="ECS Compute Options" width="800" height="570"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Fargate (Serverless) - The Default Choice
&lt;/h3&gt;

&lt;p&gt;Fargate is what I recommend for most workloads. With it, there are no EC2 instances to manage. You specify CPU and memory at the task level, and AWS handles everything underneath.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;CPU (vCPU)&lt;/th&gt;
&lt;th&gt;Memory Options&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0.25&lt;/td&gt;
&lt;td&gt;512 MiB, 1-2 GB&lt;/td&gt;
&lt;td&gt;Linux only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0.5&lt;/td&gt;
&lt;td&gt;1-4 GB&lt;/td&gt;
&lt;td&gt;Linux only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2-8 GB&lt;/td&gt;
&lt;td&gt;Linux and Windows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;4-16 GB&lt;/td&gt;
&lt;td&gt;Linux and Windows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;8-30 GB&lt;/td&gt;
&lt;td&gt;Linux and Windows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;16-60 GB&lt;/td&gt;
&lt;td&gt;Linux only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;32-120 GB&lt;/td&gt;
&lt;td&gt;Linux only&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Fargate supports both x86_64 and ARM64 (Graviton) architectures. Graviton gives you roughly 20% better price-performance for most workloads. Pricing is per-second based on vCPU and memory consumed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fargate Spot&lt;/strong&gt; offers up to 70% savings for fault-tolerant workloads. When AWS reclaims capacity, tasks receive a SIGTERM with a 2-minute warning. I use this for batch processing jobs where interruption just means retrying one file.&lt;/p&gt;

&lt;h3&gt;
  
  
  EC2 Launch Type
&lt;/h3&gt;

&lt;p&gt;This offers full control over the underlying instances. You choose the AMI, instance type, and manage patching and scaling yourself. Choose EC2 when you need GPUs, custom AMIs, specific instance families, or when sustained high utilization makes reserved instances cheaper than Fargate.&lt;/p&gt;

&lt;p&gt;The trade-off is clear: more control, more operational burden.&lt;/p&gt;

&lt;h3&gt;
  
  
  ECS Managed Instances 
&lt;/h3&gt;

&lt;p&gt;Launched in September 2025, Managed Instances bridge the gap between Fargate simplicity and EC2 flexibility. AWS handles provisioning, auto-scaling, Bottlerocket OS patching (14-day cycles), and host replacement. You control instance selection through attribute-based requirements - say "I need 4 GPUs" and ECS picks a matching instance type.&lt;/p&gt;

&lt;p&gt;The "start before stop" principle for host replacement is particularly nice - new capacity comes up before old goes down, maintaining availability throughout.&lt;/p&gt;

&lt;p&gt;This is the answer for GPU workloads and ML inference where Fargate isn't an option but you don't want to manage EC2 fleets.&lt;/p&gt;

&lt;h3&gt;
  
  
  Capacity Providers
&lt;/h3&gt;

&lt;p&gt;Capacity providers are the recommended way to configure compute. The strategy uses two parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Base&lt;/strong&gt; - Minimum tasks guaranteed on a specific provider (only one provider can have a base)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weight&lt;/strong&gt; - Relative proportion of tasks after the base is filled&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example: base 2 on FARGATE, weight 4 on FARGATE_SPOT, weight 1 on FARGATE. Your first 2 tasks are guaranteed to use on-demand Fargate. After that, 4 out of every 5 new tasks go to Spot. Cost optimization with a reliability floor.&lt;/p&gt;
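<p>&lt;p&gt;That example, expressed as a &lt;code&gt;capacityProviderStrategy&lt;/code&gt; on a service (a sketch of the shape, not a complete service definition):&lt;br&gt;
&lt;/p&gt;</p>

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "capacityProviderStrategy": [
    { "capacityProvider": "FARGATE",      "base": 2, "weight": 1 },
    { "capacityProvider": "FARGATE_SPOT", "base": 0, "weight": 4 }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;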




&lt;h2&gt;
  
  
  Task Definitions - The Blueprint
&lt;/h2&gt;

&lt;p&gt;The task definition is where you define everything about your containers. Here are the critical parameters:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfqk08kim3rl1q1rhjsa.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfqk08kim3rl1q1rhjsa.webp" alt="ECS Task Definition Anatomy" width="800" height="570"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Container Definitions
&lt;/h3&gt;

&lt;p&gt;Each task definition contains one or more container definitions. Key parameters include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;image&lt;/strong&gt; - Docker image from ECR (Elastic Container Registry), Docker Hub, or any private registry&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;essential&lt;/strong&gt; - If an essential container stops, the entire task stops. Your main app is essential; your log router sidecar might not be&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;portMappings&lt;/strong&gt; - Container ports, with named ports for Service Connect&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;healthCheck&lt;/strong&gt; - CMD-SHELL command with configurable interval, timeout, retries, and start period&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;dependsOn&lt;/strong&gt; - Container startup ordering with conditions: START, COMPLETE, SUCCESS, HEALTHY&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;restartPolicy&lt;/strong&gt; - Container-level restarts without killing the entire task. Configurable attempt period (60-1800 seconds) and ignored exit codes&lt;/li&gt;
&lt;/ul&gt;
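<p>&lt;p&gt;A sketch of how these parameters fit together in a single container definition - the container names, health check endpoint, and restart settings here are hypothetical:&lt;br&gt;
&lt;/p&gt;</p>

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "name": "my-app",
  "essential": true,
  "healthCheck": {
    "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
    "interval": 30,
    "timeout": 5,
    "retries": 3,
    "startPeriod": 60
  },
  "dependsOn": [
    { "containerName": "config-loader", "condition": "SUCCESS" }
  ],
  "restartPolicy": {
    "enabled": true,
    "restartAttemptPeriod": 300,
    "ignoredExitCodes": [0]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;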

&lt;h3&gt;
  
  
  Task Role vs Execution Role
&lt;/h3&gt;

&lt;p&gt;This distinction trips people up:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Task Role&lt;/th&gt;
&lt;th&gt;Execution Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Purpose&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Permissions for your application code&lt;/td&gt;
&lt;td&gt;Permissions for the ECS agent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Used by&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Your containers calling AWS APIs&lt;/td&gt;
&lt;td&gt;ECS pulling images, pushing logs, fetching secrets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;S3 read/write, DynamoDB access&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;ecr:GetAuthorizationToken&lt;/code&gt;, &lt;code&gt;logs:CreateLogStream&lt;/code&gt;, &lt;code&gt;secretsmanager:GetSecretValue&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Two different roles, two different purposes. The task role follows least privilege for your application. The execution role is about infrastructure plumbing.&lt;/p&gt;
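<p>&lt;p&gt;One thing the two roles share: both are assumed by the ECS tasks service principal, so both need the same trust policy. A sketch:&lt;br&gt;
&lt;/p&gt;</p>

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ecs-tasks.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;What differs is the permissions policy attached to each - least-privilege application permissions on the task role, infrastructure plumbing on the execution role.&lt;/p&gt;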

&lt;h3&gt;
  
  
  Secrets Injection
&lt;/h3&gt;

&lt;p&gt;ECS natively injects secrets as environment variables from Secrets Manager or SSM Parameter Store:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"secrets"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DB_PASSWORD"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"valueFrom"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:secretsmanager:us-east-1:123456789:secret:my-secret"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"API_KEY"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"valueFrom"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:ssm:us-east-1:123456789:parameter/my-param"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Secrets Manager supports specific JSON keys (&lt;code&gt;arn:...secret:my-secret:username::&lt;/code&gt;) and version staging. Never bake secrets into container images or task definitions.&lt;/p&gt;




&lt;h2&gt;
  
  
  Networking
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1im6n8d8iwjtlkisohtw.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1im6n8d8iwjtlkisohtw.webp" alt="ECS Networking with awsvpc Mode" width="800" height="1058"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Networking Modes
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Use With&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;awsvpc&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Each task gets its own ENI and private IP. Per-task security groups. Required for Fargate.&lt;/td&gt;
&lt;td&gt;Fargate, EC2, Managed Instances&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;bridge&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Docker's virtual network. Dynamic port mapping with ALB.&lt;/td&gt;
&lt;td&gt;EC2 only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;host&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Containers use host's network directly. No port isolation.&lt;/td&gt;
&lt;td&gt;EC2 only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;none&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No external networking.&lt;/td&gt;
&lt;td&gt;EC2 only&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Recommendation:&lt;/strong&gt; Use awsvpc unless you have a specific reason not to. It's the only mode that works everywhere and gives you per-task security groups.&lt;/p&gt;

&lt;h3&gt;
  
  
  VPC Architecture
&lt;/h3&gt;

&lt;p&gt;For production workloads, run ECS tasks in private subnets. Use VPC endpoints for ECR, S3, and CloudWatch to avoid NAT gateway data transfer costs. This is the biggest hidden cost in ECS architectures - NAT gateways charging for every image pull and log push.&lt;/p&gt;
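<p>&lt;p&gt;As a sketch, these are the endpoints a Fargate workload in private subnets typically needs - the VPC, subnet, security group, and route table IDs are placeholders:&lt;br&gt;
&lt;/p&gt;</p>

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Interface endpoints for ECR (API + Docker registry) and CloudWatch Logs
for svc in ecr.api ecr.dkr logs; do
  aws ec2 create-vpc-endpoint \
    --vpc-id vpc-0123456789abcdef0 \
    --vpc-endpoint-type Interface \
    --service-name com.amazonaws.us-east-1.$svc \
    --subnet-ids subnet-aaaa subnet-bbbb \
    --security-group-ids sg-0123
done

# S3 uses a free gateway endpoint - ECR stores image layers in S3
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0123456789abcdef0 \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-0123
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;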

&lt;h3&gt;
  
  
  Service Connect
&lt;/h3&gt;

&lt;p&gt;Service Connect is the recommended way to handle service-to-service communication. It automatically injects an Envoy proxy as a sidecar, providing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Service discovery via Cloud Map namespaces&lt;/li&gt;
&lt;li&gt;Client-side load balancing with retries and outlier detection&lt;/li&gt;
&lt;li&gt;Free application-level traffic metrics in CloudWatch (request count, latency, error rates)&lt;/li&gt;
&lt;li&gt;Support for HTTP, HTTP2, gRPC, and TCP&lt;/li&gt;
&lt;li&gt;Per-request Envoy access logs (October 2025)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd4r3soqdg6jg7n8lcxon.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd4r3soqdg6jg7n8lcxon.webp" alt="ECS Service Connect" width="800" height="458"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Service Connect replaces AWS App Mesh, which will be discontinued in September 2026.&lt;/p&gt;
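<p>&lt;p&gt;Enabling Service Connect is a small addition to the service definition. In this sketch the namespace and alias names are hypothetical, and &lt;code&gt;portName&lt;/code&gt; must match a named port mapping in the task definition:&lt;br&gt;
&lt;/p&gt;</p>

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "serviceConnectConfiguration": {
    "enabled": true,
    "namespace": "internal",
    "services": [
      {
        "portName": "api",
        "clientAliases": [
          { "port": 80, "dnsName": "orders-api" }
        ]
      }
    ]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Other services in the same namespace can then reach this one at &lt;code&gt;http://orders-api&lt;/code&gt; with no load balancer in between.&lt;/p&gt;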




&lt;h2&gt;
  
  
  Deployment Strategies
&lt;/h2&gt;

&lt;p&gt;ECS has the most sophisticated deployment options of any container orchestrator on AWS. As of March 2026, four strategies are available natively.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2odccun4pce5kz8d1w29.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2odccun4pce5kz8d1w29.webp" alt="ECS Deployment Strategies" width="800" height="292"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Rolling Update (Default)
&lt;/h3&gt;

&lt;p&gt;Gradually replaces old tasks with new ones. Controlled by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;minimumHealthyPercent&lt;/strong&gt; (default 100%) - Minimum tasks that must remain running&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;maximumPercent&lt;/strong&gt; (default 200%) - Maximum tasks allowed during deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For zero-downtime with a desired count of 4: min 100%, max 200% means ECS starts 4 new tasks, waits for them to be healthy, then stops the 4 old tasks.&lt;/p&gt;
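<p>&lt;p&gt;In the service definition, that zero-downtime configuration looks like this:&lt;br&gt;
&lt;/p&gt;</p>

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "desiredCount": 4,
  "deploymentConfiguration": {
    "minimumHealthyPercent": 100,
    "maximumPercent": 200
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;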

&lt;h3&gt;
  
  
  Blue/Green
&lt;/h3&gt;

&lt;p&gt;Built into ECS without CodeDeploy dependency. Provisions 100% new capacity ("green"), validates, then shifts all production traffic at once. Key features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Six Lambda lifecycle hooks: pre-scale-up, post-scale-up, test traffic shift, production traffic shift, post-test, post-production&lt;/li&gt;
&lt;li&gt;Configurable bake time for instant rollback window&lt;/li&gt;
&lt;li&gt;Works with ALB, NLB, and Service Connect&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Canary (October 2025)
&lt;/h3&gt;

&lt;p&gt;Two-stage deployment: shift a small percentage of traffic first (configurable from 0.1% to 99.9%), validate with real production traffic, then shift the rest. Ideal for critical user-facing services where you want to limit blast radius.&lt;/p&gt;

&lt;h3&gt;
  
  
  Linear (October 2025)
&lt;/h3&gt;

&lt;p&gt;Gradual traffic shift in equal increments as small as 3%, with configurable bake time between each step. The most conservative approach - allows monitoring at each increment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deployment Circuit Breaker
&lt;/h3&gt;

&lt;p&gt;The safety net across all strategies. If tasks keep failing to start or pass health checks, ECS automatically stops the deployment and optionally rolls back to the last successful version. You can wire CloudWatch Alarms into the circuit breaker to detect application-level failures, not just infrastructure failures.&lt;/p&gt;
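<p>&lt;p&gt;Both pieces - the circuit breaker and the alarm integration - live in &lt;code&gt;deploymentConfiguration&lt;/code&gt;. The alarm name here is a placeholder:&lt;br&gt;
&lt;/p&gt;</p>

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "deploymentConfiguration": {
    "deploymentCircuitBreaker": {
      "enable": true,
      "rollback": true
    },
    "alarms": {
      "alarmNames": ["my-app-5xx-rate"],
      "enable": true,
      "rollback": true
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;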




&lt;h2&gt;
  
  
  Service Auto Scaling
&lt;/h2&gt;

&lt;p&gt;ECS uses Application Auto Scaling with four policy types:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Target Tracking&lt;/strong&gt; - Set a target metric value (e.g., CPU at 50%). Simplest to configure - works like a thermostat.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Step Scaling&lt;/strong&gt; - Define explicit threshold/action pairs. React differently at different severity levels.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scheduled Scaling&lt;/strong&gt; - Time-based. Scale up for business hours, down at night. Supports scaling to zero (set minimum capacity to 0).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Predictive Scaling&lt;/strong&gt; - ML-based. Analyzes historical patterns and proactively scales before demand hits. Doesn't trigger scale-ins on its own - pair with target tracking.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Important behavior: scale-in is automatically paused during deployments to protect availability.&lt;/p&gt;
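<p>&lt;p&gt;Setting up target tracking takes two calls: register the service as a scalable target, then attach the policy. A sketch with hypothetical cluster and service names:&lt;br&gt;
&lt;/p&gt;</p>

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Register the service's desired count as a scalable dimension
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --resource-id service/my-cluster/my-service \
  --scalable-dimension ecs:service:DesiredCount \
  --min-capacity 2 --max-capacity 10

# Hold average CPU at 50% - the "thermostat"
aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --resource-id service/my-cluster/my-service \
  --scalable-dimension ecs:service:DesiredCount \
  --policy-name cpu-target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 50.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
    }
  }'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;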




&lt;h2&gt;
  
  
  Storage Options
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Storage&lt;/th&gt;
&lt;th&gt;Persistence&lt;/th&gt;
&lt;th&gt;Shared&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ephemeral&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Task lifetime&lt;/td&gt;
&lt;td&gt;Within task&lt;/td&gt;
&lt;td&gt;Temp files, caches. Default 20 GiB, up to 200 GiB on Fargate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;EFS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Persistent&lt;/td&gt;
&lt;td&gt;Across tasks&lt;/td&gt;
&lt;td&gt;Shared config, models, content. Multi-AZ, IAM auth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;EBS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Configurable&lt;/td&gt;
&lt;td&gt;Single task&lt;/td&gt;
&lt;td&gt;High-IOPS data processing. One volume per task&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bind Mounts&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Task lifetime&lt;/td&gt;
&lt;td&gt;Within task&lt;/td&gt;
&lt;td&gt;Container-to-container data sharing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;EFS is the most versatile - persistent, shared across tasks, supports IAM authorization and transit encryption. EBS is for high-performance block storage when EFS throughput is insufficient. One gotcha with EFS: creating a new file system can take a few minutes, though that's typically a one-time setup cost for a given application.&lt;/p&gt;
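<p>&lt;p&gt;Wiring EFS into a task definition takes a volume entry plus a mount point. The file system, access point, and container names in this sketch are placeholders:&lt;br&gt;
&lt;/p&gt;</p>

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "volumes": [
    {
      "name": "shared-config",
      "efsVolumeConfiguration": {
        "fileSystemId": "fs-0123456789abcdef0",
        "transitEncryption": "ENABLED",
        "authorizationConfig": {
          "accessPointId": "fsap-0123456789abcdef0",
          "iam": "ENABLED"
        }
      }
    }
  ],
  "containerDefinitions": [
    {
      "name": "my-app",
      "mountPoints": [
        { "sourceVolume": "shared-config", "containerPath": "/config", "readOnly": true }
      ]
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;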




&lt;h2&gt;
  
  
  Security
&lt;/h2&gt;

&lt;p&gt;Security in ECS follows the principle of least privilege at the task level:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Task roles&lt;/strong&gt; - Each task definition gets its own IAM role. Your batch processor gets S3 and SQS access. Your API gets DynamoDB access. Not a shared instance profile.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secrets injection&lt;/strong&gt; - Secrets Manager and SSM Parameter Store values injected as environment variables at startup.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network isolation&lt;/strong&gt; - awsvpc mode gives each task its own security group. Run tasks in private subnets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read-only root filesystem&lt;/strong&gt; - Run containers with &lt;code&gt;readonlyRootFilesystem: true&lt;/code&gt; for hardening.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image scanning&lt;/strong&gt; - ECR enhanced scanning with Amazon Inspector continuously scans for OS and language package vulnerabilities. As of 2026, it supports minimal base images like scratch and distroless, and shows which images are running in your clusters.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Observability
&lt;/h2&gt;

&lt;h3&gt;
  
  
  CloudWatch Container Insights (Enhanced)
&lt;/h3&gt;

&lt;p&gt;Container Insights provides granular metrics at the cluster, service, task, and container level. The honeycomb visualization gives you cluster health at a glance - alarm state and utilization side by side. Deployment tracking alongside infrastructure anomalies. Cross-account monitoring for unified views. Can be enabled per-cluster or account-wide.&lt;/p&gt;

&lt;h3&gt;
  
  
  Logging
&lt;/h3&gt;

&lt;p&gt;The &lt;strong&gt;awslogs&lt;/strong&gt; driver sends container logs directly to CloudWatch Logs. As of June 2025, the default mode switched from blocking to non-blocking - if the log buffer fills up, excess logs are dropped rather than blocking your application. This prioritizes availability over logging completeness.&lt;/p&gt;

&lt;p&gt;For advanced log routing, &lt;strong&gt;FireLens&lt;/strong&gt; with Fluent Bit as a sidecar routes logs to any destination - CloudWatch, S3, Elasticsearch, Datadog, Splunk. Different containers can route to different destinations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tracing
&lt;/h3&gt;

&lt;p&gt;Deploy the &lt;strong&gt;AWS Distro for OpenTelemetry (ADOT)&lt;/strong&gt; collector as a sidecar. It receives OTLP traces on port 4317 (gRPC) or 4318 (HTTP) and exports to X-Ray automatically. This replaces the legacy X-Ray daemon approach.&lt;/p&gt;

&lt;h3&gt;
  
  
  ECS Exec
&lt;/h3&gt;

&lt;p&gt;ECS Exec lets you shell into a running container directly - the equivalent of &lt;code&gt;docker exec&lt;/code&gt; but for tasks running on Fargate or EC2. It uses AWS Systems Manager (SSM) under the hood, so there's no need to open inbound ports or SSH. I use this all the time - in my opinion it's one of the most useful ECS features.&lt;/p&gt;

&lt;p&gt;To enable it, set &lt;code&gt;enableExecuteCommand: true&lt;/code&gt; on your service or run task call. Then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws ecs execute-command &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cluster&lt;/span&gt; my-cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--task&lt;/span&gt; abc123 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--container&lt;/span&gt; my-app &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--interactive&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--command&lt;/span&gt; &lt;span class="s2"&gt;"/bin/sh"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is invaluable for debugging - inspecting environment variables, checking network connectivity, verifying mounted volumes, or tailing logs inside the container. A few things to keep in mind:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It must be enabled before the task launches - you can't retroactively enable it on already-running tasks. For services, enabling it requires a new deployment&lt;/li&gt;
&lt;li&gt;The task role needs SSM permissions (&lt;code&gt;ssmmessages:CreateControlChannel&lt;/code&gt;, &lt;code&gt;ssmmessages:CreateDataChannel&lt;/code&gt;, &lt;code&gt;ssmmessages:OpenControlChannel&lt;/code&gt;, &lt;code&gt;ssmmessages:OpenDataChannel&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;The container image needs a shell (&lt;code&gt;/bin/sh&lt;/code&gt; or &lt;code&gt;/bin/bash&lt;/code&gt;) - scratch and distroless images won't work&lt;/li&gt;
&lt;li&gt;All sessions are logged to CloudWatch or S3 for audit&lt;/li&gt;
&lt;li&gt;Works with both Fargate and EC2 launch types&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For quick diagnostics, the &lt;a href="https://github.com/aws-containers/amazon-ecs-exec-checker" rel="noopener noreferrer"&gt;amazon-ecs-exec-checker&lt;/a&gt; script validates that your task, role, and agent are configured correctly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real-World Architecture: Batch Processing with Fargate
&lt;/h2&gt;

&lt;p&gt;My &lt;a href="https://darryl-ruggles.cloud/serverless-data-processor-using-aws-lambda-step-functions-and-fargate-on-ecs-with-rust/" rel="noopener noreferrer"&gt;Serverless Data Processor&lt;/a&gt; project demonstrates the batch processing pattern.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9rffh8aypismlyxvzc2c.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9rffh8aypismlyxvzc2c.webp" alt="Batch Processing with Step Functions and Fargate" width="800" height="612"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The architecture: S3 upload triggers a Lambda that extracts files. Step Functions distributed map fans out processing across Fargate tasks - each file gets its own container. The containers use the &lt;code&gt;waitForTaskToken&lt;/code&gt; pattern - Step Functions passes a callback token as an environment variable, the Rust container processes the data, then calls &lt;code&gt;send_task_success&lt;/code&gt; to signal completion.&lt;/p&gt;

&lt;p&gt;Key details:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fargate at minimum specs: 0.25 vCPU, 512 MB RAM&lt;/li&gt;
&lt;li&gt;OpenTelemetry sidecar for CloudWatch metrics&lt;/li&gt;
&lt;li&gt;Container images in ECR with multi-stage Docker builds&lt;/li&gt;
&lt;li&gt;Written in Rust for the worker containers&lt;/li&gt;
&lt;li&gt;Infrastructure managed with Terraform&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is the Step Functions integration that launches Fargate tasks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Task"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:states:::ecs:runTask.waitForTaskToken"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Parameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"LaunchType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"FARGATE"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Cluster"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${ecs_cluster}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"TaskDefinition"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${task_def_name}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"NetworkConfiguration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"AwsvpcConfiguration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"Subnets"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"${fargate_subnet}"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"SecurityGroups"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"${vpc_default_sg}"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Overrides"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"ContainerOverrides"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"Name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"store_data_processor_daily"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"Environment"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"Name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"TASK_TOKEN"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"Value.$"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"$$.Task.Token"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"Name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"S3_BUCKET"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"Value.$"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"$.BatchInput.source_bucket_name"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"Name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"S3_KEY"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"Value.$"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"$.Items[0].Key"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern works for any fan-out workload - ETL, media processing, report generation, ML batch inference. Each task is independent, starts in seconds, processes its data, and terminates. You pay only for the compute time used.&lt;/p&gt;
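&lt;p&gt;The container side of the callback pattern is tiny: read the token from &lt;code&gt;TASK_TOKEN&lt;/code&gt;, do the work, report back. A minimal Python sketch, assuming the image includes &lt;code&gt;boto3&lt;/code&gt; and the task role allows &lt;code&gt;states:SendTaskSuccess&lt;/code&gt; (the function names here are illustrative, not from the project):&lt;/p&gt;

```python
import json
import os

def build_task_result(bucket, key, status="SUCCEEDED"):
    """Build the JSON payload the container reports back to Step Functions."""
    return {"source": f"s3://{bucket}/{key}", "status": status}

def report_result(result, task_token, sfn_client=None):
    """Send the result back via the callback token.

    In the real container, sfn_client is boto3.client("stepfunctions") and
    task_token arrives through the TASK_TOKEN environment variable injected
    by the state machine.
    """
    payload = json.dumps(result)
    if sfn_client is not None:
        sfn_client.send_task_success(taskToken=task_token, output=payload)
    return payload

if __name__ == "__main__":
    token = os.environ.get("TASK_TOKEN", "local-test-token")
    result = build_task_result("my-bucket", "data/input.json")
    print(report_result(result, token))
```

&lt;p&gt;A matching &lt;code&gt;send_task_failure&lt;/code&gt; call in an exception handler keeps the state machine from waiting out its timeout when the work fails.&lt;/p&gt;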

&lt;p&gt;I also used this same Fargate + callback token pattern in my &lt;a href="https://darryl-ruggles.cloud/serverless-pizza-ordering/" rel="noopener noreferrer"&gt;Serverless Pizza Ordering&lt;/a&gt; project, where the Fargate container simulated pizza preparation and delivery - chosen over Lambda because the "AI" insisted some pizzas could take more than 15 minutes.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real-World Architecture: API Backend on Fargate
&lt;/h2&gt;

&lt;p&gt;My &lt;a href="https://darryl-ruggles.cloud/dsql-kabob-store/" rel="noopener noreferrer"&gt;Aurora DSQL Kabob Store&lt;/a&gt; project uses ECS Fargate as an always-on API backend.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2pa6u05wvacow93titdy.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2pa6u05wvacow93titdy.webp" alt="FastAPI Backend on ECS Fargate" width="800" height="630"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The stack: a React frontend behind CloudFront, an Application Load Balancer routing to FastAPI containers on Fargate, and Aurora DSQL underneath for multi-region active-active writes.&lt;/p&gt;

&lt;p&gt;The key design decision: keep business logic runtime-agnostic. The same FastAPI application uses direct psycopg2 queries (not ORM) so it can deploy across Fargate, ECS on EC2, Lambda, or EKS with minimal adapter code.&lt;/p&gt;
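&lt;p&gt;The shape of that runtime-agnostic split, sketched in Python with hypothetical names (the &lt;code&gt;sqlite3&lt;/code&gt; stand-in is only so the sketch runs anywhere; the real project uses &lt;code&gt;psycopg2&lt;/code&gt;, whose placeholder style is &lt;code&gt;%s&lt;/code&gt; rather than &lt;code&gt;?&lt;/code&gt;):&lt;/p&gt;

```python
# Business logic is a plain function taking any DB-API style connection.
# Hypothetical names for illustration; the real code runs parameterized
# SQL via psycopg2 against Aurora DSQL instead of an ORM.

def list_menu_items(conn, category):
    cur = conn.cursor()
    # sqlite uses "?" placeholders; psycopg2 uses "%s"
    cur.execute("SELECT name, price FROM menu WHERE category = ?", (category,))
    return cur.fetchall()

# Thin adapters keep each runtime's glue separate from the logic:
#   FastAPI on Fargate:  route handler calls list_menu_items(pool_conn, c)
#   Lambda:              handler(event, ctx) calls list_menu_items(conn, c)

if __name__ == "__main__":
    import sqlite3  # stand-in connection so the sketch is self-contained
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE menu (name TEXT, category TEXT, price REAL)")
    conn.execute("INSERT INTO menu VALUES ('chicken kabob', 'mains', 12.5)")
    print(list_menu_items(conn, "mains"))
```

&lt;p&gt;Because the function only assumes a cursor, the same module imports cleanly into a FastAPI app, a Lambda handler, or a batch task with no changes to the query layer.&lt;/p&gt;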

&lt;p&gt;Fargate costs about 20-30% more than equivalent EC2 on-demand for sustained workloads, but the operational simplicity during development is worth it. In practice, real migrations from Fargate to EC2 often yield smaller savings than expected - &lt;a href="https://www.tines.com/blog/migrating-ecs-fargate-ec2/" rel="noopener noreferrer"&gt;Tines reported only ~5% compute cost savings&lt;/a&gt; after migrating, though they saw 30% faster job processing and 10% lower P95 latency from having dedicated hardware. In my own projects, the biggest cost was actually the VPC infrastructure - NAT gateways at ~$2-3/day - not ECS itself. I &lt;code&gt;terraform destroy&lt;/code&gt; when not actively developing.&lt;/p&gt;




&lt;h2&gt;
  
  
  Terraform Examples
&lt;/h2&gt;

&lt;p&gt;Here is the core Terraform for an ECS Fargate setup, taken from my &lt;a href="https://github.com/RDarrylR/step-functions-fargate-processor" rel="noopener noreferrer"&gt;Serverless Data Processor&lt;/a&gt; project:&lt;/p&gt;

&lt;h3&gt;
  
  
  Cluster
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_ecs_cluster"&lt;/span&gt; &lt;span class="s2"&gt;"ecs_cluster"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.project_name}-cluster"&lt;/span&gt;

  &lt;span class="nx"&gt;setting&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;name&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"containerInsights"&lt;/span&gt;
    &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"enabled"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Task Definition
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_ecs_task_definition"&lt;/span&gt; &lt;span class="s2"&gt;"fargate_processor_task"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;family&lt;/span&gt;                   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;task_definition_name&lt;/span&gt;
  &lt;span class="nx"&gt;execution_role_arn&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ecs_task_execution_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
  &lt;span class="nx"&gt;task_role_arn&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ecs_task_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
  &lt;span class="nx"&gt;network_mode&lt;/span&gt;             &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"awsvpc"&lt;/span&gt;
  &lt;span class="nx"&gt;requires_compatibilities&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"FARGATE"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;cpu&lt;/span&gt;                      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fargate_cpu&lt;/span&gt;
  &lt;span class="nx"&gt;memory&lt;/span&gt;                   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fargate_memory&lt;/span&gt;

  &lt;span class="nx"&gt;container_definitions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;templatefile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;"${path.module}/container-definitions.json.tpl"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;app_image&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;app_image&lt;/span&gt;
      &lt;span class="nx"&gt;fargate_cpu&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fargate_cpu&lt;/span&gt;
      &lt;span class="nx"&gt;fargate_memory&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fargate_memory&lt;/span&gt;
      &lt;span class="nx"&gt;aws_region&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_region&lt;/span&gt;
      &lt;span class="nx"&gt;project_name&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;project_name&lt;/span&gt;
      &lt;span class="nx"&gt;task_container_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;task_container_name&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Container Definition Template
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"cpu"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;fargate_cpu&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"essential"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"image"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${app_image}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"memory"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;fargate_memory&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${task_container_name}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"networkMode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"awsvpc"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"environment"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"S3_BUCKET"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"my-bucket"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"S3_KEY"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"data/input.json"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"logConfiguration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"logDriver"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"awslogs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"awslogs-group"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/ecs/${project_name}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"awslogs-region"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${aws_region}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"awslogs-stream-prefix"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${project_name}-log-stream"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"image"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"public.ecr.aws/aws-observability/aws-otel-collector:latest"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"aws-otel-collector"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"essential"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"--config=/etc/ecs/ecs-cloudwatch.yaml"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"logConfiguration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"logDriver"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"awslogs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"awslogs-group"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/ecs/${project_name}-otel"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"awslogs-region"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${aws_region}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"awslogs-stream-prefix"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"otel"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  ECS Service with ALB (for always-on workloads)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_ecs_service"&lt;/span&gt; &lt;span class="s2"&gt;"api_service"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.project_name}-service"&lt;/span&gt;
  &lt;span class="nx"&gt;cluster&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_ecs_cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ecs_cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;task_definition&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_ecs_task_definition&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;api_task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
  &lt;span class="nx"&gt;desired_count&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;desired_count&lt;/span&gt;
  &lt;span class="nx"&gt;launch_type&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"FARGATE"&lt;/span&gt;

  &lt;span class="nx"&gt;network_configuration&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;subnets&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_subnet_ids&lt;/span&gt;
    &lt;span class="nx"&gt;security_groups&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ecs_tasks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nx"&gt;assign_public_ip&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;load_balancer&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;target_group_arn&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_lb_target_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
    &lt;span class="nx"&gt;container_name&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;container_name&lt;/span&gt;
    &lt;span class="nx"&gt;container_port&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;container_port&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;deployment_circuit_breaker&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;enable&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="nx"&gt;rollback&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;depends_on&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_lb_listener&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three resources for the core ECS setup: cluster, task definition, and service. The container definition template handles the application specifics. The full Terraform for both projects is in the GitHub repos linked at the end.&lt;/p&gt;




&lt;h2&gt;
  
  
  Recent Features (2025-2026)
&lt;/h2&gt;

&lt;p&gt;ECS has had a remarkable year of feature launches:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AI Developer Tools&lt;/strong&gt; (December 2025) - ECS MCP Server for AI-assisted development and operations. Natural language commands for cluster management.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ECS Express Mode&lt;/strong&gt; (November 2025) - Deploy a production-ready containerized web app with just three inputs: a container image, a task execution role, and an infrastructure role. Provisions Fargate, ALB with SSL, auto scaling, monitoring, and a unique URL. Up to 25 services can share one ALB. No additional charge beyond the underlying resources.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Canary and Linear Deployments&lt;/strong&gt; (October 2025) - Fine-grained traffic shifting. Canary from 0.1% to 99.9%, linear in increments as small as 3%.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Service Connect Envoy Access Logs&lt;/strong&gt; (October 2025) - Per-request telemetry for HTTP, HTTP2, gRPC, and TCP.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ECS Managed Instances&lt;/strong&gt; (September 2025) - AWS-managed EC2 with Bottlerocket OS. Attribute-based instance selection for GPUs and specialized hardware.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Native Blue/Green Deployments&lt;/strong&gt; (July 2025) - Built into ECS without CodeDeploy. Six Lambda lifecycle hooks for testing and approval at each phase. Configurable bake time for instant rollback.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Non-Blocking Log Driver Default&lt;/strong&gt; (June 2025) - Prioritizes task availability over logging completeness.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ECS vs EKS - When to Use What
&lt;/h2&gt;

&lt;p&gt;This is the most common question I get. Both solve the same fundamental problem - running containers reliably at scale.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criteria&lt;/th&gt;
&lt;th&gt;ECS&lt;/th&gt;
&lt;th&gt;EKS&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Control plane cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;~$75/month per cluster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Learning curve&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS-native concepts&lt;/td&gt;
&lt;td&gt;Kubernetes concepts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS integration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Deep, native&lt;/td&gt;
&lt;td&gt;Good, via add-ons&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multi-cloud&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS only&lt;/td&gt;
&lt;td&gt;Portable K8s manifests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ecosystem&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS tooling&lt;/td&gt;
&lt;td&gt;Helm, ArgoCD, Istio, operators&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Managed compute&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fargate, Managed Instances&lt;/td&gt;
&lt;td&gt;Fargate, managed node groups&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Choose ECS&lt;/strong&gt; when your team is AWS-focused, you want operational simplicity, you value the free control plane, and your workloads are straightforward services and batch jobs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose EKS&lt;/strong&gt; when your team knows Kubernetes, you need multi-cloud portability, you want the Kubernetes ecosystem (Helm, ArgoCD, custom operators), or you're running complex stateful workloads.&lt;/p&gt;

&lt;p&gt;Most organizations pick based on team expertise and existing tooling, not technical limitations.&lt;/p&gt;




&lt;h2&gt;
  
  
  ECS vs Lambda - Containers vs Functions
&lt;/h2&gt;

&lt;p&gt;Another comparison that comes up frequently:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criteria&lt;/th&gt;
&lt;th&gt;ECS/Fargate&lt;/th&gt;
&lt;th&gt;Lambda&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Max duration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;td&gt;15 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Max memory&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;120 GB&lt;/td&gt;
&lt;td&gt;10 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Startup&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Seconds (image pull)&lt;/td&gt;
&lt;td&gt;Milliseconds (warm) to seconds (cold)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pricing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Per-second (vCPU + memory)&lt;/td&gt;
&lt;td&gt;Per-invocation + duration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scaling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Service auto scaling&lt;/td&gt;
&lt;td&gt;Automatic per-request&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best for&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Long-running, resource-heavy, always-on&lt;/td&gt;
&lt;td&gt;Event-driven, short-lived, bursty&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In my projects I use both. Lambda for event handling - S3 triggers, API endpoints, file extraction. Fargate for heavy processing - data transformation, ML inference, container workloads that need full runtime control.&lt;/p&gt;
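&lt;p&gt;A quick way to see where that line falls is to compare the cost of an hour of continuously busy compute on each. A back-of-envelope sketch, assuming us-east-1 list prices at the time of writing (verify against the current pricing pages before relying on the numbers; the sketch also ignores Lambda's small per-request charge):&lt;/p&gt;

```python
# Sustained-compute cost comparison with assumed us-east-1 list prices.
LAMBDA_GB_SECOND = 0.0000166667   # USD per GB-second of Lambda duration
FARGATE_VCPU_HOUR = 0.04048       # USD per Fargate vCPU-hour
FARGATE_GB_HOUR = 0.004445        # USD per Fargate GB-hour

def lambda_hourly(memory_gb):
    """Cost of one hour of continuous Lambda execution at memory_gb."""
    return memory_gb * 3600 * LAMBDA_GB_SECOND

def fargate_hourly(vcpu, memory_gb):
    """Cost of one hour of a Fargate task at the given size."""
    return vcpu * FARGATE_VCPU_HOUR + memory_gb * FARGATE_GB_HOUR

if __name__ == "__main__":
    # A workload that is busy 100% of the time:
    print(f"Lambda, 2 GB, busy 1h:   ${lambda_hourly(2):.4f}")
    print(f"Fargate 1 vCPU / 2 GB:   ${fargate_hourly(1, 2):.4f}")
```

&lt;p&gt;For bursty, short-lived events, Lambda's pay-per-invocation model wins easily since idle time costs nothing; the comparison only flips once utilization is sustained, which matches the split in the table above.&lt;/p&gt;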




&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;p&gt;After building several production systems with ECS, here are the practices I've found most valuable:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Use Fargate unless you need GPUs or specific instance types.&lt;/strong&gt; The operational simplicity is worth the cost premium for most workloads.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use awsvpc networking mode everywhere.&lt;/strong&gt; It's the only mode that works on all compute types and gives you per-task security groups.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enable the deployment circuit breaker with rollback.&lt;/strong&gt; This catches failed deployments before they impact all traffic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use capacity provider strategies to mix Spot and on-demand.&lt;/strong&gt; A base of on-demand with weighted Spot gives you cost savings with a reliability floor.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inject secrets via Secrets Manager.&lt;/strong&gt; Never bake them into images or pass them as plain environment variables.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enable Container Insights.&lt;/strong&gt; The per-task metrics and honeycomb visualization are invaluable for debugging.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use Service Connect for service-to-service communication.&lt;/strong&gt; Free traffic metrics and managed Envoy proxies with no code changes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use VPC endpoints for ECR, S3, and CloudWatch.&lt;/strong&gt; NAT gateway data transfer costs are the biggest hidden expense in ECS architectures.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use multi-stage Docker builds.&lt;/strong&gt; Keep images small. A Rust binary in a scratch image is a few megabytes; a Python app belongs in a slim base image with only production dependencies installed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Define health checks in the task definition.&lt;/strong&gt; Don't rely solely on ALB health checks - container-level health checks catch issues faster.&lt;/li&gt;
&lt;/ol&gt;
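&lt;p&gt;For the container-level health check in item 10, the check lives in the container definition itself. A sketch of the JSON shape in Python (the &lt;code&gt;healthCheck&lt;/code&gt; field names follow the ECS container definition schema; the endpoint, port, and thresholds are illustrative defaults, not recommendations):&lt;/p&gt;

```python
import json

def with_health_check(container_def, path="/health", port=8080):
    """Attach an ECS container-level health check to a container definition."""
    container_def["healthCheck"] = {
        "command": ["CMD-SHELL", f"curl -f http://localhost:{port}{path} || exit 1"],
        "interval": 30,     # seconds between checks
        "timeout": 5,       # seconds before a single check counts as failed
        "retries": 3,       # consecutive failures before the task is UNHEALTHY
        "startPeriod": 10,  # grace period after container start
    }
    return container_def

if __name__ == "__main__":
    cd = with_health_check({"name": "api", "essential": True})
    print(json.dumps(cd, indent=2))
```

&lt;p&gt;The command runs inside the container, so the image needs &lt;code&gt;curl&lt;/code&gt; or an equivalent; distroless and scratch images need a different check, such as a tiny built-in health binary.&lt;/p&gt;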




&lt;h2&gt;
  
  
  Pricing - You Pay for Compute, Not Orchestration
&lt;/h2&gt;

&lt;p&gt;The most important thing to know: &lt;strong&gt;ECS orchestration is free.&lt;/strong&gt; You only pay for the compute resources your containers consume.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fargate&lt;/strong&gt; - Per-second billing for vCPU ($0.04048/hour) and memory ($0.004445/GB/hour). Spot is up to 70% less. Compute Savings Plans can reduce costs by up to 49% (3-year all-upfront) or ~20% (1-year no-upfront).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EC2&lt;/strong&gt; - Standard instance pricing. Use Savings Plans or Reserved Instances for sustained workloads.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Managed Instances&lt;/strong&gt; - EC2 instance pricing plus a management fee for automated provisioning, patching, and host replacement.&lt;/li&gt;
&lt;/ul&gt;
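&lt;p&gt;Plugging the Fargate rates above into a quick monthly estimate (using an average of 730 hours/month; a sketch, not a quote):&lt;/p&gt;

```python
# Monthly cost of an always-on Fargate task at the per-hour rates listed above.
VCPU_HOUR = 0.04048    # USD per vCPU-hour
GB_HOUR = 0.004445     # USD per GB-hour
HOURS_PER_MONTH = 730  # average hours in a month

def monthly_cost(vcpu, memory_gb, hours=HOURS_PER_MONTH):
    """Estimated monthly Fargate cost for one task of the given size."""
    return (vcpu * VCPU_HOUR + memory_gb * GB_HOUR) * hours

if __name__ == "__main__":
    full = monthly_cost(1, 2)    # 1 vCPU / 2 GB
    half = monthly_cost(0.5, 1)  # right-sized to 0.5 vCPU / 1 GB
    print(f"1 vCPU / 2 GB:   ${full:.2f}/month")
    print(f"0.5 vCPU / 1 GB: ${half:.2f}/month ({100 * (1 - half / full):.0f}% less)")
```

&lt;p&gt;Halving both dimensions halves the bill, which is the right-sizing lever described under the optimization strategies below.&lt;/p&gt;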

&lt;p&gt;&lt;strong&gt;Hidden costs to watch:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;NAT gateways - $0.045/GB for data processed. Use VPC endpoints.&lt;/li&gt;
&lt;li&gt;ALB - Fixed hourly cost plus per-LCU. Up to 25 ECS Express Mode services can share one ALB.&lt;/li&gt;
&lt;li&gt;ECR storage - $0.10/GB/month. Use lifecycle policies to clean up old images.&lt;/li&gt;
&lt;li&gt;Ephemeral storage - Fargate charges $0.000111/GB/hour above the default 20 GiB.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cost optimization strategies:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Right-sizing&lt;/strong&gt; - The single biggest lever. Reducing from 1 vCPU/2GB to 0.5 vCPU/1GB can yield ~45-50% lower Fargate task cost.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scheduled shutdowns&lt;/strong&gt; - Running dev/staging environments only during business hours (8 hours/day, 5 days/week) can reduce costs by over 75%.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Savings Plans&lt;/strong&gt; - Commit to consistent usage for 1-3 years. Even no-upfront 1-year plans save ~20% on Fargate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For my projects, the ECS cost has been minimal. The batch processor runs tasks for seconds at minimum specs. The Kabob Store's main cost was VPC infrastructure ($2-3/day), not ECS.&lt;/p&gt;




&lt;h2&gt;
  
  
  Things to Know
&lt;/h2&gt;

&lt;p&gt;A few operational details worth keeping in mind:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Task placement&lt;/strong&gt; - Fargate handles placement automatically. For EC2, use the &lt;code&gt;binpack&lt;/code&gt; placement strategy to consolidate workloads on fewer instances and reduce waste.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Task recycling&lt;/strong&gt; - Fargate tasks on platform version 1.4.0+ are recycled after 14 days of continuous running. Your service will gradually replace old tasks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ENI limits&lt;/strong&gt; - In awsvpc mode on EC2, each task needs an ENI. Enable ENI trunking to increase task density per instance (requires opting in via the &lt;code&gt;awsvpcTrunking&lt;/code&gt; account setting and a supported instance type).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image pull time&lt;/strong&gt; - Large images slow task startup. Keep images lean. Set &lt;code&gt;ECS_IMAGE_PULL_BEHAVIOR=prefer-cached&lt;/code&gt; on EC2 instances to use cached images when available.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spot instance draining&lt;/strong&gt; - For EC2 Spot instances, set &lt;code&gt;ECS_ENABLE_SPOT_INSTANCE_DRAINING=true&lt;/code&gt; on the ECS agent for graceful task termination.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Service quotas&lt;/strong&gt; - Default Fargate vCPU quota is 6 on new accounts (up to 4,000 in production). Request increases proactively.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Force new deployment&lt;/strong&gt; - If you update a secret or parameter store value, the running tasks won't pick it up automatically. Force a new deployment to refresh.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;ECS is the container orchestration service I use most on AWS. The free control plane, deep AWS integration, and flexible compute options make it the right choice for most container workloads that don't require Kubernetes-specific tooling.&lt;/p&gt;

&lt;p&gt;The recent feature launches have been particularly impressive - native blue/green without CodeDeploy, canary and linear deployments, Managed Instances for GPU workloads, and Express Mode for rapid prototyping. Combined with Fargate's serverless simplicity and Service Connect's built-in service mesh, ECS has matured into a comprehensive platform for running containers at any scale.&lt;/p&gt;

&lt;p&gt;I've used it for &lt;a href="https://darryl-ruggles.cloud/serverless-data-processor-using-aws-lambda-step-functions-and-fargate-on-ecs-with-rust/" rel="noopener noreferrer"&gt;batch data processing&lt;/a&gt; with Step Functions fan-out, &lt;a href="https://darryl-ruggles.cloud/serverless-pizza-ordering/" rel="noopener noreferrer"&gt;pizza ordering&lt;/a&gt; with long-running container workflows, and &lt;a href="https://darryl-ruggles.cloud/dsql-kabob-store/" rel="noopener noreferrer"&gt;multi-region API backends&lt;/a&gt; with Aurora DSQL. In every case, ECS handled the orchestration cleanly while I focused on the application logic.&lt;/p&gt;

&lt;p&gt;If you're running containers on AWS and haven't looked at ECS recently, the current feature set is worth a fresh evaluation. Start with a Fargate service behind an ALB, enable Container Insights, and go from there.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/Welcome.html" rel="noopener noreferrer"&gt;Amazon ECS Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AmazonECS/latest/bestpracticesguide/intro.html" rel="noopener noreferrer"&gt;Amazon ECS Best Practices Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://darryl-ruggles.cloud/serverless-data-processor-using-aws-lambda-step-functions-and-fargate-on-ecs-with-rust/" rel="noopener noreferrer"&gt;Serverless Data Processor - Step Functions + Fargate&lt;/a&gt; - Batch processing with ECS&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://darryl-ruggles.cloud/serverless-pizza-ordering/" rel="noopener noreferrer"&gt;Serverless Pizza Ordering&lt;/a&gt; - Long-running Fargate container workflows&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://darryl-ruggles.cloud/dsql-kabob-store/" rel="noopener noreferrer"&gt;Aurora DSQL Kabob Store&lt;/a&gt; - FastAPI on Fargate with multi-region database&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/RDarrylR/step-functions-fargate-processor" rel="noopener noreferrer"&gt;Step Functions + Fargate GitHub Repo&lt;/a&gt; - Full Terraform and Rust container code&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/RDarrylR/serverless-pizza-ordering" rel="noopener noreferrer"&gt;Serverless Pizza GitHub Repo&lt;/a&gt; - Full Terraform and container code&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.meetup.com/aws-london-on-user-group/" rel="noopener noreferrer"&gt;AWS London Ontario User Group&lt;/a&gt; - Meetups, talks, and community for AWS builders in the London, Ontario area&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.youtube.com/@AWSLondonONMeetup" rel="noopener noreferrer"&gt;AWS London Ontario User Group YouTube&lt;/a&gt; - Recorded talks and presentations&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Connect with me on&lt;/em&gt; &lt;a href="https://x.com/RDarrylR" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://bsky.app/profile/darrylruggles.bsky.social" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://www.linkedin.com/in/darryl-ruggles/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://github.com/RDarrylR" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://medium.com/@RDarrylR" rel="noopener noreferrer"&gt;Medium&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://dev.to/rdarrylr"&gt;Dev.to&lt;/a&gt;&lt;em&gt;, or the&lt;/em&gt; &lt;a href="https://community.aws/@darrylr" rel="noopener noreferrer"&gt;AWS Community&lt;/a&gt;&lt;em&gt;. Check out more of my projects at&lt;/em&gt; &lt;a href="https://darryl-ruggles.cloud" rel="noopener noreferrer"&gt;darryl-ruggles.cloud&lt;/a&gt; &lt;em&gt;and join the&lt;/em&gt; &lt;a href="https://www.believeinserverless.com/" rel="noopener noreferrer"&gt;Believe In Serverless&lt;/a&gt; &lt;em&gt;community.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>containers</category>
      <category>ecs</category>
      <category>fargate</category>
    </item>
    <item>
      <title>Amazon EventBridge: The Event-Driven Backbone of AWS (And My Favourite Service)</title>
      <dc:creator>Darryl Ruggles</dc:creator>
      <pubDate>Fri, 13 Mar 2026 01:29:07 +0000</pubDate>
      <link>https://forem.com/aws-builders/amazon-eventbridge-the-event-driven-backbone-of-aws-and-my-favourite-service-10j3</link>
      <guid>https://forem.com/aws-builders/amazon-eventbridge-the-event-driven-backbone-of-aws-and-my-favourite-service-10j3</guid>
      <description>&lt;p&gt;I have been building serverless applications on AWS for years now, and if you asked me to pick one service that keeps showing up in almost every project I build, it would be Amazon EventBridge without hesitation. EventBridge is my favourite AWS service. It offers rules for reacting to events, pipes for bridging data across AWS services, a scheduler for time-based triggers, and a schema registry for discoverability - all fully serverless and practically free.&lt;/p&gt;

&lt;p&gt;I recently had the opportunity to present on EventBridge to the &lt;a href="https://www.meetup.com/aws-cloud-club-at-amity-university-chhattisgarh/events/313431212/?eventOrigin=group_past_events" rel="noopener noreferrer"&gt;AWS Cloud Club at Amity University Chhattisgarh&lt;/a&gt;, and I decided to consolidate key details about this service into one place. This blog post is the companion to that presentation - a dive into why EventBridge matters, how I use it, and how you can start building with it today.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Event-Driven Architecture?
&lt;/h2&gt;

&lt;p&gt;Before we talk about EventBridge specifically, let's talk about why event-driven architecture (EDA) is worth adopting. Four core principles make EDA compelling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Loose coupling&lt;/strong&gt; - Producers emit events without knowing who consumes them. You can add new consumers without touching existing code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time reactivity&lt;/strong&gt; - Respond to changes as they happen, instead of polling for updates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability by default&lt;/strong&gt; - Event buses handle millions of events. Consumers scale independently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extensibility&lt;/strong&gt; - New capabilities just subscribe to existing events. No rewiring required.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;EventBridge is purpose-built to make this simple, scalable, and serverless on AWS.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv330b09esd0382ppyuzs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv330b09esd0382ppyuzs.png" alt="Eventbridge/EDA Overview" width="800" height="541"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The pattern is straightforward. On the left, producers - S3, IAM, CloudWatch, your applications - emit events onto EventBridge. In the middle, EventBridge routes them via rules and content-based patterns. On the right, consumers - Lambda, SNS, SQS, Step Functions - each subscribe independently. Producers don't know about consumers. Consumers don't know about producers. EventBridge is the decoupling layer.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Amazon EventBridge?
&lt;/h2&gt;

&lt;p&gt;EventBridge is a serverless event bus that helps you receive, filter, transform, route, and deliver events. It has four major capabilities:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5yhni58oqrovu7rvhjxz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5yhni58oqrovu7rvhjxz.png" alt="EventBridge Capabilities" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Event Buses
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Default Event Bus&lt;/strong&gt; - Every AWS account gets one automatically. It receives virtually all mutating API calls via CloudTrail - anything that creates, modifies, or deletes something across 250+ AWS services. Events flow whether you are listening or not. And it is &lt;strong&gt;completely free&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom Event Buses&lt;/strong&gt; - Create your own buses and publish custom events. Cross-account sharing via resource policies. As of January 2025, EventBridge also supports &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/01/amazon-eventbridge-direct-delivery-cross-account-targets/" rel="noopener noreferrer"&gt;direct cross-account delivery&lt;/a&gt; to targets like SQS, Lambda, SNS, Kinesis, and API Gateway - no intermediary event bus required in the target account.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SaaS Partner Buses&lt;/strong&gt; - Partners like Datadog and PagerDuty can send events directly into your bus.&lt;/li&gt;
&lt;/ul&gt;
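&lt;p&gt;Publishing onto a custom bus is a single &lt;code&gt;PutEvents&lt;/code&gt; call. A sketch of one entry, with a hypothetical bus name and payload:&lt;/p&gt;

```python
import json

# Build a PutEvents entry for a hypothetical "orders" custom bus.
def build_order_event(order_id: str, total: float) -> dict:
    return {
        "EventBusName": "orders",              # hypothetical custom bus
        "Source": "com.example.orders",        # hypothetical source string
        "DetailType": "OrderPlaced",
        "Detail": json.dumps({"orderId": order_id, "total": total}),
    }

entry = build_order_event("ord-123", 49.99)
# import boto3
# boto3.client("events").put_events(Entries=[entry])  # needs AWS credentials
```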

&lt;h3&gt;
  
  
  Rules &amp;amp; Event Patterns
&lt;/h3&gt;

&lt;p&gt;Rules match events and route them to targets. You can have up to 5 targets per rule. The pattern matching is remarkably flexible:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prefix/suffix matching&lt;/strong&gt; - Match on the beginning or end of string values&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anything-but&lt;/strong&gt; - Exclude specific values&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Numeric ranges&lt;/strong&gt; - Filter by greater-than, less-than, or between&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exists&lt;/strong&gt; - Check for the presence or absence of a field&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OR logic&lt;/strong&gt; - Arrays match any value in the list&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Matching happens at the EventBridge level - you only pay for events that actually match your rules.&lt;/p&gt;
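&lt;p&gt;A single pattern can combine several of these operators. The fields below are illustrative, not taken from a real AWS event:&lt;/p&gt;

```python
import json

# One event pattern exercising several matching operators.
pattern = {
    "source": ["com.example.shop"],               # OR logic over listed values
    "detail": {
        "region": [{"prefix": "us-"}],            # prefix matching
        "fileName": [{"suffix": ".csv"}],         # suffix matching
        "status": [{"anything-but": ["TEST"]}],   # exclude specific values
        "amount": [{"numeric": [">", 100]}],      # numeric comparison
        "couponCode": [{"exists": True}],         # field presence
    },
}

# events.put_rule(Name="big-orders", EventPattern=json.dumps(pattern))
```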

&lt;h3&gt;
  
  
  Scheduler
&lt;/h3&gt;

&lt;p&gt;Scheduler supports three schedule types: rate-based, cron, and one-time. It scales to millions of schedules, integrates with 270+ AWS services and over 6,000 API operations, and includes flexible time windows, automatic retries, and dead-letter queues. EventBridge Scheduler replaces the older scheduled rules and is significantly more capable.&lt;/p&gt;
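&lt;p&gt;Creating a one-time schedule with boto3 is a small call. A sketch with placeholder ARNs (Scheduler needs an IAM role it can assume to invoke the target):&lt;/p&gt;

```python
# Build create_schedule parameters for a hypothetical one-time schedule.
def build_one_time_schedule(name: str, when_utc: str, target_arn: str, role_arn: str) -> dict:
    return {
        "Name": name,
        "ScheduleExpression": f"at({when_utc})",   # one-time schedule expression
        "FlexibleTimeWindow": {"Mode": "OFF"},
        "Target": {"Arn": target_arn, "RoleArn": role_arn},
    }

params = build_one_time_schedule(
    "send-reminder",
    "2026-06-01T09:00:00",
    "arn:aws:lambda:us-east-1:123456789012:function:reminder",  # placeholder
    "arn:aws:iam::123456789012:role/scheduler-invoke",          # placeholder
)
# import boto3
# boto3.client("scheduler").create_schedule(**params)  # needs AWS credentials
```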

&lt;h3&gt;
  
  
  Pipes
&lt;/h3&gt;

&lt;p&gt;Pipes provide point-to-point integration with a pipeline architecture: &lt;strong&gt;Source → Filter → Enrich → Transform → Target&lt;/strong&gt;. They replace the "glue Lambda" functions that many of us have written to connect services. Filtered events are free.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fixz363x3l8v9nwl3a7z8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fixz363x3l8v9nwl3a7z8.png" alt="EventBridge Pipes" width="800" height="338"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For example, a DynamoDB stream can flow into Pipes, filter for only INSERT events, enrich the data with a Lambda function, and deliver to an SQS queue - all without writing a custom Lambda to glue the pieces together.&lt;/p&gt;
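&lt;p&gt;The INSERT-only filter in that example is just an event pattern applied at the pipe source. A sketch of the filter criteria (the pipe name and ARNs in the comment are placeholders):&lt;/p&gt;

```python
import json

# Pipe source filter: pass through only INSERT records from a DynamoDB stream.
filter_criteria = {
    "Filters": [
        {"Pattern": json.dumps({"eventName": ["INSERT"]})}
    ]
}

# pipes = boto3.client("pipes")
# pipes.create_pipe(Name="orders-stream-pipe", Source=stream_arn, Target=queue_arn,
#                   RoleArn=role_arn, SourceParameters={"FilterCriteria": filter_criteria})
```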




&lt;h2&gt;
  
  
  Real-World Pattern: Security Monitoring with the Account Watcher
&lt;/h2&gt;

&lt;p&gt;The best way to understand EventBridge is to see it in action. My &lt;a href="https://darryl-ruggles.cloud/serverless-aws-account-watcher/" rel="noopener noreferrer"&gt;Serverless AWS Account Watcher&lt;/a&gt; project is a perfect example. The problem was simple: keeping track of everything happening in your AWS account is genuinely hard, especially from a security perspective. I needed to know immediately when someone deleted an S3 bucket, created an IAM access key, or logged into the console.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5pgkniyt3mn6gax0y40v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5pgkniyt3mn6gax0y40v.png" alt="EventBridge Account Watcher" width="800" height="562"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The architecture relies entirely on the default event bus. CloudTrail captures API activity - S3 deletions, IAM changes, console logins - and delivers those events to EventBridge automatically. Rules match specific event patterns and trigger a Lambda function that formats notifications and fans them out to SNS (for email) and a Slack webhook.&lt;/p&gt;

&lt;p&gt;Here is what the SAM template looks like for the EventBridge rule:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Events&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;S3EventsRule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;EventBridgeRule&lt;/span&gt;
    &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;Pattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;aws.s3&lt;/span&gt;
        &lt;span class="na"&gt;detail-type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AWS&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;API&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Call&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;via&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;CloudTrail"&lt;/span&gt;
        &lt;span class="na"&gt;detail&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;eventName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;DeleteBucket&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;DeleteBucketPolicy&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;PutBucketPolicy&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is it. A pattern and a source. SAM handles the rule creation, permissions, and target wiring. The Lambda handler uses Python's &lt;code&gt;match/case&lt;/code&gt; to route different event types:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="n"&gt;event_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;case&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DeleteBucket&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;S3 bucket &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;bucket_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; was deleted by &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;case&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CreateAccessKey&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;New access key created for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;target_user&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; by &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;case&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ConsoleLogin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Console login by &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; from &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;source_ip&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The last two lines of the handler are the entire fan-out - &lt;code&gt;send_slack_message()&lt;/code&gt; and &lt;code&gt;publish_to_sns()&lt;/code&gt;. Two notification channels, completely decoupled from the event detection logic.&lt;/p&gt;
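&lt;p&gt;The full helpers are in the linked repo; as a rough sketch of what that fan-out can look like (the webhook URL and topic ARN below are placeholders):&lt;/p&gt;

```python
import json
import urllib.request

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/PLACEHOLDER"   # placeholder
SNS_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:alerts"          # placeholder

def build_slack_payload(message: str) -> bytes:
    # Slack incoming webhooks accept a JSON body with a "text" field.
    return json.dumps({"text": message}).encode("utf-8")

def send_slack_message(message: str) -> None:
    req = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=build_slack_payload(message),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # needs network access

def publish_to_sns(message: str) -> None:
    import boto3  # needs AWS credentials
    boto3.client("sns").publish(TopicArn=SNS_TOPIC_ARN, Message=message)
```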

&lt;p&gt;The extensibility story is powerful here. Want to monitor a new API action? Add one line to the event pattern. Want to add a Microsoft Teams notification? Add one more function call. The event detection, routing, and notification logic are all independent.&lt;/p&gt;

&lt;p&gt;I wrote about this project in detail in my &lt;a href="https://darryl-ruggles.cloud/serverless-aws-account-watcher/" rel="noopener noreferrer"&gt;Serverless AWS Account Watcher&lt;/a&gt; blog post - the full SAM template, Lambda code, and deployment instructions are all there.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real-World Pattern: Health Check Alerts
&lt;/h2&gt;

&lt;p&gt;My &lt;a href="https://darryl-ruggles.cloud/serverless-site-health-check-notification-system/" rel="noopener noreferrer"&gt;Serverless Site Health Check Notification System&lt;/a&gt; takes the same EventBridge pattern in a different direction. Route53 health checks monitor endpoints globally via HTTP, HTTPS, or TCP. When a health check fails, CloudWatch transitions an alarm state, and that state change flows through the default event bus.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6aeqky0ghley6f1cl9o2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6aeqky0ghley6f1cl9o2.png" alt="EventBridge Health Check" width="800" height="458"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I took advantage of the suffix matching. The EventBridge rule matches alarm names ending with &lt;code&gt;-HealthCheckAlarm&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"aws.cloudwatch"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"detail-type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"CloudWatch Alarm State Change"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"detail"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"alarmName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"suffix"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"-HealthCheckAlarm"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means you can add new Route53 health checks without ever updating the EventBridge rule or redeploying your Lambda. Name the CloudWatch alarm with the &lt;code&gt;-HealthCheckAlarm&lt;/code&gt; suffix, and it automatically triggers the existing notification pipeline. This is the extensibility promise of EDA in practice - new health checks, no code changes.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real-World Pattern: Event-Driven Sales Analytics
&lt;/h2&gt;

&lt;p&gt;In my &lt;a href="https://darryl-ruggles.cloud/building-a-serverless-sales-analytics-platform-with-ai-insights-for-under-10month/" rel="noopener noreferrer"&gt;Serverless Sales Analytics Platform&lt;/a&gt;, EventBridge plays a dual role. First, S3 object creation events flow through the default bus - when a store uploads daily sales data, an EventBridge rule matches the bucket and prefix, triggering a Step Functions workflow to transform and analyze the data. Second, an EventBridge schedule fires at 11 PM daily as a fallback, ensuring the analysis pipeline executes even if stores have not reported.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffhjj399wmmg2jb3zmnug.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffhjj399wmmg2jb3zmnug.png" alt="EventBridge Sales Analytics" width="800" height="354"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This combination of event-driven triggers and scheduled fallbacks is a pattern I use frequently. EventBridge handles both seamlessly - reactive processing when data arrives, and scheduled processing as a safety net.&lt;/p&gt;




&lt;h2&gt;
  
  
  EventBridge vs SNS vs SQS - When to Use What
&lt;/h2&gt;

&lt;p&gt;This is one of the most common questions I get. All three are messaging services, but they serve different purposes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;EventBridge&lt;/th&gt;
&lt;th&gt;SNS&lt;/th&gt;
&lt;th&gt;SQS&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;What it does&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Routes events based on content&lt;/td&gt;
&lt;td&gt;Broadcasts to subscribers&lt;/td&gt;
&lt;td&gt;Queues messages for consumers to pull&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Filtering&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Prefix, suffix, numeric, anything-but, exists&lt;/td&gt;
&lt;td&gt;Basic attribute filters&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS service events&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;250+ services on default bus, free&lt;/td&gt;
&lt;td&gt;You wire it up yourself&lt;/td&gt;
&lt;td&gt;You wire it up yourself&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Schema discovery&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Built-in Schema Registry&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Targets&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;20+ supported target types&lt;/td&gt;
&lt;td&gt;Lambda, SQS, HTTP, email subscriptions&lt;/td&gt;
&lt;td&gt;Consumer pulls at their own pace&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ordering&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Best-effort&lt;/td&gt;
&lt;td&gt;Best-effort&lt;/td&gt;
&lt;td&gt;FIFO available&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Use &lt;strong&gt;EventBridge&lt;/strong&gt; for intelligent content-based routing and reacting to AWS service events. Use &lt;strong&gt;SNS&lt;/strong&gt; for simple pub/sub fan-out when you already know the event structure. Use &lt;strong&gt;SQS&lt;/strong&gt; for guaranteed message processing with consumer-controlled throughput.&lt;/p&gt;

&lt;p&gt;In practice, they work together. My Account Watcher uses EventBridge for routing &lt;em&gt;and&lt;/em&gt; SNS for email delivery - each service doing what it does best.&lt;/p&gt;




&lt;h2&gt;
  
  
  EventBridge vs Kafka
&lt;/h2&gt;

&lt;p&gt;Another comparison that comes up: Kafka is a distributed log for high-throughput streaming with replay capabilities. EventBridge is a managed event router for AWS-native architectures. They are fundamentally different tools. Kafka excels at sustained high-throughput data streaming (millions of messages per second). EventBridge excels at reactive event routing with content-based filtering and zero infrastructure management.&lt;/p&gt;

&lt;p&gt;They can also work together - EventBridge Pipes can consume directly from Kafka topics.&lt;/p&gt;




&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;p&gt;After building several production systems with EventBridge, here are the practices I have found most valuable:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Use specific event patterns&lt;/strong&gt; - The more precise your pattern, the fewer unnecessary Lambda invocations you pay for.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement dead-letter queues on rule targets&lt;/strong&gt; - When EventBridge cannot deliver to a target, the DLQ captures the event for investigation. Use Lambda Destinations OnFailure for richer error context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use input transformers&lt;/strong&gt; - Shape the event payload before it reaches your target. Reduce parsing logic in your Lambda functions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Separate buses for different domains&lt;/strong&gt; - Custom buses keep application events isolated from the AWS service events on the default bus.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prefer Scheduler over scheduled rules&lt;/strong&gt; - EventBridge Scheduler is the newer, more capable replacement. Use it for all new time-based triggers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Archive and Replay for debugging&lt;/strong&gt; - Enable archiving so you can replay events after fixing a bug. This is invaluable in production.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement idempotency&lt;/strong&gt; - EventBridge provides at-least-once delivery for most targets. Your consumers should handle duplicate events gracefully.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use SAM, Terraform, or CDK for infrastructure&lt;/strong&gt; - Define your rules, patterns, and targets as code. The SAM &lt;code&gt;EventBridgeRule&lt;/code&gt; event type is incredibly concise.&lt;/li&gt;
&lt;/ol&gt;
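&lt;p&gt;Practice #2 is a one-dict change on the rule target. A sketch with placeholder ARNs:&lt;/p&gt;

```python
# Attach a DLQ and a retry policy to a rule target so undeliverable
# events are captured instead of lost. ARNs are placeholders.
target = {
    "Id": "notify-lambda",
    "Arn": "arn:aws:lambda:us-east-1:123456789012:function:notify",   # placeholder
    "DeadLetterConfig": {
        "Arn": "arn:aws:sqs:us-east-1:123456789012:rule-dlq"          # placeholder
    },
    "RetryPolicy": {"MaximumRetryAttempts": 3, "MaximumEventAgeInSeconds": 3600},
}

# events.put_targets(Rule="account-watcher", Targets=[target])
```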




&lt;h2&gt;
  
  
  Pricing - Practically Free
&lt;/h2&gt;

&lt;p&gt;This might be the most surprising part for people new to EventBridge:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Default event bus&lt;/strong&gt;: Completely &lt;strong&gt;free&lt;/strong&gt;. Every AWS service event that flows through it costs you nothing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom events&lt;/strong&gt;: $1.00 per million events. Each 64 KB chunk of a payload counts as one event.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scheduler&lt;/strong&gt;: $1.00 per million invocations, with a generous free tier of 14 million invocations per month.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pipes&lt;/strong&gt;: $0.40 per million requests after filtering. Filtered-out events are not charged.&lt;/li&gt;
&lt;/ul&gt;
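&lt;p&gt;The 64 KB chunking is worth internalizing, since a large custom payload bills as multiple events. A quick sanity check of the arithmetic:&lt;/p&gt;

```python
import math

PRICE_PER_MILLION_USD = 1.00  # custom events

def billed_events(payload_bytes: int) -> int:
    # Each started 64 KB chunk of a payload bills as one event.
    return max(1, math.ceil(payload_bytes / (64 * 1024)))

# A 200 KB payload bills as 4 events; 5M small events/month cost $5.00.
events_for_200kb = billed_events(200 * 1024)
monthly_cost = 5_000_000 * billed_events(10 * 1024) / 1_000_000 * PRICE_PER_MILLION_USD
```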

&lt;p&gt;For my projects - Account Watcher, Health Check Alerts, Sales Analytics - the EventBridge cost has been essentially $0. The Lambda invocations and SNS messages that EventBridge triggers are also well within free tier for low-to-moderate volume alerting.&lt;/p&gt;




&lt;h2&gt;
  
  
  Things to Know
&lt;/h2&gt;

&lt;p&gt;A few operational details worth keeping in mind:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;At-least-once delivery&lt;/strong&gt; - EventBridge delivers events at least once for most targets, and AWS service events are delivered with either "best effort" or "durable" guarantees depending on the service. Design your consumers to be idempotent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5 targets per rule&lt;/strong&gt; - Need more? Create additional rules with the same pattern.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1 MB event size limit&lt;/strong&gt; - As of January 2026, EventBridge &lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/01/amazon-eventbridge-increases-event-payload-size-256-kb-1-mb/" rel="noopener noreferrer"&gt;increased the payload limit from 256 KB to 1 MB&lt;/a&gt;. For larger payloads, put the data in S3 and pass the reference in the event. Note that CloudTrail events are still capped at 256 KB.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sub-second latency&lt;/strong&gt; - EventBridge itself routes events quickly, but CloudTrail-sourced events typically lag the original API call by 1–5 minutes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No ordering guarantee&lt;/strong&gt; - The default bus does not guarantee event order. Handle this in your consumer logic.&lt;/li&gt;
&lt;/ul&gt;
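&lt;p&gt;The idempotency point deserves a sketch. Deduplicating on the event's &lt;code&gt;id&lt;/code&gt; field is usually enough; the in-memory set below is for illustration only - in production you would use something like a DynamoDB conditional put with a TTL:&lt;/p&gt;

```python
# Dedupe on the EventBridge event "id". In-memory for illustration only;
# a real consumer would persist seen ids (e.g. DynamoDB attribute_not_exists + TTL).
_seen_ids: set = set()

def handle_event(event: dict) -> bool:
    """Process an event once; return False for duplicate deliveries."""
    event_id = event["id"]
    if event_id in _seen_ids:
        return False   # already processed, skip side effects
    _seen_ids.add(event_id)
    # real processing (notifications, writes) goes here
    return True
```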




&lt;h2&gt;
  
  
  The EDA Trade-offs - Being Honest
&lt;/h2&gt;

&lt;p&gt;I believe in being transparent about trade-offs rather than only presenting the benefits. Event-driven architecture is not universally better than request/response - it is a different set of trade-offs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Debugging is harder.&lt;/strong&gt; In request/response, you get a stack trace from top to bottom. In EDA, an event flows through multiple decoupled services - when something fails, you are piecing together a distributed trace across Lambda logs, EventBridge metrics, and DLQ messages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Eventual consistency, not immediate.&lt;/strong&gt; If a user creates a resource and your event handler has not processed yet, a subsequent read might show stale data. You need to design your UX to tolerate this lag.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Testing is more complex.&lt;/strong&gt; You cannot just call an endpoint and check the response. You need to verify that events were emitted, rules matched, targets invoked, and downstream side effects occurred.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Invisible" coupling.&lt;/strong&gt; EDA is loosely coupled at the code level, but you still have coupling on the event schema. If a producer changes their event format, consumers break silently. Schema Registry helps, but it requires discipline.&lt;/p&gt;

&lt;p&gt;The honest framing: use request/response when you need synchronous answers and simple debugging. Use EDA when you need loose coupling, scalability, extensibility, and real-time reactivity. Most real systems use both.&lt;/p&gt;




&lt;h2&gt;
  
  
  Recent Updates Worth Knowing
&lt;/h2&gt;

&lt;p&gt;EventBridge continues to evolve. Here are the most notable changes from the past year:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scheduler quota monitoring&lt;/strong&gt; (Feb 2026) - EventBridge Scheduler now &lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/02/amazon-eventbridge-scheduler-resource-metrics/" rel="noopener noreferrer"&gt;emits resource count metrics&lt;/a&gt; to CloudWatch, so you can monitor how close you are to quota limits and request increases proactively.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1 MB event payload&lt;/strong&gt; (Jan 2026) - The event size limit &lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/01/amazon-eventbridge-increases-event-payload-size-256-kb-1-mb/" rel="noopener noreferrer"&gt;increased from 256 KB to 1 MB&lt;/a&gt;, eliminating the need for data chunking or S3 offloading in most cases. Particularly useful for LLM prompts, telemetry signals, and complex ML output payloads.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enhanced visual rule builder&lt;/strong&gt; (Nov 2025) - The console now includes an &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/11/eventbridge-enhanced-visual-rule-builder/" rel="noopener noreferrer"&gt;intuitive visual rule builder&lt;/a&gt; with a comprehensive event catalog and schema-aware drag-and-drop canvas. Makes it much easier to discover available events from 250+ AWS services and build patterns without referencing individual service docs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQS fair queue targets&lt;/strong&gt; (Nov 2025) - EventBridge can now &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/11/amazon-eventbridge-sqs-fair-queue-targets/" rel="noopener noreferrer"&gt;target SQS fair queues&lt;/a&gt;, enabling fairer message distribution across consumer groups in multi-tenant systems.&lt;/li&gt;
&lt;/ul&gt;
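
&lt;p&gt;A practical consequence of the new payload limit: a quick pre-flight size check can now decide whether an event still needs S3 offloading at all. This is a hedged sketch - EventBridge counts the size of the full &lt;code&gt;PutEvents&lt;/code&gt; entry (source, detail-type, and so on), not just the detail, so leave some headroom:&lt;/p&gt;

```python
import json

ONE_MB = 1024 * 1024  # new per-event limit (was 256 KB)

def needs_offload(detail):
    """Return True if the serialized detail alone already exceeds 1 MB.

    Note: the service counts the whole PutEvents entry, not just the
    detail, so treat this as an optimistic check and keep headroom.
    """
    return len(json.dumps(detail).encode("utf-8")) > ONE_MB

assert not needs_offload({"prompt": "summarize this document"})
assert needs_offload({"payload": "x" * (2 * ONE_MB)})
```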




&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;EventBridge sits at the center of event-driven architecture on AWS. The default event bus gives you free, automatic visibility into virtually every mutating API call across your account. Rules and patterns give you precise, content-based routing. Scheduler replaces cron jobs with a fully managed, scalable solution. Pipes eliminate glue code for point-to-point integrations.&lt;/p&gt;

&lt;p&gt;I have used it for &lt;a href="https://darryl-ruggles.cloud/serverless-aws-account-watcher/" rel="noopener noreferrer"&gt;security monitoring&lt;/a&gt;, &lt;a href="https://darryl-ruggles.cloud/serverless-site-health-check-notification-system/" rel="noopener noreferrer"&gt;health check alerting&lt;/a&gt;, and &lt;a href="https://darryl-ruggles.cloud/building-a-serverless-sales-analytics-platform-with-ai-insights-for-under-10month/" rel="noopener noreferrer"&gt;data pipeline orchestration&lt;/a&gt; - and every time, EventBridge has been the simplest, most cost-effective piece of the architecture. It is the service I reach for first when I need to connect systems, react to changes, or automate workflows. If you are building on AWS and have not explored EventBridge yet, start with the default event bus. Write one rule. Match one event. You will be hooked.&lt;/p&gt;

&lt;p&gt;If you are interested in going deeper, I recently presented this material to the &lt;a href="https://www.meetup.com/aws-cloud-club-at-amity-university-chhattisgarh/events/313431212/?eventOrigin=group_past_events" rel="noopener noreferrer"&gt;AWS Cloud Club at Amity University Chhattisgarh&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/eventbridge/" rel="noopener noreferrer"&gt;Amazon EventBridge Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://darryl-ruggles.cloud/serverless-aws-account-watcher/" rel="noopener noreferrer"&gt;Serverless AWS Account Watcher&lt;/a&gt; - My blog post on security monitoring with EventBridge&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://darryl-ruggles.cloud/serverless-site-health-check-notification-system/" rel="noopener noreferrer"&gt;Serverless Site Health Check Notification System&lt;/a&gt; - Health check alerting with EventBridge suffix matching&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://darryl-ruggles.cloud/building-a-serverless-sales-analytics-platform-with-ai-insights-for-under-10month/" rel="noopener noreferrer"&gt;Building a Serverless Sales Analytics Platform&lt;/a&gt; - EventBridge event routing + scheduled triggers&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/RDarrylR/serverless-account-watcher" rel="noopener noreferrer"&gt;Account Watcher GitHub Repo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/RDarrylR/route53-health-check-sam" rel="noopener noreferrer"&gt;Health Check GitHub Repo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.meetup.com/aws-cloud-club-at-amity-university-chhattisgarh/events/313431212/?eventOrigin=group_past_events" rel="noopener noreferrer"&gt;AWS Cloud Club at Amity University Chhattisgarh - EventBridge Presentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.meetup.com/aws-london-on-user-group/" rel="noopener noreferrer"&gt;AWS London Ontario User Group&lt;/a&gt; - Meetups, talks, and community for AWS builders in the London, Ontario area&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.youtube.com/@AWSLondonONMeetup" rel="noopener noreferrer"&gt;AWS London Ontario User Group YouTube&lt;/a&gt; - Recorded talks and presentations&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Connect with me on&lt;/em&gt; &lt;a href="https://x.com/RDarrylR" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://bsky.app/profile/darrylruggles.bsky.social" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://www.linkedin.com/in/darryl-ruggles/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://medium.com/@RDarrylR" rel="noopener noreferrer"&gt;Medium&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://dev.to/rdarrylr"&gt;Dev.to&lt;/a&gt;&lt;em&gt;, or the&lt;/em&gt; &lt;a href="https://community.aws/@darrylr" rel="noopener noreferrer"&gt;AWS Community&lt;/a&gt;&lt;em&gt;. Check out more of my projects at&lt;/em&gt; &lt;a href="https://darryl-ruggles.cloud" rel="noopener noreferrer"&gt;darryl-ruggles.cloud&lt;/a&gt; &lt;em&gt;and join the&lt;/em&gt; &lt;a href="https://www.believeinserverless.com/" rel="noopener noreferrer"&gt;Believe In Serverless&lt;/a&gt; &lt;em&gt;community.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>eventdriven</category>
    </item>
    <item>
      <title>Building a Serverless Blogging Platform with AWS and Claude Code</title>
      <dc:creator>Darryl Ruggles</dc:creator>
      <pubDate>Wed, 11 Mar 2026 01:10:10 +0000</pubDate>
      <link>https://forem.com/aws-builders/building-a-serverless-blogging-platform-with-aws-and-claude-code-1aka</link>
      <guid>https://forem.com/aws-builders/building-a-serverless-blogging-platform-with-aws-and-claude-code-1aka</guid>
      <description>&lt;p&gt;&lt;em&gt;How I replaced Hashnode with a custom-built, full-featured blogging platform in under 10 hours - with an AI pair programmer doing the heavy lifting.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You can see the new site here -&amp;gt; &lt;a href="https://darryl-ruggles.cloud/" rel="noopener noreferrer"&gt;Darryl's World of Cloud&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I Left My Old Platform
&lt;/h2&gt;

&lt;p&gt;For the past couple of years, I hosted my blog on Hashnode with a Vercel-backed custom domain. It worked. Articles rendered fine, the editor was decent, and I didn't have to think about infrastructure. But over time, the limitations started adding up.&lt;/p&gt;

&lt;p&gt;Newsletters were the first frustration. Setting up a mailing list, customizing email templates, and scheduling sends all required workarounds or third-party integrations. Email handling felt like an afterthought bolted onto a developer blogging platform. I wanted to send article roundups, curated link collections, and freeform newsletters - all from the same place I write articles. There was no easy way to make all of that work there.&lt;/p&gt;

&lt;p&gt;Then there was the visibility problem. I am a cloud solutions architect. I work with AWS infrastructure every day. Yet my own blog was a black box. I couldn't see the CDN configuration, couldn't tune caching behavior, couldn't add custom security headers or WAF rules. The hosting details were hidden behind an abstraction I didn't control.&lt;/p&gt;

&lt;p&gt;I also wanted to learn. I had been working with several AWS services professionally but hadn't built a complete, production-facing application for my own use that stitched them all together - CloudFront distributions, Lambda function URLs, DynamoDB single-table design, SES email delivery, Cognito authentication. Building my own platform and managing it over time is the best way to learn these services deeply and have something concrete to write about.&lt;/p&gt;

&lt;p&gt;So I decided to build it myself. Not just migrate the content - build the entire platform from scratch. The editor, the publishing pipeline, the newsletter system, the analytics, the infrastructure. All of it. The question was whether I could do it fast enough to make the effort worthwhile.&lt;/p&gt;

&lt;p&gt;The final push to actually work on this project came when I read a recent article from &lt;a href="https://www.linkedin.com/in/ranbuilder/" rel="noopener noreferrer"&gt;Ran Isenberg&lt;/a&gt; titled &lt;a href="https://www.ranthebuilder.cloud/blog/claude-built-my-wix-website-in-3-hours-is-saas-dead/" rel="noopener noreferrer"&gt;Claude Built My Wix Website in 3 Hours - Is SaaS Dead?&lt;/a&gt;, where he went through a similar process. Reading it, I knew it was time to start on my own. I have learned so much from all of his work and am grateful for the motivation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built - A Modern, Full-Featured Platform
&lt;/h2&gt;

&lt;p&gt;The result is a modern, visually clean blogging platform that loads fast. Every public page is pre-rendered static HTML served from CloudFront edge locations worldwide. No server-side rendering, no client-side hydration, no JavaScript frameworks on the public site - just HTML and CSS delivered from the nearest edge location. Page loads are sub-second globally.&lt;/p&gt;

&lt;p&gt;Compared to what I had before, the design is more polished and the performance is noticeably snappier. I went with a clean, card-based homepage layout with a hero section, tag-based navigation, and a reading experience that stays out of your way. The typography, spacing, and code syntax highlighting are all tuned exactly how I want them - because I own every pixel. And if I ever want to change anything it's only a few minutes away from being live.&lt;/p&gt;

&lt;p&gt;But the real wins are in the features I could never get working the way I wanted on a hosted platform.&lt;/p&gt;

&lt;h3&gt;
  
  
  Newsletter Infrastructure Done Right
&lt;/h3&gt;

&lt;p&gt;I haven't sent any newsletters publicly yet, but I have done lots of testing and iteration on the format, and the infrastructure is now in place. The newsletter system supports three distinct types. &lt;strong&gt;Freeform newsletters&lt;/strong&gt; use the same markdown editor as articles - write whatever you want and send it. &lt;strong&gt;Article roundups&lt;/strong&gt; let you select published articles and automatically generate card-based layouts with excerpts and links. &lt;strong&gt;Curated link collections&lt;/strong&gt; pull in external URLs, auto-fetch their metadata (title, description, image) via OpenGraph tags with Bedrock AI as a fallback, and render them as rich link cards.&lt;/p&gt;
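
&lt;p&gt;As an illustration of the metadata step, here is a minimal sketch of OpenGraph extraction using only the standard library's &lt;code&gt;html.parser&lt;/code&gt; - the class and function names are mine, not the platform's, and the Bedrock fallback is not shown:&lt;/p&gt;

```python
from html.parser import HTMLParser

class OpenGraphParser(HTMLParser):
    """Collect og:* meta tags into a dict (illustrative sketch)."""

    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        if prop.startswith("og:") and "content" in attrs:
            self.og[prop[3:]] = attrs["content"]

def extract_opengraph(html):
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.og

page = (
    '<head>'
    '<meta property="og:title" content="My Post">'
    '<meta property="og:image" content="https://example.com/img.png">'
    '</head>'
)
assert extract_opengraph(page) == {
    "title": "My Post",
    "image": "https://example.com/img.png",
}
```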

&lt;p&gt;All three types support scheduling, email preview before send, and are archived as browsable pages on the public site. Everything runs through Amazon SES - no Mailchimp, no SendGrid, no third-party email service.&lt;/p&gt;

&lt;h3&gt;
  
  
  Article Preview Before Publishing
&lt;/h3&gt;

&lt;p&gt;Time-limited private URLs let you see exactly how a draft will look on the live site before making it public - same template, same styles, and same layout as the real articles. Each preview link expires after a short time, so you can share drafts with reviewers without worrying about stale links lingering.&lt;/p&gt;
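
&lt;p&gt;One common way to implement such links - and the approach sketched here, which may differ from the platform's actual code - is to sign the article slug plus an expiry timestamp with a server-side secret and reject any token that is expired or fails verification:&lt;/p&gt;

```python
import hashlib
import hmac
import time

SECRET = b"server-side-secret"  # in practice, loaded from a secrets store

def make_preview_token(slug, ttl_seconds=3600, now=None):
    """Build a slug:expiry:signature token valid for ttl_seconds."""
    now = int(now if now is not None else time.time())
    expires = now + ttl_seconds
    msg = f"{slug}:{expires}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{slug}:{expires}:{sig}"

def verify_preview_token(token, now=None):
    """True only if the signature checks out and the token is unexpired."""
    now = int(now if now is not None else time.time())
    slug, expires, sig = token.rsplit(":", 2)
    msg = f"{slug}:{expires}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected) and now < int(expires)

token = make_preview_token("banana-bread", ttl_seconds=600, now=1000)
assert verify_preview_token(token, now=1200)      # still valid
assert not verify_preview_token(token, now=2000)  # expired
```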

&lt;h3&gt;
  
  
  Comments with Reader Registration
&lt;/h3&gt;

&lt;p&gt;Readers sign up to comment on articles: they verify their email and, once I approve them, can comment on posts. The admin side has a full moderation queue - approve, reject, or ban users. No third-party comment widgets, and no tracking scripts injected into your site.&lt;/p&gt;

&lt;h3&gt;
  
  
  Email Subscriptions and Notifications
&lt;/h3&gt;

&lt;p&gt;Readers subscribe with double opt-in email verification, manage their preferences, and get notified automatically when new articles are published. One-click unsubscribe in every email. All of it is built on SES and DynamoDB.&lt;/p&gt;

&lt;h3&gt;
  
  
  Custom Analytics
&lt;/h3&gt;

&lt;p&gt;Built-in page view tracking with zero third-party scripts. I can see referral sources, geographic distribution, device types, and more about who browses articles on the site, with hourly, daily, and monthly granularity. Comparable analytics would have cost a lot more per month on Vercel; here it's almost free.&lt;/p&gt;

&lt;h3&gt;
  
  
  Developer Tooling and Infrastructure
&lt;/h3&gt;

&lt;p&gt;A WYSIWYG markdown editor with source toggle, AI-powered SEO title and description generation via Amazon Bedrock, drag-and-drop image upload to S3 with pre-signed URLs, automated backups with cross-account disaster recovery, and CI/CD with GitHub Actions deploying on every push to main.&lt;/p&gt;

&lt;p&gt;The platform runs on 16+ AWS services - including CloudFront, Lambda, DynamoDB, S3, SES, and Cognito - with automated backups and extensive security controls. Everything is defined with Terraform, my favorite Infrastructure as Code (IaC) tool. My monthly cost is in the single dollars thanks to serverless/managed services with pay-per-use pricing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The AI-Assisted Development Experience
&lt;/h2&gt;

&lt;p&gt;I have worked in the AI/ML space for a number of years now and have seen plenty of promising ideas, but few of the tools became real difference makers in my day-to-day work. I have been experimenting with AI-based coding tools for over a year, with more hope than results at first. GitHub Copilot's autocomplete suggestions were useful but hardly earth-shattering. I experimented with &lt;a href="https://cursor.com/" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt;, &lt;a href="https://kiro.dev" rel="noopener noreferrer"&gt;Kiro&lt;/a&gt;, &lt;a href="https://opencode.ai" rel="noopener noreferrer"&gt;OpenCode&lt;/a&gt;, and a few others. The early outputs were rough - code that looked plausible but missed edge cases, made incorrect API calls, or ignored the architectural context of the project.&lt;/p&gt;

&lt;p&gt;But the improvement over the last number of months has been dramatic. I settled on &lt;a href="https://claude.com/product/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; with Opus 4.5/4.6 as my primary development tool, and the workflow it enables is genuinely different from anything I have experienced in 25+ years of software development.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Workflow
&lt;/h3&gt;

&lt;p&gt;The pattern is simple: spend a few minutes describing what you want in English, and within minutes the feature is working. Not hours of scaffolding and boilerplate - minutes. Need a newsletter scheduling system with EventBridge? Describe the requirements, review the generated Terraform and Lambda code, test it, push it. Need a comment moderation queue with Cognito integration? Same thing. Of course there are caveats and things to seriously keep an eye on - more on those below.&lt;/p&gt;

&lt;p&gt;The key is that you are not just generating code snippets. Claude Code understands the full project context - the file structure, the existing patterns, the infrastructure configuration, the naming conventions. When I asked it to add newsletter support, it knew where the API routes lived, how the DynamoDB tables were structured, what the Terraform modules looked like, and how the admin SPA was organized. It generated coordinated changes across a dozen files that all fit together coherently.&lt;/p&gt;

&lt;p&gt;This is what sets it apart from autocomplete-style tools. The context window spans the entire project. It reads your Terraform state, your Lambda handlers, your React components, and your CLAUDE.md instructions. The result is code that feels like it was written by someone who has been on the project for months, not a tool that just saw a single file.&lt;/p&gt;

&lt;h3&gt;
  
  
  MCP Servers Extended the Capabilities
&lt;/h3&gt;

&lt;p&gt;One of the features that makes Claude Code especially effective is Model Context Protocol (MCP) servers. These are plugins that give the AI access to specialized tools and documentation right in the development flow. The ones I use most are described below.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Terraform MCP server&lt;/strong&gt; provides instant access to AWS provider documentation, Checkov security scanning, and module search. Instead of tab-switching to the Terraform docs, Claude Code looks up resource attributes, checks for security misconfigurations, and finds community modules - all inline during development.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;AWS Knowledge MCP server&lt;/strong&gt; gives inline access to AWS documentation. When I needed to understand CloudFront OAC signing behavior or SES configuration set options, the documentation was available without leaving the editor.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;AWS Serverless MCP server&lt;/strong&gt; provides Lambda patterns, SAM guidance, and event source mapping configuration. Useful for getting the Lambda function URL streaming configuration right and understanding best practices.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;AWS Diagram MCP server&lt;/strong&gt; generates architecture diagrams directly from code, producing decent diagrams with little effort. I have to say the diagram server still has room for improvement as there are too many overlapping lines and it tends to generate very vertical diagrams.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Timeline
&lt;/h3&gt;

&lt;p&gt;Looking at the git history, I built the new site by working on it a few hours a day over five calendar days:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Day 1&lt;/strong&gt;: Core platform from scratch plus Hashnode migration - a couple of hours to get articles rendering, the admin editor working, and all existing content imported&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Day 2&lt;/strong&gt;: Comments, email subscriptions, CloudWatch monitoring, Slack alerts, CI/CD pipeline, and automated testing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Day 3&lt;/strong&gt;: Analytics refinements, backup setup, custom domain setup&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Day 4&lt;/strong&gt;: Full newsletter system - three newsletter types, scheduling, preview, archive pages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Day 5&lt;/strong&gt;: Article preview with random URLs, firewall tuning, polish and bug fixes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The final count: 70 commits, 11+ major features in five days. &lt;strong&gt;Total hands-on development time: under 10 hours&lt;/strong&gt; for a full-featured, production-quality blogging platform. &lt;strong&gt;Monthly cost: well under $10&lt;/strong&gt; - serverless pay-per-use means you only pay for what you use, and a personal blog's traffic keeps costs negligible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest Assessment - Not Magic, But Powerful
&lt;/h2&gt;

&lt;p&gt;I want to be clear about something: Claude Code makes mistakes. Often. It's not a magic box that produces perfect code on the first try, and it's not a tool to hand to a new-grad developer expecting quality code that's secure and meets the design specs. You still need experienced developers to guide the process and push back when things are going down the wrong path.&lt;/p&gt;

&lt;p&gt;It will occasionally hallucinate API parameters that do not exist. It will sometimes choose an architectural pattern that's technically correct but wrong for the specific context. It will miss edge cases that an experienced developer would catch immediately. It will sometimes try to solve a problem by adding complexity when the right answer is to simplify. It will use the wrong versions of tools, among other missteps.&lt;/p&gt;

&lt;p&gt;But here is the thing - an experienced developer who knows the services, languages, and patterns can catch these mistakes and push back. The AI responds well to correction. You say "that parameter does not exist on this resource," and it fixes it. You say "this should be async, not synchronous," and it restructures the code. The collaboration works because the human brings judgment and the AI brings speed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where Human Judgment Was Critical
&lt;/h3&gt;

&lt;p&gt;Several times during the build, my experience was the difference between a working system and a broken one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security architecture decisions&lt;/strong&gt; required human judgment at every turn. IAM least privilege policies, backup tool hardening, encryption key rotation, SES sending authorization - these are areas where "works on my machine" is not good enough. Each security decision needed careful review against best practices.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Synchronous vs. asynchronous patterns&lt;/strong&gt; were a recurring design choice. When should code work synchronously vs. asynchronously? When should newsletter sends be immediate vs. scheduled through EventBridge? These are architectural decisions that depend on user experience requirements and operational constraints, not just technical feasibility.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Review Layer
&lt;/h3&gt;

&lt;p&gt;Code generated by AI must be scrutinized with the same rigor as code written by a junior developer - maybe more. For this project, every pull request went through GitHub Copilot and Amazon Q Developer code reviews in addition to my own review. This layered approach caught issues that any single reviewer might miss.&lt;/p&gt;

&lt;p&gt;For anything production-facing, especially with security implications, there's no substitute for thorough review, penetration testing, and proper validation. AI-assisted development is fast, but speed without scrutiny is a liability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AI-assisted development is real and practical today.&lt;/strong&gt; Not for every task, and not without supervision, but the productivity gains are substantial for developers who know their domain.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP servers are a force multiplier.&lt;/strong&gt; Having documentation, security scanning, and infrastructure tools available inline eliminates context switching and keeps you in flow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serverless is ideal for personal projects.&lt;/strong&gt; Pay-per-use pricing means your blog costs pennies when nobody is reading it and scales automatically when a post hits the front page.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The human in the loop matters more than the AI.&lt;/strong&gt; Domain expertise, security awareness, and architectural judgment are what turn AI-generated code into production-quality software.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build things you will actually use.&lt;/strong&gt; The best way to learn anything is to build something real that you will maintain and improve over time.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Closing Thoughts
&lt;/h2&gt;

&lt;p&gt;I was lucky enough to be at AWS re:Invent 2025 in Las Vegas and spoke with many people about AI technology and its impact on our industry. There is a lot to be concerned about, but there is also so much promise. I was in the room for the final keynote from &lt;a href="https://www.linkedin.com/in/wernervogels/" rel="noopener noreferrer"&gt;Werner Vogels&lt;/a&gt; and was truly inspired by his talk on &lt;a href="https://thekernel.news/articles/dawn-of-the-renaissance-developer/" rel="noopener noreferrer"&gt;The Dawn of the Renaissance Developer&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I haven't had this much fun coding in years. There is something deeply satisfying about describing a feature in plain English, watching it materialize in code, reviewing and refining it, and then seeing it live on your own infrastructure minutes later. The feedback loop is extraordinarily tight.&lt;/p&gt;

&lt;p&gt;My backlog of features to build grows faster than I can ship them. Dark mode, reading time estimates, related articles, full-text search, RSS improvements, an about page redesign - the list keeps growing. Every time I use the platform to write a post, I notice something I want to improve. That's a good sign. It means the platform is useful enough to invest in, and I care enough about it to keep iterating.&lt;/p&gt;

&lt;p&gt;If you have been putting off a side project because you thought it would take too long, the barrier has never been lower. A working prototype of almost anything is now a weekend away. The tools are that good - if you bring the expertise to guide them.&lt;/p&gt;

&lt;p&gt;This platform is live and serving the very post you are reading right now. If something looks off, that's on me - and probably on my backlog already.&lt;/p&gt;

&lt;p&gt;IMPORTANT: Remember to always validate security practices for anything public-facing - especially anything for production use. AI-assisted development makes it easy to move fast, and moving fast without security review is how breaches happen. Take the time to review IAM policies, test authentication flows, scan for vulnerabilities, and audit your infrastructure. The speed gains from AI should buy you more time for security - not less.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Connect with me on&lt;/em&gt; &lt;a href="https://x.com/RDarrylR" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://bsky.app/profile/darryl-ruggles.cloud" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://www.linkedin.com/in/darryl-ruggles/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://medium.com/@RDarrylR" rel="noopener noreferrer"&gt;Medium&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://dev.to/rdarrylr"&gt;Dev.to&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://github.com/RDarrylR/" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;em&gt;, or the&lt;/em&gt; &lt;a href="https://builder.aws.com/community/@darrylr" rel="noopener noreferrer"&gt;AWS Community&lt;/a&gt;&lt;em&gt;. Check out more of my projects at&lt;/em&gt; &lt;a href="https://darryl-ruggles.cloud" rel="noopener noreferrer"&gt;darryl-ruggles.cloud&lt;/a&gt; &lt;em&gt;and join the&lt;/em&gt; &lt;a href="https://www.believeinserverless.com/" rel="noopener noreferrer"&gt;Believe In Serverless&lt;/a&gt; &lt;em&gt;community.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>claudecode</category>
      <category>serverless</category>
    </item>
    <item>
      <title>Bi-directional Voice-Controlled Recipe Assistant with Nova Sonic 2</title>
      <dc:creator>Darryl Ruggles</dc:creator>
      <pubDate>Sun, 08 Mar 2026 01:19:36 +0000</pubDate>
      <link>https://forem.com/aws-builders/bi-directional-voice-controlled-recipe-assistant-with-nova-sonic-v2-4p59</link>
      <guid>https://forem.com/aws-builders/bi-directional-voice-controlled-recipe-assistant-with-nova-sonic-v2-4p59</guid>
      <description>&lt;p&gt;What if your recipe assistant could talk back?&lt;/p&gt;

&lt;p&gt;I have been building a &lt;a href="https://darryl-ruggles.cloud/serverless-recipe-assistant-with-agentcore-and-strands/" rel="noopener noreferrer"&gt;serverless Family Recipe Assistant&lt;/a&gt; that searches my family's recipe collection, calculates nutrition from USDA data, and handles multi-turn conversations through a web UI. It works well. But every time I am in the kitchen with flour on my hands, reaching for my phone to type "how long do I bake the banana bread?" feels wrong.&lt;/p&gt;

&lt;p&gt;I wanted to just &lt;em&gt;ask&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The text-based assistant already had a "cooking mode" that read recipes aloud using Amazon Polly. But listening to a long recipe read start-to-finish by a TTS voice is surprisingly tedious - you cannot ask it to slow down, skip ahead, or clarify a step without going back to the screen and typing. What I really wanted was a conversation: "What is the next step?" or "How much butter was that again?" while my hands are covered in dough.&lt;/p&gt;

&lt;p&gt;Amazon Nova Sonic v2 launched recently with sub-700ms speech-to-speech latency and a 1M token context window. The Strands Agents SDK added experimental support for bidirectional streaming through a &lt;code&gt;BidiAgent&lt;/code&gt; class that wraps the WebSocket complexity into something remarkably simple. I decided to wire the two together and add my existing recipe tools - search, timers, nutrition lookup, and unit conversion - to build a voice-controlled kitchen assistant.&lt;/p&gt;

&lt;p&gt;This post walks through how I built it, what worked, and what surprised me. It works great on my laptop and even on my Android phone and iPad. The complete code is on GitHub: &lt;a href="https://github.com/RDarrylR/serverless-family-recipes-bidirectional-nova-sonic" rel="noopener noreferrer"&gt;github.com/RDarrylR/serverless-family-recipes-bidirectional-nova-sonic&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Bidirectional Streaming?
&lt;/h2&gt;

&lt;p&gt;Traditional voice assistants follow a rigid pattern: listen, transcribe, think, generate text, synthesize speech, play audio. Each step waits for the previous one to finish. The result is a noticeable pause between your question and the response.&lt;/p&gt;

&lt;p&gt;Bidirectional streaming changes this. The agent maintains a persistent WebSocket connection where audio flows in both directions simultaneously. Nova Sonic v2 handles speech understanding and generation in a single model - there is no separate transcription or TTS step. The practical effect is that the agent starts responding while you are still finishing your sentence, and you can interrupt it mid-answer to course-correct.&lt;/p&gt;

&lt;p&gt;If you have written a raw WebSocket integration before, you know this means managing connection lifecycle, audio encoding, event multiplexing, and error recovery. That is roughly 150 lines of plumbing code before you write any business logic.&lt;/p&gt;
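
&lt;p&gt;To make the plumbing concrete, here is a hedged, self-contained sketch of the duplex pattern using plain &lt;code&gt;asyncio&lt;/code&gt; queues in place of the WebSocket and a trivial echo in place of Nova Sonic. Everything here is illustrative - it shows the shape of the concurrency, not the Strands or Bedrock API:&lt;/p&gt;

```python
import asyncio

# Two coroutines run concurrently over one "connection": one pushes mic
# audio up while the other pulls response audio down. Queues stand in for
# the WebSocket so the full-duplex shape is visible without AWS.

async def send_audio(outbound, chunks):
    for chunk in chunks:
        await outbound.put(chunk)   # would be a websocket send for real
    await outbound.put(None)        # end-of-stream marker

async def echo_model(outbound, inbound):
    # Stand-in for Nova Sonic: respond to each chunk as it arrives,
    # before the sender has finished - i.e., full duplex.
    while True:
        chunk = await outbound.get()
        if chunk is None:
            await inbound.put(None)
            return
        await inbound.put(b"resp:" + chunk)

async def receive_audio(inbound):
    received = []
    while True:
        chunk = await inbound.get()
        if chunk is None:
            return received
        received.append(chunk)

async def main():
    outbound, inbound = asyncio.Queue(), asyncio.Queue()
    mic = [b"chunk1", b"chunk2", b"chunk3"]
    _, _, responses = await asyncio.gather(
        send_audio(outbound, mic),
        echo_model(outbound, inbound),
        receive_audio(inbound),
    )
    return responses

responses = asyncio.run(main())
assert responses == [b"resp:chunk1", b"resp:chunk2", b"resp:chunk3"]
```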

&lt;p&gt;Strands &lt;code&gt;BidiAgent&lt;/code&gt; reduces this to about 20 lines.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;The system has two layers. A React frontend in the browser captures microphone audio using the Web Audio API (with built-in echo cancellation - no headset needed) and sends it over a WebSocket to a FastAPI server. The server runs a Strands BidiAgent that forwards audio to Nova Sonic v2, dispatches tool calls, and streams response audio back to the browser for playback.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8vq0wc50cvicfvp1rsia.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8vq0wc50cvicfvp1rsia.png" alt="Architecture Diagram" width="800" height="634"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;BidiAgent&lt;/strong&gt; - The Strands orchestrator. Manages the streaming session, routes audio to the model, and dispatches tool calls.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BidiNovaSonicModel&lt;/strong&gt; - The model adapter. Establishes a bidirectional WebSocket to Bedrock and handles the Nova Sonic protocol.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kitchen tools&lt;/strong&gt; - Four &lt;code&gt;@tool&lt;/code&gt; decorated Python functions that the agent can call during conversation without interrupting the audio stream.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bedrock Knowledge Base&lt;/strong&gt; - My existing recipe collection, indexed with Titan Embed V2 and stored in S3 Vectors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;USDA FoodData Central&lt;/strong&gt; - Public API for nutrition data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key insight is that tool execution happens &lt;em&gt;concurrently&lt;/em&gt; with audio streaming. When you ask "find me a pasta recipe," the agent calls the &lt;code&gt;search_recipes&lt;/code&gt; tool while continuing to listen for follow-up input. No blocking, no silence gap.&lt;/p&gt;
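&lt;p&gt;The concurrency model is ordinary &lt;code&gt;asyncio&lt;/code&gt;. This toy sketch (mine, not Strands internals) shows the shape: a slow tool call runs as a background task while the audio loop keeps consuming chunks.&lt;/p&gt;

```python
import asyncio

# Toy model of the agent's concurrency - not Strands internals.
# A slow "tool call" runs as a background task while the "audio loop"
# keeps consuming chunks, so nothing blocks.
events = []

async def slow_tool():
    await asyncio.sleep(0.2)                 # stand-in for a KB lookup
    events.append("tool result ready")

async def audio_loop():
    for _ in range(4):
        await asyncio.sleep(0.05)            # stand-in for an audio frame
        events.append("audio chunk processed")

async def main():
    tool_task = asyncio.create_task(slow_tool())
    await audio_loop()                       # audio never waits on the tool
    await tool_task

asyncio.run(main())
```

&lt;p&gt;Swap the sleeps for real audio frames and a Knowledge Base query and you have the shape of what BidiAgent manages for you.&lt;/p&gt;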

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before you start, you will need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.13+ (Nova Sonic requires at least 3.12)&lt;/li&gt;
&lt;li&gt;Node.js 18+ (for the Vite frontend dev server)&lt;/li&gt;
&lt;li&gt;An AWS account with Bedrock model access enabled for Nova Sonic v2&lt;/li&gt;
&lt;li&gt;PortAudio system library (&lt;code&gt;brew install portaudio&lt;/code&gt; on macOS) - required as a transitive dependency of the Strands SDK&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;uv&lt;/code&gt; for Python dependency management
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/RDarrylR/serverless-family-recipes-bidirectional-nova-sonic.git
&lt;span class="nb"&gt;cd &lt;/span&gt;serverless-family-recipes-bidirectional-nova-sonic
uv &lt;span class="nb"&gt;sync
&lt;/span&gt;make install-frontend
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
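
&lt;p&gt;The server code in the next section imports its settings from a local &lt;code&gt;config&lt;/code&gt; module that is not shown in this post. Here is a minimal sketch - the constant names match the imports, but the environment-variable names and defaults are my assumptions, not the repo's actual file:&lt;/p&gt;

```python
import os

# Hypothetical config.py. Constant names match what the server imports;
# env var names and defaults are illustrative, not the repo's actual file.
AWS_REGION = os.environ.get("AWS_REGION", "us-east-1")
NOVA_SONIC_VOICE = os.environ.get("NOVA_SONIC_VOICE", "tiffany")  # lowercase!
BEDROCK_KB_ID = os.environ.get("BEDROCK_KB_ID", "")
USDA_API_KEY = os.environ.get("USDA_API_KEY", "DEMO_KEY")

SYSTEM_PROMPT = (
    "You are a friendly, hands-free cooking assistant. "
    "Answer briefly - your replies are spoken aloud. "
    "Call the stop_conversation tool when the user says goodbye or stop."
)
```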



&lt;h2&gt;
  
  
  Building the Agent
&lt;/h2&gt;

&lt;p&gt;The server is a FastAPI WebSocket endpoint that bridges browser audio to BidiAgent. What makes this simple is that BidiAgent accepts plain callables for I/O. At its simplest, you can pass &lt;code&gt;ws.receive_json&lt;/code&gt; and &lt;code&gt;ws.send_json&lt;/code&gt; directly. In practice, I wrap them with thin functions for message size limits and cost tracking, but the pattern stays the same - no custom I/O classes needed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;WebSocket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;WebSocketDisconnect&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.experimental.bidi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BidiAgent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.experimental.bidi.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BidiNovaSonicModel&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.experimental.bidi.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;stop_conversation&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AWS_REGION&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;NOVA_SONIC_VOICE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;search_recipes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;set_timer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nutrition_lookup&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;convert_units&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;sonic_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BidiNovaSonicModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AWS_REGION&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;provider_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_rate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;16000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output_rate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;24000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;voice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;NOVA_SONIC_VOICE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@app.websocket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/ws&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;websocket_endpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;WebSocket&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BidiAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sonic_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;search_recipes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;set_timer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nutrition_lookup&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;convert_units&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stop_conversation&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;accept&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;receive_json&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;send_json&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;WebSocketDisconnect&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Client disconnected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the core of the agent. A new &lt;code&gt;BidiAgent&lt;/code&gt; is created per connection - no shared state between sessions. The &lt;code&gt;BidiNovaSonicModel&lt;/code&gt; handles the WebSocket connection to Bedrock. The &lt;code&gt;agent.run()&lt;/code&gt; method ties them together and runs until the user says "stop" (triggering the &lt;code&gt;stop_conversation&lt;/code&gt; tool) or the browser disconnects.&lt;/p&gt;
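
&lt;p&gt;The thin wrappers mentioned earlier are just closures over the raw WebSocket callables. A sketch of the idea - the size cap and the &lt;code&gt;stats&lt;/code&gt; dict are illustrative assumptions, not values from the repo:&lt;/p&gt;

```python
import json

# Sketch of the thin-wrapper pattern. The 1 MB cap and the stats dict
# are illustrative assumptions, not values from the repo.
MAX_MESSAGE_BYTES = 1_000_000

def make_receive(receive_json, stats):
    """Wrap ws.receive_json with a size guard and an inbound counter."""
    async def receive():
        msg = await receive_json()
        if len(json.dumps(msg)) > MAX_MESSAGE_BYTES:
            raise ValueError("inbound message too large")
        stats["received"] = stats.get("received", 0) + 1
        return msg
    return receive

def make_send(send_json, stats):
    """Wrap ws.send_json with an outbound counter for cost tracking."""
    async def send(event):
        stats["sent"] = stats.get("sent", 0) + 1
        await send_json(event)
    return send
```

&lt;p&gt;The agent call then becomes &lt;code&gt;inputs=[make_receive(ws.receive_json, stats)]&lt;/code&gt; and &lt;code&gt;outputs=[make_send(ws.send_json, stats)]&lt;/code&gt; - BidiAgent only ever sees a callable.&lt;/p&gt;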

&lt;p&gt;The &lt;code&gt;provider_config&lt;/code&gt; sets the audio sample rates (16 kHz input from the browser, 24 kHz output from Nova Sonic) and the voice. Nova Sonic v2 supports several voices - I went with "tiffany" for a natural conversational tone, but you can also use "amy" or "puck".&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; Voice IDs must be lowercase. Passing &lt;code&gt;"Tiffany"&lt;/code&gt; instead of &lt;code&gt;"tiffany"&lt;/code&gt; results in a &lt;code&gt;ValidationException: Received invalid id&lt;/code&gt; error that silently kills the session. This is not documented anywhere obvious - I discovered it by enabling debug logging on the Strands Nova Sonic model class.&lt;/p&gt;
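
&lt;p&gt;A cheap guard at startup turns that silent session kill into an immediate, readable failure. The voice list below reflects only the voices named in this post - treat it as illustrative, not exhaustive:&lt;/p&gt;

```python
# Guard against the lowercase-voice-ID gotcha at startup.
# KNOWN_VOICES lists only the voices mentioned in this post;
# it is illustrative, not an exhaustive Nova Sonic list.
KNOWN_VOICES = {"tiffany", "amy", "puck"}

def normalize_voice(voice_id: str) -> str:
    """Lowercase the voice ID and fail fast on anything unrecognized."""
    voice = voice_id.strip().lower()
    if voice not in KNOWN_VOICES:
        raise ValueError(f"Unknown Nova Sonic voice: {voice_id!r}")
    return voice
```

&lt;p&gt;Calling &lt;code&gt;normalize_voice(NOVA_SONIC_VOICE)&lt;/code&gt; before constructing the model surfaces the mistake at startup instead of mid-session.&lt;/p&gt;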

&lt;h3&gt;
  
  
  Why &lt;code&gt;stop_conversation&lt;/code&gt;?
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;stop_conversation&lt;/code&gt; tool is a built-in Strands tool that gracefully shuts down the streaming session. Without it, the only way to stop the agent is Ctrl+C, which does not cleanly close the WebSocket. The system prompt tells the agent to use this tool when users say "goodbye" or "stop."&lt;/p&gt;

&lt;h2&gt;
  
  
  Adding Kitchen Tools
&lt;/h2&gt;

&lt;p&gt;Tools work identically to standard Strands agents. You decorate a function with &lt;code&gt;@tool&lt;/code&gt;, add a docstring that describes when to use it, and pass it to the agent. The model decides when to call tools based on the conversation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Recipe Search
&lt;/h3&gt;

&lt;p&gt;This tool connects to my existing Bedrock Knowledge Base, which contains my family's recipe collection indexed with Titan Embed V2.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AWS_REGION&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BEDROCK_KB_ID&lt;/span&gt;

&lt;span class="n"&gt;bedrock_agent_runtime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock-agent-runtime&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AWS_REGION&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_recipes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search the recipe knowledge base for recipes matching the query.

    Use this tool whenever a user asks about recipes, ingredients, or cooking methods.

    Args:
        query: Natural language search query about recipes
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock_agent_runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;knowledgeBaseId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BEDROCK_KB_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;retrievalQuery&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;retrievalConfiguration&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vectorSearchConfiguration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;numberOfResults&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrievalResults&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No recipes found matching that query.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# Group chunks by source file so multi-chunk recipes are reassembled
&lt;/span&gt;    &lt;span class="n"&gt;source_chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3Location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;uri&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;source_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;source_chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;source_chunks&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;source_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;texts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]}&lt;/span&gt;
        &lt;span class="n"&gt;source_chunks&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;texts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;entry&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;source_chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;merged&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;texts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Recipe: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (score: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;merged&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; recipe(s):&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The chunk-merging logic is important. Bedrock KB returns chunks, not complete documents. A long recipe might be split across multiple chunks. By grouping chunks that share the same S3 source URI, the tool reassembles the full recipe text before handing it to the model.&lt;/p&gt;

&lt;p&gt;The score threshold of 0.3 filters out weak matches. Without it, the agent would sometimes confidently describe a recipe that was only tangentially related to the query.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cooking Timer
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;set_timer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;minutes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cooking&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Set a cooking timer for the specified number of minutes.

    Args:
        minutes: Number of minutes for the timer
        label: A short description of what the timer is for
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;minutes&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Timer must be at least 1 minute.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;loop&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_running_loop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;_timer_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;minutes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Timer set: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;minutes&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; minutes. I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ll let you know when it&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s done.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_timer_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;minutes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;minutes&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;** TIMER DONE: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;minutes&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; minutes) **&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The timer runs as an &lt;code&gt;asyncio&lt;/code&gt; background task. When it expires, it prints to the console. In a production version, this would trigger a spoken notification through the audio output - but since Nova Sonic sessions have an 8-minute limit, long timers outlive the session anyway. I will address this in the trade-offs section.&lt;/p&gt;

&lt;h3&gt;
  
  
  Nutrition Lookup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;nutrition_lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;food_item&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Look up nutrition information for a food item using USDA FoodData Central.

    Args:
        food_item: The food item to look up, e.g. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chicken breast&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; or &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;brown rice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.nal.usda.gov/fdc/v1/foods/search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;food_item&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pageSize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;USDA_API_KEY&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;foods&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;foods&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;foods&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No nutrition data found for &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;food_item&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;food&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;foods&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="c1"&gt;# Extract key nutrients from the USDA response
&lt;/span&gt;    &lt;span class="n"&gt;nutrients&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;food&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;foodNutrients&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nutrientName&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;TARGET_NUTRIENTS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;nutrients&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;TARGET_NUTRIENTS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;unitName&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Nutrition for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;food&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (per 100g):&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nutrients&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;indent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The USDA FoodData Central API is free, and the shared &lt;code&gt;DEMO_KEY&lt;/code&gt; works without registration, though with tight rate limits. For production use, get a proper API key at &lt;a href="https://fdc.nal.usda.gov/api-key-signup" rel="noopener noreferrer"&gt;https://fdc.nal.usda.gov/api-key-signup&lt;/a&gt;.&lt;/p&gt;
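&lt;p&gt;The extraction loop in &lt;code&gt;nutrition_lookup&lt;/code&gt; can be exercised offline against a canned response shaped like a FoodData Central search hit. A minimal sketch - the &lt;code&gt;TARGET_NUTRIENTS&lt;/code&gt; mapping here is an illustrative stand-in for the module-level constant the tool references:&lt;/p&gt;

```python
import json

# Illustrative stand-in for the module-level TARGET_NUTRIENTS constant
# referenced by the tool above - adjust the names to whatever you track.
TARGET_NUTRIENTS = {
    "Energy": "calories",
    "Protein": "protein",
    "Carbohydrate, by difference": "carbs",
    "Total lipid (fat)": "fat",
}

def extract_nutrients(food: dict) -> dict:
    """Pull the target nutrients out of one FoodData Central search hit."""
    nutrients = {}
    for n in food.get("foodNutrients", []):
        name = n.get("nutrientName", "")
        if name in TARGET_NUTRIENTS:
            nutrients[TARGET_NUTRIENTS[name]] = f"{n['value']} {n.get('unitName', '')}"
    return nutrients

# Canned fragment shaped like a FoodData Central search hit
sample = {
    "description": "Chicken breast, cooked",
    "foodNutrients": [
        {"nutrientName": "Energy", "value": 165, "unitName": "KCAL"},
        {"nutrientName": "Protein", "value": 31, "unitName": "G"},
        {"nutrientName": "Water", "value": 65, "unitName": "G"},  # not tracked, skipped
    ],
}
print(json.dumps(extract_nutrients(sample)))
```

&lt;p&gt;Untracked nutrients fall through silently, which keeps the text returned to the model short - important when the output is going to be spoken aloud.&lt;/p&gt;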

&lt;h3&gt;
  
  
  Unit Converter
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;

&lt;span class="n"&gt;VOLUME_TO_ML&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cup&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;236.588&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tbsp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;14.787&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tsp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;4.929&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ml&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;liter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1000.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fl oz&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;29.5735&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;WEIGHT_TO_G&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;g&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1000.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;oz&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;28.3495&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;453.592&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;convert_units&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;from_unit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;to_unit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Convert between cooking measurement units.

    Args:
        amount: The numeric amount to convert
        from_unit: The unit to convert from
        to_unit: The unit to convert to
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;from_unit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;to_unit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Temperature
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fahrenheit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;c&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;celsius&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; F = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; C&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# Volume
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;VOLUME_TO_ML&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;VOLUME_TO_ML&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;VOLUME_TO_ML&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;VOLUME_TO_ML&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;from_unit&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;to_unit&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# Weight
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;WEIGHT_TO_G&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;WEIGHT_TO_G&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;WEIGHT_TO_G&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;WEIGHT_TO_G&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;from_unit&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;to_unit&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cannot convert &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;from_unit&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;to_unit&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tool is intentionally simple. It handles volume, weight, and temperature - the three conversion types that come up most often while cooking. When someone asks to convert volume to weight (e.g., "how many grams is a cup of flour?"), the tool returns a "cannot convert" message rather than guessing, since the answer depends on the ingredient's density - the model can then explain that to the user.&lt;/p&gt;
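&lt;p&gt;The via-a-base-unit arithmetic is easy to spot-check in isolation. A self-contained copy of the volume table (the weight table works identically):&lt;/p&gt;

```python
# Self-contained copy of the volume table above; ml is the base unit
VOLUME_TO_ML = {"cup": 236.588, "tbsp": 14.787, "tsp": 4.929, "ml": 1.0}

def via_base(amount: float, from_unit: str, to_unit: str) -> float:
    """Convert to the base unit (ml), then divide out the target unit."""
    return amount * VOLUME_TO_ML[from_unit] / VOLUME_TO_ML[to_unit]

print(f"{via_base(2, 'cup', 'tbsp'):.2f}")   # 2 cups in tablespoons -> 32.00
print(f"{via_base(3, 'tsp', 'tbsp'):.2f}")   # 3 teaspoons is one tablespoon -> 1.00
```

&lt;p&gt;Routing everything through one base unit per category keeps the tables small: adding a new unit is one dictionary entry, not a new row and column of pairwise factors.&lt;/p&gt;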

&lt;h2&gt;
  
  
  The Magic: Interruptions
&lt;/h2&gt;

&lt;p&gt;The feature that makes bidirectional streaming feel genuinely different from a traditional voice assistant is interruption handling. You can talk over the agent mid-sentence, and it will stop, process your new input, and respond to that instead.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;You&lt;/strong&gt;: "Set a timer for 10 minutes"&lt;br&gt;
&lt;strong&gt;Agent&lt;/strong&gt;: "Timer set for 10 min-"&lt;br&gt;
&lt;strong&gt;You&lt;/strong&gt;: "Actually, make that 15"&lt;br&gt;
&lt;strong&gt;Agent&lt;/strong&gt;: "Got it, I have updated the timer to 15 minutes."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This works because the agent is always listening, even while generating output audio. When it detects that the user has started speaking, it cancels the current output and processes the interruption as new input. Strands handles the mechanics - clearing the output buffer, signaling the model to stop generating, and routing the new audio through the same pipeline.&lt;/p&gt;

&lt;p&gt;From a developer perspective, you do not need to write any interruption logic. It just works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Infrastructure
&lt;/h2&gt;

&lt;p&gt;For local development, you only need IAM permissions for Bedrock access. The Terraform configuration creates an IAM role with least-privilege permissions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy"&lt;/span&gt; &lt;span class="s2"&gt;"bedrock_access"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"bedrock-access"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;voice_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;Sid&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"NovaSonicAccess"&lt;/span&gt;
        &lt;span class="nx"&gt;Effect&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
        &lt;span class="nx"&gt;Action&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
          &lt;span class="s2"&gt;"bedrock:InvokeModel"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="s2"&gt;"bedrock:InvokeModelWithResponseStream"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="s2"&gt;"bedrock:InvokeModelWithBidirectionalStream"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
          &lt;span class="s2"&gt;"arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-2-sonic-v1*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;Sid&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"KnowledgeBaseAccess"&lt;/span&gt;
        &lt;span class="nx"&gt;Effect&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
        &lt;span class="nx"&gt;Action&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"bedrock:Retrieve"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
          &lt;span class="s2"&gt;"arn:aws:bedrock:us-east-1:ACCOUNT:knowledge-base/YOUR_KB_ID"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Important: Model ID mismatch.&lt;/strong&gt; The Bedrock foundation model ARN for Nova Sonic v2 is &lt;code&gt;amazon.nova-2-sonic-v1:0&lt;/code&gt; - not &lt;code&gt;amazon.nova-sonic-v2&lt;/code&gt; as you might guess from the marketing name. If your IAM policy uses the wrong model ID pattern, the BidiAgent will establish a connection to Nova Sonic and appear to be working, but the model silently fails to process audio. No error is returned - the agent simply produces no output. This cost me several hours of debugging. Check your IAM policy resource ARNs if the agent connects but never responds.&lt;/p&gt;

&lt;p&gt;If you are just experimenting, your default AWS CLI credentials with Bedrock access will work without the Terraform setup. The IAM role becomes useful when you want to follow least-privilege principles or deploy to AgentCore.&lt;/p&gt;

&lt;h2&gt;
  
  
  Trade-offs and Gotchas
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The 8-Minute Session Limit
&lt;/h3&gt;

&lt;p&gt;Nova Sonic v2 sessions time out after 8 minutes. For a kitchen assistant, this is a real constraint - you might be cooking for an hour and want to check on timers, look up steps, or ask follow-up questions throughout.&lt;/p&gt;

&lt;p&gt;The workaround is session rotation: detect when a session is about to expire, save context, and start a new session. Strands does not handle this automatically yet. For the demo, 8 minutes is enough to show the concept, but a production kitchen assistant would need this solved.&lt;/p&gt;
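&lt;p&gt;A minimal sketch of that rotation loop, assuming hypothetical &lt;code&gt;start_session&lt;/code&gt; and &lt;code&gt;snapshot&lt;/code&gt; hooks around whatever BidiAgent wiring you have - the real session object and its state-capture API are yours to define:&lt;/p&gt;

```python
import asyncio

SESSION_LIMIT_S = 8 * 60   # Nova Sonic's server-side cutoff
ROTATE_MARGIN_S = 30       # rotate shortly before the cutoff, not at it

async def run_with_rotation(start_session, limit_s=SESSION_LIMIT_S,
                            margin_s=ROTATE_MARGIN_S):
    """Run voice sessions back to back, rotating before each one times out.

    `start_session(context)` is a hypothetical coroutine standing in for
    your Strands/BidiAgent setup; it returns an object with a `run()`
    coroutine and a `snapshot()` method that captures conversation state.
    """
    context = None
    while True:
        session = await start_session(context)
        try:
            # Cut the session off client-side before the server-side limit
            await asyncio.wait_for(session.run(), timeout=limit_s - margin_s)
            return  # conversation ended normally
        except asyncio.TimeoutError:
            context = session.snapshot()  # carry state into the next session
```

&lt;p&gt;The margin matters: rotating proactively lets you save context while the session is still healthy, instead of scrambling after the server has already hung up.&lt;/p&gt;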

&lt;p&gt;For comparison, OpenAI Realtime supports 60-minute sessions, though at higher cost and without the tight AWS integration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Echo Cancellation
&lt;/h3&gt;

&lt;p&gt;Nova Sonic does not handle echo cancellation server-side - it expects clean audio at the input. If the agent's output audio gets picked up by the microphone, you get a feedback loop where the agent starts responding to itself.&lt;/p&gt;

&lt;p&gt;The browser handles this transparently. The Web Audio API provides built-in acoustic echo cancellation via &lt;code&gt;getUserMedia({ audio: { echoCancellation: true } })&lt;/code&gt; - the same mechanism that makes WebRTC video calls work without headsets. The browser subtracts the known output signal from the microphone input, so you can use laptop speakers and mic directly. No headset required.&lt;/p&gt;

&lt;p&gt;If you port to other platforms, the same principle applies: iOS (&lt;code&gt;AVAudioSession&lt;/code&gt;) and Android audio frameworks include system-level AEC, and smart speakers use hardware beamforming. The key point is that echo cancellation is an I/O concern, not a model concern - solve it at the client layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Observability Gap: No Invocation Logging
&lt;/h3&gt;

&lt;p&gt;Bedrock invocation logging does not cover the &lt;code&gt;InvokeModelWithBidirectionalStream&lt;/code&gt; API that Nova Sonic uses. If you have invocation logging enabled and expect to see Nova Sonic calls in CloudWatch, you will not find them. The API calls appear in CloudTrail as events, but the actual conversation content (audio, transcripts, tool calls) is not captured.&lt;/p&gt;

&lt;p&gt;For cost tracking, Nova Sonic is priced by audio duration (input and output seconds) rather than text tokens. AWS Cost Explorer filtered by Bedrock service is the most reliable way to monitor spend.&lt;/p&gt;
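&lt;p&gt;A back-of-envelope estimator makes the duration-based pricing concrete. The rates below are placeholders, not actual Nova Sonic prices - substitute the current numbers from the Bedrock pricing page:&lt;/p&gt;

```python
# PLACEHOLDER per-audio-minute rates - NOT real Nova Sonic pricing;
# look up current numbers on the Bedrock pricing page before relying on these.
INPUT_RATE_PER_MIN = 0.01
OUTPUT_RATE_PER_MIN = 0.02

def session_cost(input_seconds: float, output_seconds: float) -> float:
    """Estimate one session's cost from audio duration in each direction."""
    return (input_seconds / 60) * INPUT_RATE_PER_MIN \
         + (output_seconds / 60) * OUTPUT_RATE_PER_MIN

# An 8-minute session split evenly between listening and speaking
print(round(session_cost(240, 240), 2))
```

&lt;p&gt;The useful part is the shape, not the numbers: cost scales with wall-clock audio in both directions, so an idle-but-open microphone costs real money in a way that text-token billing does not.&lt;/p&gt;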

&lt;h3&gt;
  
  
  Timer Notifications Outlive Sessions
&lt;/h3&gt;

&lt;p&gt;Because of the 8-minute session limit, a timer set for 20 minutes will expire after the Nova Sonic session has ended. The timer still fires (it is a local asyncio task), but it can only print to the console - it cannot speak through the agent. A production version would need a separate notification channel (push notification, separate audio alert, etc.).&lt;/p&gt;
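&lt;p&gt;A decoupled notification path can be sketched as a plain &lt;code&gt;asyncio&lt;/code&gt; task with a swappable callback - console print here, a push notification or chime in production. The &lt;code&gt;notify&lt;/code&gt; hook is hypothetical, not part of the tool above:&lt;/p&gt;

```python
import asyncio

async def kitchen_timer(seconds: float, label: str, notify) -> None:
    """Sleep out the timer, then fire `notify` - no voice session required.

    `notify` is a hypothetical callback: console print here, but it could
    post to a push service or play a local chime in a production build.
    """
    await asyncio.sleep(seconds)
    notify(f"Timer done: {label}")

async def main():
    # Short durations for demonstration; a real timer would use minutes
    asyncio.create_task(kitchen_timer(0.01, "pasta", print))
    await kitchen_timer(0.02, "sauce", print)

asyncio.run(main())
```

&lt;p&gt;Because the callback is injected rather than hardcoded, the timer survives session rotation unchanged - only the delivery mechanism swaps out.&lt;/p&gt;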

&lt;h2&gt;
  
  
  Sample Conversation
&lt;/h2&gt;

&lt;p&gt;Here is a typical session showing the back-and-forth between the user and the assistant, including tool calls. The transcripts below are captured from the browser UI - Nova Sonic generates both the audio and the text transcript simultaneously.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;You: "What pasta recipes do you have?"&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;[search_recipes tool called]&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Assistant: "I found two pasta recipes in the collection. The first is a classic spaghetti carbonara - it uses eggs, pecorino romano, guanciale, and black pepper. No cream, as the traditional version gets its&lt;br&gt;
richness from the egg and cheese sauce. The second is a baked ziti with Italian sausage that serves about eight people. Would you like the full details on either one?"&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;You: "How many calories in a cup of pasta?"&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;[nutrition_lookup tool called]&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Assistant: "One cup of cooked spaghetti has about 220 calories, with 43 grams of carbohydrates and 8 grams of protein. That is for plain pasta without sauce."&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;You: "OK let's make the carbonara. Set a timer for 12 minutes for the pasta."&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;[set_timer tool called]&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Assistant: "Timer set for 12 minutes for the pasta. While the pasta cooks, let me walk you through the sauce. You will want to whisk together three egg yolks and one whole egg with about a cup of grated&lt;br&gt;
pecorino-"&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;You: "Wait, how much pecorino was that?"&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Assistant: "One cup of grated pecorino romano. You can mix in some parmesan too - about half and half works well."&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;You: "What is that in grams?"&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;[convert_units tool called]&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Assistant: "One cup of grated pecorino is roughly 100 grams, and one cup of parmesan is about 80 grams. So for a half-and-half mix, that would be 50 grams pecorino and 40 grams parmesan."&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Notice how the assistant handles interruptions naturally (cutting off mid-sentence when asked about pecorino), chains tool calls in context (nutrition lookup followed by unit conversion), and maintains conversational flow throughout.&lt;/p&gt;
&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;The complete code is on GitHub: &lt;a href="https://github.com/RDarrylR/serverless-family-recipes-bidirectional-nova-sonic" rel="noopener noreferrer"&gt;github.com/RDarrylR/serverless-family-recipes-bidirectional-nova-sonic&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/RDarrylR/serverless-family-recipes-bidirectional-nova-sonic.git
&lt;span class="nb"&gt;cd &lt;/span&gt;serverless-family-recipes-bidirectional-nova-sonic
uv &lt;span class="nb"&gt;sync
&lt;/span&gt;make install-frontend

&lt;span class="c"&gt;# Terminal 1: WebSocket server&lt;/span&gt;
make serve

&lt;span class="c"&gt;# Terminal 2: Vite dev server&lt;/span&gt;
make serve-frontend
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;a href="http://localhost:5173" rel="noopener noreferrer"&gt;http://localhost:5173&lt;/a&gt;, click the microphone, and start talking. No headset needed - the browser handles echo cancellation.&lt;/p&gt;

&lt;p&gt;You will need Bedrock model access enabled for Nova Sonic v2 in your AWS account. If you are using the recipe search tool, you will also need a Bedrock Knowledge Base with your recipes indexed.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Audio Pacing Problem
&lt;/h2&gt;

&lt;p&gt;This was the most surprising issue I hit. After wiring everything up, the assistant's voice worked - but after tool calls (like searching recipes or looking up nutrition), all the speech played back bunched together with no natural pauses between sentences. Short responses sounded fine. Long responses after tool calls sounded like someone hit 2x speed.&lt;/p&gt;

&lt;p&gt;The root cause: Nova Sonic generates audio faster than real-time after receiving a tool result. When the model has the full tool output text available at once, it produces all the speech audio in a burst rather than at natural speech pace. With WebSocket forwarding, there is no backpressure - the server receives hundreds of audio chunks in seconds and immediately forwards them all to the browser.&lt;/p&gt;

&lt;p&gt;The fix is a sufficiently large client-side ring buffer. The browser's AudioWorklet maintains a ring buffer sized for 60 seconds of audio at 24kHz. Audio arrives in bursts from the server, gets queued in the ring buffer, and plays back at the correct hardware rate. The AudioWorklet's &lt;code&gt;process()&lt;/code&gt; callback naturally paces playback - it pulls exactly 128 samples per callback at the AudioContext sample rate, regardless of how fast data arrives.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AudioPlayerProcessor&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;AudioWorkletProcessor&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="c1"&gt;// 60 seconds at 24kHz - handles faster-than-realtime bursts&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_bufferSize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;24000&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Float32Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_bufferSize&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_writePos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_readPos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;port&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onmessage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;audio&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;samples&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;barge-in&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_readPos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_writePos&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Clear instantly&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_readPos&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_writePos&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_buffer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_readPos&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_bufferSize&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_readPos&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Silence when buffer is empty&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The initial version used a 5-second buffer, which overflowed on longer responses - the write pointer wrapped around and overwrote unplayed samples, causing garbled audio. 60 seconds handles even the longest recipe descriptions comfortably.&lt;/p&gt;
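&lt;p&gt;The &lt;code&gt;_enqueue&lt;/code&gt; method the worklet calls isn't shown above. As a minimal sketch (a standalone class here so it can run outside the worklet; the real version would be methods on the processor, and the overflow guard is my assumption about how to avoid the wraparound bug described below):&lt;/p&gt;

```javascript
// Hypothetical standalone sketch of the ring-buffer write/read paths.
// Mirrors the worklet's _buffer/_writePos/_readPos fields.
class RingBuffer {
  constructor(seconds = 60, sampleRate = 24000) {
    this.size = seconds * sampleRate;
    this.buffer = new Float32Array(this.size);
    this.writePos = 0; // monotonically increasing counters,
    this.readPos = 0;  // indexed into the buffer via modulo
  }

  // Refuse a burst that would overwrite unplayed samples - a gap in
  // playback beats the garbled audio the 5-second buffer produced.
  enqueue(samples) {
    if (this.writePos - this.readPos + samples.length > this.size) {
      return false; // would overflow
    }
    for (let i = 0; i < samples.length; i++) {
      this.buffer[this.writePos % this.size] = samples[i];
      this.writePos++;
    }
    return true;
  }

  // Pull one sample at playback rate; silence when empty.
  dequeue() {
    if (this.readPos < this.writePos) {
      return this.buffer[this.readPos++ % this.size];
    }
    return 0;
  }
}
```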

&lt;p&gt;For interruptions (barge-in), the server sends a &lt;code&gt;bidi_interruption&lt;/code&gt; event and the browser sets &lt;code&gt;_readPos = _writePos&lt;/code&gt;, which instantly silences playback. New audio from the updated response then fills the buffer from that point.&lt;/p&gt;
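&lt;p&gt;On the main thread, that amounts to routing server events into the worklet's message port. A hedged sketch (&lt;code&gt;handleServerEvent&lt;/code&gt; and &lt;code&gt;decodePcm&lt;/code&gt; are hypothetical names, not from the article's code):&lt;/p&gt;

```javascript
// Hypothetical main-thread dispatch: forward server events to the
// AudioWorklet port, which handles 'audio' and 'barge-in' messages.
function handleServerEvent(event, workletPort, decodePcm) {
  if (event.type === 'bidi_interruption') {
    // The worklet responds by setting _readPos = _writePos (instant clear).
    workletPort.postMessage({ type: 'barge-in' });
  } else if (event.type === 'audio') {
    // decodePcm: base64 PCM16 -> Float32 samples (assumed helper)
    workletPort.postMessage({ type: 'audio', samples: decodePcm(event.data) });
  }
}
```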

&lt;h2&gt;
  
  
  Two AudioContexts
&lt;/h2&gt;

&lt;p&gt;The browser uses two separate &lt;code&gt;AudioContext&lt;/code&gt; instances: one at 16kHz for capture and one at 24kHz for playback. These cannot share a context because Web Audio requires a single sample rate per context, and Nova Sonic's input and output rates differ.&lt;/p&gt;

&lt;p&gt;Capture uses &lt;code&gt;ScriptProcessorNode&lt;/code&gt; (deprecated but universal) to grab Float32 audio frames, convert them to PCM16, base64-encode, and send over WebSocket. Playback uses an &lt;code&gt;AudioWorkletNode&lt;/code&gt; with a ring buffer for gapless streaming - the audio thread pulls samples continuously, and barge-in clears the buffer instantly when the server signals an interruption.&lt;/p&gt;
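&lt;p&gt;The capture-side conversion is straightforward but easy to get subtly wrong (asymmetric Int16 range, clamping). A minimal sketch of the Float32-to-PCM16-to-base64 path; function names are mine, not from the article's code:&lt;/p&gt;

```javascript
// Float32 samples in [-1, 1] -> little-endian PCM16, with clamping.
// Note the asymmetric scale: Int16 spans -32768..32767.
function floatToPcm16(float32) {
  const pcm = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i])); // clamp to avoid wrap
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}

// PCM16 -> base64 string for the WebSocket JSON frame.
function pcm16ToBase64(pcm) {
  const bytes = new Uint8Array(pcm.buffer);
  let binary = '';
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}
```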

&lt;h2&gt;
  
  
  Deploying to AgentCore
&lt;/h2&gt;

&lt;p&gt;For local development, this runs as two processes: a FastAPI server and a Vite dev server with a WebSocket proxy. For production, I deployed the voice agent to AWS Bedrock AgentCore Runtime as a Docker container.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc64t2a0v7z91iyzhx1t4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc64t2a0v7z91iyzhx1t4.png" alt="Deployed Architecture" width="800" height="1262"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The deployed architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frontend&lt;/strong&gt; - React SPA on S3 + CloudFront, same pattern as the text-based recipe assistant&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auth&lt;/strong&gt; - Cognito User Pool (email/password) + Identity Pool. The browser exchanges a Cognito JWT for temporary AWS credentials, then uses those to SigV4-sign a WebSocket URL directly to AgentCore. No API Gateway or Lambda in the WebSocket path.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent Runtime&lt;/strong&gt; - The FastAPI server runs as an ARM64 container in AgentCore. The same &lt;code&gt;server.py&lt;/code&gt; that runs locally deploys without changes - AgentCore handles health checks (&lt;code&gt;/ping&lt;/code&gt;), scaling, and WebSocket proxying.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dual mode&lt;/strong&gt; - The frontend auto-detects the deployment mode. If &lt;code&gt;VITE_AGENT_RUNTIME_ARN&lt;/code&gt; is set, it uses SigV4-signed WebSocket to AgentCore. If not, it connects to the local Vite proxy. No code changes needed to switch between local and deployed.&lt;/li&gt;
&lt;/ul&gt;
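&lt;p&gt;The dual-mode switch described above can be sketched as a small resolver. This is my illustration, not the article's code: &lt;code&gt;resolveWebSocketTarget&lt;/code&gt; and the &lt;code&gt;/ws&lt;/code&gt; proxy path are hypothetical, and the env object stands in for Vite's &lt;code&gt;import.meta.env&lt;/code&gt;:&lt;/p&gt;

```javascript
// Hypothetical mode detection: a build-time runtime ARN means "deployed",
// so the caller SigV4-signs a WebSocket URL to AgentCore; otherwise use
// the local Vite dev proxy. env is import.meta.env in the real app.
function resolveWebSocketTarget(env) {
  const arn = env.VITE_AGENT_RUNTIME_ARN;
  if (arn && arn.length > 0) {
    return { mode: 'agentcore', runtimeArn: arn };
  }
  return { mode: 'local', url: '/ws' }; // hypothetical proxy path
}
```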

&lt;h3&gt;
  
  
  The Dockerfile
&lt;/h3&gt;

&lt;p&gt;The container packages the same FastAPI server that runs locally. ARM64 is required by AgentCore Runtime.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; --platform=linux/arm64 public.ecr.aws/docker/library/python:3.13-slim&lt;/span&gt;

&lt;span class="c"&gt;# PyAudio is a transitive dependency of strands-agents[bidi-all]&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="nt"&gt;--no-install-recommends&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    libasound-dev libportaudio2 libportaudiocpp0 &lt;span class="se"&gt;\
&lt;/span&gt;    portaudio19-dev gcc python3-dev &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    apt-get clean &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; /var/lib/apt/lists/&lt;span class="k"&gt;*&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; requirements.txt .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; config.py server.py ./&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; tools/ ./tools/&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;useradd &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; 1000 appuser &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;chown&lt;/span&gt; &lt;span class="nt"&gt;-R&lt;/span&gt; appuser:appuser /app
&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; appuser&lt;/span&gt;

&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; CONTAINER_ENV=true&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; PYTHONUNBUFFERED=1&lt;/span&gt;
&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 8080&lt;/span&gt;

&lt;span class="k"&gt;HEALTHCHECK&lt;/span&gt;&lt;span class="s"&gt; --interval=30s --timeout=3s --start-period=5s --retries=3 \&lt;/span&gt;
    CMD ["python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8080/ping').read()"]

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["opentelemetry-instrument", "uvicorn", "server:app", "--host", "0.0.0.0", "--port", "8080"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;CONTAINER_ENV&lt;/code&gt; flag tells &lt;code&gt;server.py&lt;/code&gt; to bind to &lt;code&gt;0.0.0.0&lt;/code&gt; instead of &lt;code&gt;127.0.0.1&lt;/code&gt;. The &lt;code&gt;opentelemetry-instrument&lt;/code&gt; wrapper is required for AgentCore to capture container logs in CloudWatch - without it, stdout and stderr are silently dropped. The &lt;code&gt;aws-opentelemetry-distro&lt;/code&gt; package in &lt;code&gt;requirements.txt&lt;/code&gt; provides this wrapper. The health check hits the &lt;code&gt;/ping&lt;/code&gt; endpoint that AgentCore polls every few seconds.&lt;/p&gt;

&lt;h3&gt;
  
  
  Terraform: AgentCore IAM Role
&lt;/h3&gt;

&lt;p&gt;The container needs an IAM role that AgentCore assumes on its behalf. This is where the model ID gotcha matters most - get the ARN wrong and Nova Sonic silently ignores audio.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role"&lt;/span&gt; &lt;span class="s2"&gt;"agentcore"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.project_name}-${var.environment}-agentcore"&lt;/span&gt;

  &lt;span class="nx"&gt;assume_role_policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
      &lt;span class="nx"&gt;Principal&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Service&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"bedrock-agentcore.amazonaws.com"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"sts:AssumeRole"&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy"&lt;/span&gt; &lt;span class="s2"&gt;"agentcore"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"agentcore-permissions"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;agentcore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;Sid&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"NovaSonicAccess"&lt;/span&gt;
        &lt;span class="nx"&gt;Effect&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
        &lt;span class="nx"&gt;Action&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
          &lt;span class="s2"&gt;"bedrock:InvokeModel"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="s2"&gt;"bedrock:InvokeModelWithResponseStream"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="s2"&gt;"bedrock:InvokeModelWithBidirectionalStream"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="c1"&gt;# NOTE: The model ID is amazon.nova-2-sonic-v1, NOT amazon.nova-sonic-v2&lt;/span&gt;
        &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
          &lt;span class="s2"&gt;"arn:aws:bedrock:${var.aws_region}::foundation-model/amazon.nova-2-sonic-v1*"&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;Sid&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"KnowledgeBaseAccess"&lt;/span&gt;
        &lt;span class="nx"&gt;Effect&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
        &lt;span class="nx"&gt;Action&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"bedrock:Retrieve"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
          &lt;span class="s2"&gt;"arn:aws:bedrock:${var.aws_region}:${data.aws_caller_identity.current.account_id}:knowledge-base/${var.knowledge_base_id}"&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;Sid&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ECRImageAccess"&lt;/span&gt;
        &lt;span class="nx"&gt;Effect&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
        &lt;span class="nx"&gt;Action&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"ecr:BatchGetImage"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"ecr:GetDownloadUrlForLayer"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
          &lt;span class="s2"&gt;"arn:aws:ecr:${var.aws_region}:${data.aws_caller_identity.current.account_id}:repository/*"&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;Sid&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ECRTokenAccess"&lt;/span&gt;
        &lt;span class="nx"&gt;Effect&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
        &lt;span class="nx"&gt;Action&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"ecr:GetAuthorizationToken"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"*"&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="c1"&gt;# CloudWatch, X-Ray, and WorkloadIdentity statements omitted for brevity&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;bedrock-agentcore.amazonaws.com&lt;/code&gt; service principal in the trust policy allows AgentCore to assume this role when running the container. The policy grants exactly what the agent needs: invoke Nova Sonic, retrieve from the Knowledge Base, and pull the container image from ECR.&lt;/p&gt;

&lt;h3&gt;
  
  
  Terraform: Cognito Authentication
&lt;/h3&gt;

&lt;p&gt;The browser needs AWS credentials to sign WebSocket requests. Cognito provides a two-step flow: User Pool for authentication, Identity Pool for credential exchange.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# User Pool - email/password authentication&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_cognito_user_pool"&lt;/span&gt; &lt;span class="s2"&gt;"main"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;                     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.project_name}-${var.environment}-users"&lt;/span&gt;
  &lt;span class="nx"&gt;username_attributes&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;auto_verified_attributes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="nx"&gt;admin_create_user_config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;allow_admin_create_user_only&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;  &lt;span class="c1"&gt;# No self-signup for the demo&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;password_policy&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;minimum_length&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;
    &lt;span class="nx"&gt;require_lowercase&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="nx"&gt;require_uppercase&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="nx"&gt;require_numbers&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Public client for the SPA - no client secret&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_cognito_user_pool_client"&lt;/span&gt; &lt;span class="s2"&gt;"web"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.project_name}-${var.environment}-web-client"&lt;/span&gt;
  &lt;span class="nx"&gt;user_pool_id&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_cognito_user_pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;generate_secret&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;

  &lt;span class="nx"&gt;explicit_auth_flows&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s2"&gt;"ALLOW_USER_SRP_AUTH"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;"ALLOW_REFRESH_TOKEN_AUTH"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Identity Pool - exchange JWT for temporary AWS credentials&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_cognito_identity_pool"&lt;/span&gt; &lt;span class="s2"&gt;"main"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;identity_pool_name&lt;/span&gt;               &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.project_name}-${var.environment}-identity"&lt;/span&gt;
  &lt;span class="nx"&gt;allow_unauthenticated_identities&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;

  &lt;span class="nx"&gt;cognito_identity_providers&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;client_id&lt;/span&gt;               &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_cognito_user_pool_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;web&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
    &lt;span class="nx"&gt;provider_name&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_cognito_user_pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;endpoint&lt;/span&gt;
    &lt;span class="nx"&gt;server_side_token_check&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# IAM role that authenticated users assume&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role"&lt;/span&gt; &lt;span class="s2"&gt;"cognito_authenticated"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.project_name}-${var.environment}-cognito-auth-role"&lt;/span&gt;

  &lt;span class="nx"&gt;assume_role_policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
      &lt;span class="nx"&gt;Principal&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Federated&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"cognito-identity.amazonaws.com"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"sts:AssumeRoleWithWebIdentity"&lt;/span&gt;
      &lt;span class="nx"&gt;Condition&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;StringEquals&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="s2"&gt;"cognito-identity.amazonaws.com:aud"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_cognito_identity_pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="s2"&gt;"ForAnyValue:StringLike"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="s2"&gt;"cognito-identity.amazonaws.com:amr"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"authenticated"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Grant authenticated users permission to invoke the agent via WebSocket&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy"&lt;/span&gt; &lt;span class="s2"&gt;"agentcore_invoke"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;agent_runtime_arn&lt;/span&gt; &lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt; &lt;span class="err"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"agentcore-invoke"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cognito_authenticated&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s2"&gt;"bedrock-agentcore:InvokeAgentRuntime"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;"bedrock-agentcore:InvokeAgentRuntimeWithWebSocketStream"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;"bedrock-agentcore:InvokeRuntime"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;"bedrock-agentcore:InvokeRuntimeWithResponseStream"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"*"&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;count = var.agent_runtime_arn != "" ? 1 : 0&lt;/code&gt; handles a chicken-and-egg problem: Terraform creates the Cognito resources first, then you create the AgentCore runtime via CLI (there is no Terraform provider for AgentCore yet), then re-run Terraform with the runtime ARN to attach the invoke policy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IAM resource scoping gotcha:&lt;/strong&gt; You might expect to scope &lt;code&gt;Resource&lt;/code&gt; to your specific runtime ARN instead of &lt;code&gt;"*"&lt;/code&gt;. I tried this and it breaks WebSocket connections with a silent failure - the browser gets a connection refused with no useful error. The &lt;code&gt;InvokeAgentRuntimeWithWebSocketStream&lt;/code&gt; action evaluates against a resource ARN that includes session and qualifier components, not just the runtime ARN. The official AWS sample uses &lt;code&gt;Resource: "*"&lt;/code&gt; for the same reason. Until AWS documents the exact resource ARN format for WebSocket invocations, &lt;code&gt;"*"&lt;/code&gt; is the only option that works. The blast radius is limited since these are Cognito-scoped temporary credentials that can only call &lt;code&gt;bedrock-agentcore:Invoke*&lt;/code&gt; actions.&lt;/p&gt;

&lt;h3&gt;
  
  
  SigV4 WebSocket Presigning
&lt;/h3&gt;

&lt;p&gt;This is the most unusual piece of the deployment. The browser cannot use a simple WebSocket URL - it needs to SigV4-sign the request using temporary AWS credentials from Cognito. The signing process constructs the URL, signs it with the AWS Signature Version 4 algorithm, and appends the signature as query parameters.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;getAWSCredentials&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./aws-credentials.js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Sha256&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-crypto/sha256-js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;SignatureV4&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-sdk/signature-v4&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;HttpRequest&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@smithy/protocol-http&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getPresignedWebSocketUrl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;agentRuntimeArn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;region&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;VITE_REGION&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;us-east-1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;credentials&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getAWSCredentials&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="c1"&gt;// Build the AgentCore WebSocket URL&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;encodedArn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;encodeURIComponent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;agentRuntimeArn&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;`https://bedrock-agentcore.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;region&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.amazonaws.com/runtimes/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;encodedArn&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/ws`&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;searchParams&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;qualifier&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;DEFAULT&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;searchParams&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;X-Amzn-Bedrock-AgentCore-Runtime-Session-Id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Sign with SigV4&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;signer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;SignatureV4&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;bedrock-agentcore&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;region&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;accessKeyId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;accessKeyId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;secretAccessKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;secretAccessKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;sessionToken&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sessionToken&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Sha256&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;HttpRequest&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;GET&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pathname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromEntries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;searchParams&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;hostname&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;signed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;signer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;presign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;expiresIn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3600&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// Convert to wss:// URL&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;queryString&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;signed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(([&lt;/span&gt;&lt;span class="nx"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nf"&gt;encodeURIComponent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;k&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nf"&gt;encodeURIComponent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;))}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;&amp;amp;&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s2"&gt;`wss://&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;signed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;hostname&lt;/span&gt;&lt;span class="p"&gt;}${&lt;/span&gt;&lt;span class="nx"&gt;signed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;?&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;queryString&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The credential exchange happens one step earlier. The browser takes the Cognito ID token and exchanges it for temporary AWS credentials via the Identity Pool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;CognitoIdentityClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;GetIdCommand&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="nx"&gt;GetCredentialsForIdentityCommand&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-sdk/client-cognito-identity&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getAWSCredentials&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;idToken&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getIdToken&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CognitoIdentityClient&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;region&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;providerName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`cognito-idp.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;region&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.amazonaws.com/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;userPoolId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// Step 1: Get identity ID from the token&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;IdentityId&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;GetIdCommand&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;IdentityPoolId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;identityPoolId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;Logins&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;providerName&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="nx"&gt;idToken&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Step 2: Exchange for temporary AWS credentials&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Credentials&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;GetCredentialsForIdentityCommand&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="nx"&gt;IdentityId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;Logins&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;providerName&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="nx"&gt;idToken&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;accessKeyId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Credentials&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;AccessKeyId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;secretAccessKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Credentials&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;SecretKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;sessionToken&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Credentials&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;SessionToken&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These temporary credentials are scoped by the Cognito authenticated role - they can only call &lt;code&gt;bedrock-agentcore:Invoke*&lt;/code&gt; actions, not other AWS services.&lt;/p&gt;
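
&lt;p&gt;For debugging the presigning flow outside the browser, the same SigV4 query-signing steps can be reproduced with nothing but the Python standard library. This is an illustrative sketch, not the SDK signer the frontend uses - in particular it signs the path as given and skips the double path-encoding some services require - so treat it as a way to inspect what a signed URL looks like rather than a drop-in replacement:&lt;/p&gt;

```python
import hashlib
import hmac
from datetime import datetime, timezone
from urllib.parse import quote

AMP = chr(38)  # ampersand, the query-string separator


def presign_ws_url(host, path, params, access_key, secret_key, session_token,
                   region, service="bedrock-agentcore", expires=3600, now=None):
    """Sketch of SigV4 query presigning for a wss:// URL (stdlib only)."""
    now = now or datetime.now(timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    scope = f"{now.strftime('%Y%m%d')}/{region}/{service}/aws4_request"

    # Presigned URLs carry all the auth material as query parameters.
    q = dict(params)
    q["X-Amz-Algorithm"] = "AWS4-HMAC-SHA256"
    q["X-Amz-Credential"] = f"{access_key}/{scope}"
    q["X-Amz-Date"] = amz_date
    q["X-Amz-Expires"] = str(expires)
    q["X-Amz-SignedHeaders"] = "host"
    if session_token:
        q["X-Amz-Security-Token"] = session_token

    def enc(s):
        return quote(s, safe="-_.~")

    # Canonical request: sorted query string, host header, empty-body hash.
    canonical_qs = AMP.join(f"{enc(k)}={enc(v)}" for k, v in sorted(q.items()))
    canonical_request = "\n".join([
        "GET", path, canonical_qs,
        f"host:{host}\n", "host",
        hashlib.sha256(b"").hexdigest(),
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode()).hexdigest(),
    ])

    # Derive the signing key: date, region, service, then "aws4_request".
    key = ("AWS4" + secret_key).encode()
    for part in (now.strftime("%Y%m%d"), region, service, "aws4_request"):
        key = hmac.new(key, part.encode(), hashlib.sha256).digest()
    sig = hmac.new(key, string_to_sign.encode(), hashlib.sha256).hexdigest()

    return f"wss://{host}{path}?{canonical_qs}{AMP}X-Amz-Signature={sig}"
```

Feeding it fixed credentials and a fixed timestamp produces a deterministic URL, which makes it easy to diff against what the browser-side signer emits.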

&lt;h3&gt;
  
  
  Deployment Commands
&lt;/h3&gt;

&lt;p&gt;The full deployment workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Provision infrastructure (S3, CloudFront, Cognito, ECR)&lt;/span&gt;
make apply

&lt;span class="c"&gt;# 2. Build and push the ARM64 container to ECR&lt;/span&gt;
make docker-build
make docker-push

&lt;span class="c"&gt;# 3. Create the AgentCore runtime (first time only)&lt;/span&gt;
make create-agent

&lt;span class="c"&gt;# 4. Add the runtime ARN to terraform.tfvars, re-apply for Cognito IAM policy&lt;/span&gt;
make apply

&lt;span class="c"&gt;# 5. Generate frontend .env from Terraform outputs&lt;/span&gt;
make setup-env

&lt;span class="c"&gt;# 6. Build and deploy frontend to S3 + invalidate CloudFront cache&lt;/span&gt;
make deploy-frontend
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For subsequent code changes, &lt;code&gt;make deploy-agent&lt;/code&gt; rebuilds the container and updates the runtime in one step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key deployment gotcha:&lt;/strong&gt; AgentCore container deployments require unique ECR image tags to force replacement. Pushing a new image with the &lt;code&gt;latest&lt;/code&gt; tag does not trigger a container update - you must use a unique tag (I use &lt;code&gt;v$(date +%Y%m%d%H%M%S)&lt;/code&gt;) and pass it to &lt;code&gt;update-agent-runtime&lt;/code&gt;.&lt;/p&gt;
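
&lt;p&gt;A minimal sketch of the unique-tag scheme - the helper name and repo variable below are illustrative, not taken from the actual Makefile:&lt;/p&gt;

```python
from datetime import datetime, timezone


def unique_image_tag(prefix: str = "v") -> str:
    # Second-resolution timestamp, e.g. "v20260403140249". Every build gets
    # a distinct ECR image URI, which is what forces the runtime replacement.
    return prefix + datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")


# The tag then flows into the push/update sequence, roughly:
#   docker tag agent:latest $ECR_REPO:$TAG
#   docker push $ECR_REPO:$TAG
#   ...then pass the new image URI to update-agent-runtime
```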

&lt;h2&gt;
  
  
  What Does It Cost?
&lt;/h2&gt;

&lt;p&gt;Nova Sonic pricing is per-token, not per-minute. Audio converts at roughly 25 tokens per second.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Token type&lt;/th&gt;
&lt;th&gt;Price per 1,000 tokens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Speech input&lt;/td&gt;
&lt;td&gt;$0.0034&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Speech output&lt;/td&gt;
&lt;td&gt;$0.0136&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Text input&lt;/td&gt;
&lt;td&gt;$0.00006&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Text output&lt;/td&gt;
&lt;td&gt;$0.00024&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Speech tokens dominate the bill. Text tokens only apply to tool call inputs/outputs and any conversation history injected as text - these are negligible for a voice-only session.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Typical session costs:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A 5-minute cooking session where the user asks 4-5 questions and the assistant responds with recipe steps and tool results:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Speech input (~2 min of user speaking): 2 x 60 x 25 = 3,000 tokens = $0.010&lt;/li&gt;
&lt;li&gt;Speech output (~3 min of assistant speaking): 3 x 60 x 25 = 4,500 tokens = $0.061&lt;/li&gt;
&lt;li&gt;Text tokens (tool calls/results): ~2,000 tokens = $0.0005&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Total: ~$0.07 per session&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
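
&lt;p&gt;The arithmetic can be checked directly with the table's prices and the ~25 tokens/second conversion:&lt;/p&gt;

```python
SPEECH_IN_PER_1K, SPEECH_OUT_PER_1K = 0.0034, 0.0136
TOKENS_PER_SEC = 25

input_cost = (2 * 60 * TOKENS_PER_SEC / 1000) * SPEECH_IN_PER_1K    # 3,000 tokens
output_cost = (3 * 60 * TOKENS_PER_SEC / 1000) * SPEECH_OUT_PER_1K  # 4,500 tokens
text_cost = 0.0005                                                  # tool calls, est.
total = input_cost + output_cost + text_cost
print(f"${total:.4f}")  # prints "$0.0719" - about 7 cents per session
```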

&lt;p&gt;For comparison, an 8-minute session (the Nova Sonic maximum) with continuous conversation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Speech input (~3.5 min): 5,250 tokens = $0.018&lt;/li&gt;
&lt;li&gt;Speech output (~4.5 min): 6,750 tokens = $0.092&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Total: ~$0.11 per session&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is roughly 80% cheaper than OpenAI's GPT-4o Realtime API for equivalent conversation length.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tracking costs in code:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AWS does not currently publish CloudWatch metrics for &lt;code&gt;InvokeModelWithBidirectionalStream&lt;/code&gt; - the standard Bedrock metrics only cover &lt;code&gt;InvokeModel&lt;/code&gt;, &lt;code&gt;Converse&lt;/code&gt;, and their streaming variants. Model invocation logging and Application Inference Profiles (for cost tagging) also do not support bidirectional streaming yet.&lt;/p&gt;

&lt;p&gt;The practical workaround is application-level cost estimation. The server logs session duration, audio chunk counts, and an estimated cost for each session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Nova Sonic pricing (per 1,000 tokens)
&lt;/span&gt;&lt;span class="n"&gt;SPEECH_INPUT_PRICE_PER_1K&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0034&lt;/span&gt;
&lt;span class="n"&gt;SPEECH_OUTPUT_PRICE_PER_1K&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0136&lt;/span&gt;
&lt;span class="n"&gt;TOKENS_PER_SECOND_AUDIO&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;25&lt;/span&gt;

&lt;span class="c1"&gt;# In the WebSocket handler:
&lt;/span&gt;&lt;span class="n"&gt;session_start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;monotonic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;input_audio_chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="n"&gt;output_audio_chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

&lt;span class="c1"&gt;# ... wrap receive/send to count audio chunks ...
&lt;/span&gt;
&lt;span class="c1"&gt;# In the finally block:
&lt;/span&gt;&lt;span class="n"&gt;session_duration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;monotonic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;session_start&lt;/span&gt;
&lt;span class="n"&gt;input_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session_duration&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;TOKENS_PER_SECOND_AUDIO&lt;/span&gt;
&lt;span class="n"&gt;output_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session_duration&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;TOKENS_PER_SECOND_AUDIO&lt;/span&gt;
&lt;span class="n"&gt;estimated_cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_tokens&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;SPEECH_INPUT_PRICE_PER_1K&lt;/span&gt; \
               &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_tokens&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;SPEECH_OUTPUT_PRICE_PER_1K&lt;/span&gt;

&lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Session ended: duration=%.1fs, est_cost=$%.4f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_duration&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;estimated_cost&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This uses wall-clock time as an upper bound (the user and the assistant are not both speaking for the entire session, so billing input and output for the full duration overestimates). For aggregate cost monitoring, AWS Cost Explorer does show Bedrock charges broken down by model, so you can track monthly Nova Sonic spend at the account level - you just cannot get per-invocation breakdowns without application-level logging.&lt;/p&gt;
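
&lt;p&gt;A tighter estimate can be computed from the audio byte counts the handler already tracks rather than from wall-clock time. The PCM formats below are assumptions (16 kHz 16-bit mono input and 24 kHz output are typical Nova Sonic settings) - adjust them to whatever your session actually negotiates:&lt;/p&gt;

```python
SPEECH_IN_PER_1K, SPEECH_OUT_PER_1K = 0.0034, 0.0136
TOKENS_PER_SEC = 25

# Assumed PCM formats: 16 kHz * 2 bytes/sample in, 24 kHz * 2 bytes/sample out.
IN_BYTES_PER_SEC = 16_000 * 2
OUT_BYTES_PER_SEC = 24_000 * 2


def estimate_cost(input_bytes: int, output_bytes: int) -> float:
    """Estimate session cost from actual audio volume, not wall clock."""
    in_tokens = input_bytes / IN_BYTES_PER_SEC * TOKENS_PER_SEC
    out_tokens = output_bytes / OUT_BYTES_PER_SEC * TOKENS_PER_SEC
    return (in_tokens / 1000) * SPEECH_IN_PER_1K \
        + (out_tokens / 1000) * SPEECH_OUT_PER_1K


# 2 minutes of user audio plus 3 minutes of assistant audio:
cost = estimate_cost(IN_BYTES_PER_SEC * 120, OUT_BYTES_PER_SEC * 180)
```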

&lt;h2&gt;
  
  
  Converging Voice and Text: Future Integration
&lt;/h2&gt;

&lt;p&gt;I now have two separate recipe assistants - a &lt;a href="https://darryl-ruggles.cloud/serverless-recipe-assistant-with-agentcore-and-strands/" rel="noopener noreferrer"&gt;text-based version&lt;/a&gt; that uses CloudFront + Lambda + AgentCore for HTTP/SSE streaming, and this voice version that uses CloudFront + Cognito + AgentCore for WebSocket streaming. They share the same Bedrock Knowledge Base and similar tool implementations, but have separate infrastructure and deployment pipelines.&lt;/p&gt;

&lt;p&gt;The natural next step is to combine them into a single application with both input modes. The architecture would look like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unified frontend&lt;/strong&gt; - A single React SPA that offers both a chat interface and a voice interface. The text chat already uses SSE streaming through Lambda; the voice mode connects directly to AgentCore via WebSocket. Both modes share the same auth (Cognito) and the same CloudFront distribution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared agent&lt;/strong&gt; - Both modes could use the same AgentCore runtime with the same tools. The text path would go through Lambda (for SSE streaming), while the voice path would connect directly via WebSocket. The agent code is already nearly identical - same tools, same Knowledge Base, same system prompt. The main difference is the I/O layer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared infrastructure&lt;/strong&gt; - The Terraform modules for auth, CDN, and storage are similar between the two projects. Combining them eliminates duplication and simplifies deployment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mode switching&lt;/strong&gt; - In the kitchen, you might start with text ("what should I make for dinner?"), switch to voice once you start cooking ("what is the next step?"), and go back to text when things get noisy. A unified app would make this seamless.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The biggest challenge is the streaming protocol difference. The text assistant uses HTTP POST + SSE (request-response), while the voice assistant needs a persistent bidirectional WebSocket. Lambda handles the text path well but is a poor fit for long-lived WebSocket connections; the AgentCore runtime handles them natively. The likely solution is to keep both paths - Lambda for text, the AgentCore container for voice - behind the same CloudFront distribution.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Session rotation&lt;/strong&gt; - Automatically reconnect when the 8-minute Nova Sonic limit is reached, preserving conversation context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unified app&lt;/strong&gt; - Merge the text and voice assistants into a single application with both input modes, shared tools, and shared infrastructure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alexa integration&lt;/strong&gt; - Strands BidiAgent with a custom I/O adapter could bridge to the Alexa Skills Kit for hands-free kitchen use&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-modal input&lt;/strong&gt; - Nova Sonic v2 supports text input alongside audio. Adding a text fallback for when voice is not practical (quiet environments, accessibility needs)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The gap between a text-based agent and a voice-based agent is smaller than I expected. Strands abstracts the hard parts - WebSocket management, audio encoding, interruption handling, concurrent tool execution - and lets you focus on the tools and the system prompt. The same &lt;code&gt;@tool&lt;/code&gt; decorator, the same docstring-based tool selection, the same Bedrock integration. Just with a microphone instead of a text box.&lt;/p&gt;

&lt;p&gt;If you have an existing Strands agent, adding voice is closer to a weekend project than a rewrite.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Connect with me on&lt;/em&gt; &lt;a href="https://x.com/RDarrylR" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://bsky.app/profile/darryl-ruggles.cloud" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://www.linkedin.com/in/darryl-ruggles/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://medium.com/@RDarrylR" rel="noopener noreferrer"&gt;Medium&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://dev.to/rdarrylr"&gt;Dev.to&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://github.com/RDarrylR/" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;em&gt;, or the&lt;/em&gt; &lt;a href="https://builder.aws.com/community/@darrylr" rel="noopener noreferrer"&gt;AWS Community&lt;/a&gt;&lt;em&gt;. Check out more of my projects at&lt;/em&gt; &lt;a href="https://darryl-ruggles.cloud" rel="noopener noreferrer"&gt;darryl-ruggles.cloud&lt;/a&gt; &lt;em&gt;and join the&lt;/em&gt; &lt;a href="https://www.believeinserverless.com/" rel="noopener noreferrer"&gt;Believe In Serverless&lt;/a&gt; &lt;em&gt;community.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agentcore</category>
      <category>aws</category>
      <category>fullstack</category>
      <category>fastapi</category>
    </item>
    <item>
      <title>A Serverless Recipe Assistant with Bedrock AgentCore, Knowledge Bases, and S3 Vectors</title>
      <dc:creator>Darryl Ruggles</dc:creator>
      <pubDate>Thu, 05 Mar 2026 16:01:24 +0000</pubDate>
      <link>https://forem.com/aws-builders/a-serverless-recipe-assistant-with-bedrock-agentcore-knowledge-bases-and-s3-vectors-4lkp</link>
      <guid>https://forem.com/aws-builders/a-serverless-recipe-assistant-with-bedrock-agentcore-knowledge-bases-and-s3-vectors-4lkp</guid>
      <description>&lt;p&gt;During the last couple of years I have changed my life by adopting a low-carb/keto diet and lots of regular cardio and weight training. As part of this I have accumulated many recipes using alternative ingredients like nut flours, sugar substitutes, and more. Also our family has accumulated more than 600 recipes over the years - everything from bread experiments I've been perfecting, to my grandmother's handwritten brown bread recipe on a stained index card, to my wife's collection of holiday baking favourites bookmarked from blogs that no longer exist. They were scattered across PDFs, photos of handwritten cards, screenshots, and dead URLs. Finding anything was painful - I'd scroll through folders, skim titles, and still end up Googling a recipe I already had saved somewhere.&lt;/p&gt;

&lt;p&gt;I wanted a way to just &lt;em&gt;ask&lt;/em&gt; for what I needed: "what keto bread recipes do we have?", "what's in grandma's homemade brown bread?", or "what are the macros in that almond flour waffle recipe?" - and get answers from our own collection, with the option to look up accurate nutrition from the USDA database for any recipe or ingredient list.&lt;/p&gt;

&lt;p&gt;This project started as a way to learn &lt;a href="https://aws.amazon.com/bedrock/knowledge-bases/" rel="noopener noreferrer"&gt;&lt;strong&gt;Amazon Bedrock Knowledge Bases&lt;/strong&gt;&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/" rel="noopener noreferrer"&gt;&lt;strong&gt;Bedrock AgentCore&lt;/strong&gt;&lt;/a&gt; by building something I'd actually use every day. Along the way it turned into a full-stack serverless application with recipe search, real-time USDA nutrition lookup, text-to-speech cooking mode, and streaming chat - all for about &lt;strong&gt;$0.08 per half month&lt;/strong&gt; in AWS costs.&lt;/p&gt;

&lt;p&gt;You can check out the complete source code in my GitHub repo here → &lt;a href="https://github.com/RDarrylR/serverless-family-recipe-assistant" rel="noopener noreferrer"&gt;Serverless Family Recipe Assistant Repo&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;The Problem&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Our family recipe collection is a mix of everything - food blog PDFs, Instagram screenshots, handwritten cards from my grandmother and mother-in-law, scribbled notes on the backs of envelopes, and bookmarked URLs that had since gone dead. Some of the oldest recipes were on stained, faded cards where the handwriting was barely legible. Others were photos taken at awkward angles with half the ingredients cut off.&lt;/p&gt;

&lt;p&gt;Beyond just finding recipes, I wanted to track nutrition. I've been following a keto diet for a few years, and knowing the macros - especially net carbs - matters. But most family recipes don't come with a nutrition label. I wanted to be able to ask "what are the macros in grandma's homemade brown bread?" and get a real answer calculated from USDA data, not a guess.&lt;/p&gt;

&lt;p&gt;Traditional approaches (a database with manual data entry, or simple full-text search) wouldn't cut it. Manually transcribing 600+ recipes - many handwritten - would take forever. And I didn't want to fill out search forms. I wanted to type "show me chicken recipes under 10g net carbs" and get actual answers from our own collection.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Solution - Architecture Overview&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The system uses a layered serverless architecture: &lt;strong&gt;CloudFront + Lambda Function URL + WAF&lt;/strong&gt; as a secure proxy layer in front of &lt;strong&gt;Bedrock AgentCore&lt;/strong&gt;. CloudFront serves both the React frontend (from S3) and the API (&lt;code&gt;/chat&lt;/code&gt; path) from a single distribution, giving you CDN caching, HTTPS, and WAF integration with one resource. WAF provides rate limiting (100 requests per 5 minutes per IP) and bad input protection. The Lambda function, configured with &lt;code&gt;RESPONSE_STREAM&lt;/code&gt; mode, enables true SSE streaming with no timeout constraint, bridges JWT authentication to IAM, and signs requests to AgentCore with SigV4.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjt7ehezp4vnrgj35m9of.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjt7ehezp4vnrgj35m9of.png" alt="Current system architecture" width="800" height="808"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Why This Layered Architecture?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Each layer in the stack serves a specific purpose:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CloudFront&lt;/strong&gt; - single HTTPS domain for both frontend and API, CDN edge caching for static assets, WAF integration for rate limiting and input filtering, and origin access control (OAC) for secure S3 and Lambda origins&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lambda Function URL&lt;/strong&gt; - auth bridging (JWT to IAM), SigV4 request signing to AgentCore, &lt;code&gt;RESPONSE_STREAM&lt;/code&gt; invoke mode for true SSE and request/response transformation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WAF&lt;/strong&gt; - rate limiting per IP to prevent abuse, and managed rule groups for bad input protection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AgentCore&lt;/strong&gt; - managed agent runtime with auto-scaling to zero, built-in telemetry, container management, and memory&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Why CloudFront + Lambda Function URL (Not API Gateway or ALB)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The browser needs to talk to AgentCore, but AgentCore uses IAM auth and the browser has JWT tokens. Something has to sit in the middle to bridge that gap, provide HTTPS, and add WAF protection. Three AWS options can do this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;API Gateway REST API (v1)&lt;/strong&gt; is the traditional choice. It has a built-in Cognito authorizer, request validation, usage plans, and API keys. But it buffers responses with a 29-second timeout (you can open a support ticket to get this raised, but I didn't want to go down that path) - there's no streaming support. An agent response that involves a KB search, a USDA API call, and a multi-paragraph answer easily takes 10-30 seconds, and complex queries can run longer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Application Load Balancer&lt;/strong&gt; supports WebSockets and long connections with no timeout ceiling, and gives you weighted target groups for canary deployments. But ALBs have a fixed hourly cost (~$16/month minimum) regardless of traffic. For a personal project handling a few requests per day, that's more than the rest of the infrastructure combined.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CloudFront + Lambda Function URL&lt;/strong&gt; is what I went with. A Lambda Function URL with &lt;code&gt;RESPONSE_STREAM&lt;/code&gt; invoke mode supports true SSE with no timeout constraint - the Lambda streams chunks as they arrive from AgentCore, and CloudFront forwards them to the browser. CloudFront adds CDN edge caching for the frontend, HTTPS termination, and WAF integration, all from a single distribution. The Lambda handles JWT-to-IAM auth bridging and SigV4 request signing. At low traffic, the entire layer costs $0 (CloudFront and Lambda both have generous free tiers).&lt;/p&gt;

&lt;p&gt;The tradeoff: you lose API Gateway's built-in JWT authorizer and request validation, so you handle those in Lambda code. You also lose ALB's weighted routing for canary deployments. For a personal/family app, those aren't needed. For a multi-tenant SaaS product, you'd likely want API Gateway's usage plans and API keys, and accept the streaming limitation (or use WebSockets instead of SSE).&lt;/p&gt;

&lt;p&gt;One subtlety worth noting: CloudFront's origin read timeout defaults to 60 seconds. Once streaming starts, each SSE chunk resets that clock, so long responses work fine as long as data keeps flowing. But if the agent goes quiet during a long tool execution (say, a slow external API call), CloudFront could drop the connection. The Lambda sends SSE keepalive comments (&lt;code&gt;: keepalive&lt;/code&gt;) every 30 seconds to prevent this.&lt;/p&gt;
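&lt;p&gt;One way to implement that keepalive pattern, sketched here as a plain &lt;code&gt;asyncio&lt;/code&gt; wrapper around any async chunk source (an illustration of the idea, not the project's actual code):&lt;/p&gt;

```python
import asyncio
from typing import AsyncIterator


async def with_keepalive(source: AsyncIterator[str],
                         interval: float = 30.0) -> AsyncIterator[str]:
    """Yield chunks from `source`, emitting an SSE comment line whenever the
    source goes quiet for `interval` seconds, so CloudFront's origin read
    timeout never expires mid-response."""
    it = source.__aiter__()
    while True:
        task = asyncio.ensure_future(it.__anext__())
        while True:
            try:
                # shield() keeps the pending read alive across timeouts.
                chunk = await asyncio.wait_for(asyncio.shield(task),
                                               timeout=interval)
                yield chunk
                break
            except asyncio.TimeoutError:
                # SSE comment lines start with ':' and are ignored by clients.
                yield ": keepalive\n\n"
            except StopAsyncIteration:
                return
```

&lt;p&gt;The wrapper waits on the same pending read after each timeout, so slow tool executions produce keepalives without dropping any agent output.&lt;/p&gt;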

&lt;h3&gt;
  
  
  &lt;strong&gt;Why AgentCore?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Deploying an AI agent to production involves a lot more than writing the agent logic. You need an HTTP endpoint that can handle long-running streaming connections, authentication middleware, auto-scaling that can handle bursty traffic, container orchestration, health checks, and observability. In a traditional setup, you'd wire together API Gateway (or an ALB), ECS or EKS, ECR for container images, CloudWatch for logs, and X-Ray for tracing. That's a lot of infrastructure to build and maintain for what is, at its core, a Python function that takes a prompt and yields text.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bedrock AgentCore&lt;/strong&gt; collapses all of that into a managed runtime. You write a Python function decorated with &lt;code&gt;@app.entrypoint&lt;/code&gt;, define your auth and memory config in a YAML file, and deploy with one CLI command. AgentCore handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;HTTP ingress with SSE support&lt;/strong&gt; - long-lived streaming connections work out of the box, no ALB timeout tuning or WebSocket configuration needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IAM authentication&lt;/strong&gt; - AgentCore uses IAM auth, and the Lambda proxy signs each request with SigV4 using its execution role credentials&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-scaling to zero&lt;/strong&gt; - consumption-based pricing means you pay only for vCPU-seconds and memory-seconds while handling requests. No minimum instance count, no idle compute. My half-month bill for AgentCore was $0.06&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Container management&lt;/strong&gt; - AgentCore builds and runs your container. You don't write Dockerfiles, manage ECS task definitions, or configure load balancers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-in telemetry&lt;/strong&gt; - OTLP-compatible tracing and metrics with no setup. AgentCore provides the telemetry endpoint automatically&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The practical result: the entire agent deployment is a &lt;code&gt;make deploy-agent&lt;/code&gt; command that takes about 2 minutes. Compare that to the hours you'd spend configuring ECS + ALB + API Gateway + CloudWatch + X-Ray for the same functionality.&lt;/p&gt;

&lt;p&gt;The Lambda function in front of AgentCore is intentionally thin - it handles auth bridging (decoding the JWT and signing the request with SigV4) and streams the response back through CloudFront. All the heavy lifting - LLM inference, tool execution, memory management, and observability - stays in AgentCore where it belongs.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Services Used&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CloudFront&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;CDN, HTTPS, WAF integration, single distribution for frontend + API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lambda&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Streaming proxy - bridges JWT auth to IAM, SigV4 signing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;WAF&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Rate limiting (100 req/5min per IP), bad input protection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bedrock AgentCore&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed runtime for the Strands SDK agent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bedrock Knowledge Base&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Vector search over recipe collection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;S3 Vectors&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Zero-idle-cost vector storage for embeddings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Amazon S3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Recipe markdown storage + frontend static hosting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Amazon Cognito&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;User authentication (User Pool + Identity Pool)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Amazon Polly&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Text-to-speech for cooking mode&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bedrock LLM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Nova Pro (default) or Claude Sonnet 4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Titan Embed V2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Embedding model for Knowledge Base&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Terraform&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Infrastructure as code&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Recipe Ingestion Pipeline&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Before the agent can search recipes, they need to be converted from their original formats into something a Knowledge Base can index. The ingestion pipeline handles this in three stages.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxqieq5d8283pk2re723c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxqieq5d8283pk2re723c.png" alt="Recipe ingestion pipeline" width="715" height="3301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Stage 1: Preprocessing - Where AI Really Shines&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;preprocess_recipes.py&lt;/code&gt; script takes raw source files - PDFs, images, DOCX files - and converts them into structured markdown using &lt;strong&gt;Amazon Nova Pro&lt;/strong&gt; as a multimodal LLM.&lt;/p&gt;

&lt;p&gt;This is where AI made the biggest practical difference. Many of our family recipes were photos of handwritten cards - some decades old, stained with cooking splatters, faded ink, and the kind of cursive that only the person who wrote it could normally read. Nova Pro's multimodal capabilities handled these remarkably well. It could read through coffee stains, interpret faded handwriting, and even make reasonable guesses at abbreviated ingredient names ("chx" → "chicken", "tbls" → "tablespoon"). A few needed manual corrections, but the vast majority were parsed correctly on the first pass - work that would have taken weeks to do by hand.&lt;/p&gt;

&lt;p&gt;Each recipe gets a consistent format:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Title, description, source URL&lt;/li&gt;
&lt;li&gt;Ingredients with quantities&lt;/li&gt;
&lt;li&gt;Numbered instructions&lt;/li&gt;
&lt;li&gt;Nutrition per serving (from the source, or LLM-estimated and marked &lt;code&gt;[estimated]&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Tags for dietary categories&lt;/li&gt;
&lt;/ul&gt;
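&lt;p&gt;A minimal sketch of the request shape for one recipe photo, using the Bedrock Converse API's documented message layout (the helper name and prompt text are illustrative, not the script's actual code):&lt;/p&gt;

```python
def build_preprocess_message(image_bytes: bytes,
                             image_format: str = "jpeg") -> dict:
    """Build a Bedrock Converse API message asking a multimodal model to
    transcribe a recipe photo into the structured markdown format."""
    instructions = (
        "Transcribe this recipe into markdown with these sections: "
        "Title, Description, Ingredients (with quantities), numbered "
        "Instructions, Nutrition per serving (mark guesses [estimated]), "
        "and Tags. Expand abbreviations like 'chx' and 'tbls'."
    )
    return {
        "role": "user",
        "content": [
            # Image and text blocks in one user turn, per the Converse schema.
            {"image": {"format": image_format,
                       "source": {"bytes": image_bytes}}},
            {"text": instructions},
        ],
    }

# The real call would be roughly:
#   bedrock = boto3.client("bedrock-runtime")
#   resp = bedrock.converse(modelId="amazon.nova-pro-v1:0",
#                           messages=[build_preprocess_message(img)])
```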

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnmh1pubwunjj6vdviofo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnmh1pubwunjj6vdviofo.png" alt="Parsing to MD from image file" width="800" height="222"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Stage 2: USDA Nutrition Enrichment&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;About 57% of recipes had LLM-estimated nutrition that wasn't very accurate. The &lt;code&gt;enrich_nutrition.py&lt;/code&gt; script re-calculates nutrition for each recipe using the &lt;strong&gt;USDA FoodData Central API&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Parse each ingredient line (quantity, unit, name) using regex&lt;/li&gt;
&lt;li&gt;Look up each ingredient in the USDA database (Foundation + SR Legacy data types)&lt;/li&gt;
&lt;li&gt;Convert from per-100g USDA data to actual recipe quantities using a unit conversion table&lt;/li&gt;
&lt;li&gt;Replace &lt;code&gt;[estimated]&lt;/code&gt; markers with &lt;code&gt;[USDA]&lt;/code&gt; markers&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This runs as a batch process locally and caches API responses to avoid redundant lookups.&lt;/p&gt;
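&lt;p&gt;Steps 1 and 3 can be sketched as a couple of pure functions (the regex and the conversion table here are simplified illustrations of what the real script does):&lt;/p&gt;

```python
import re

# Illustrative gram weights per unit; the real script's table is much larger.
GRAMS_PER_UNIT = {"cup": 120.0, "tablespoon": 15.0, "teaspoon": 5.0, "g": 1.0}

# Matches lines like "1/2 cup almond flour" or "2 tablespoons butter".
LINE_RE = re.compile(r"^\s*(\d+(?:/\d+)?(?:\.\d+)?)\s+(\w+)\s+(.+)$")


def parse_ingredient(line: str):
    """Split an ingredient line into (quantity, unit, name), or None."""
    m = LINE_RE.match(line)
    if not m:
        return None
    qty_raw, unit, name = m.groups()
    if "/" in qty_raw:
        num, den = qty_raw.split("/")
        qty = float(num) / float(den)
    else:
        qty = float(qty_raw)
    return qty, unit.rstrip("s"), name.strip()


def scale_per_100g(per_100g: float, qty: float, unit: str) -> float:
    """Convert a per-100g USDA nutrient value to the recipe's actual quantity."""
    grams = qty * GRAMS_PER_UNIT[unit]
    return per_100g * grams / 100.0
```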

&lt;h3&gt;
  
  
  &lt;strong&gt;Stage 3: Upload &amp;amp; Index&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;sync_recipes.py&lt;/code&gt; script uploads the processed markdown files to S3 and triggers a Knowledge Base ingestion job. The KB uses &lt;strong&gt;Titan Embed V2&lt;/strong&gt; to generate embeddings and stores them in &lt;strong&gt;S3 Vectors&lt;/strong&gt; with semantic chunking (300 max tokens, 95% breakpoint threshold).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;make preprocess    &lt;span class="c"&gt;# PDFs/images → structured markdown via Nova Pro&lt;/span&gt;
make &lt;span class="nb"&gt;sync&lt;/span&gt;          &lt;span class="c"&gt;# Upload to S3 + trigger KB ingestion&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
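&lt;p&gt;Those chunking settings correspond to the data source's vector ingestion configuration. A sketch of that fragment, with field names following the &lt;code&gt;bedrock-agent&lt;/code&gt; &lt;code&gt;create_data_source&lt;/code&gt; API's semantic-chunking schema (the &lt;code&gt;bufferSize&lt;/code&gt; value is an assumption):&lt;/p&gt;

```python
# Vector ingestion configuration for the KB data source, as it would be
# passed to the bedrock-agent create_data_source API.
vector_ingestion_configuration = {
    "chunkingConfiguration": {
        "chunkingStrategy": "SEMANTIC",
        "semanticChunkingConfiguration": {
            "maxTokens": 300,                     # max tokens per chunk
            "breakpointPercentileThreshold": 95,  # split at 95th-percentile dissimilarity
            "bufferSize": 1,                      # sentences of surrounding context
        },
    }
}
```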



&lt;h3&gt;
  
  
  &lt;strong&gt;The Beauty of Plain Markdown&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Real-world data ingestion is messy. No matter how good the AI is, you end up with typos, misread quantities, garbled ingredient names, and the occasional recipe that just won't parse at all. One of my grandmother's recipes came through with "1/2 cup fleur" instead of "flour". Another had the instructions jumbled because the photo was taken at an angle. A few recipes from scanned PDFs had entire sections missing where the scan quality dropped off.&lt;/p&gt;

&lt;p&gt;This is where the decision to use plain markdown pays off. The processed recipes are just &lt;code&gt;.md&lt;/code&gt; files sitting in &lt;code&gt;data/processed/&lt;/code&gt;. If the AI misread an ingredient, got a quantity wrong, or produced something unparseable, you open the file in any text editor, fix it, and run &lt;code&gt;make sync&lt;/code&gt;. The Knowledge Base re-ingests in a couple of minutes and the agent immediately returns the corrected version. No database migrations, no redeployment, no rebuild. It's the simplest possible feedback loop: edit a text file, push, done.&lt;/p&gt;

&lt;p&gt;I ended up doing a manual pass through about 10% of the recipes after the initial AI processing. That's still massively better than transcribing all 600+ by hand, and each correction took a minute or two rather than the 5-10 minutes it would take to type out a full recipe from scratch.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Adding New Recipes&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The collection keeps growing. When I find a new recipe I want to keep, the process is simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Save the source file (PDF, screenshot, photo, or even just copy-paste the text into a &lt;code&gt;.md&lt;/code&gt; file) into your recipe source directory (configured via &lt;code&gt;RECIPE_SOURCE_DIR&lt;/code&gt; in &lt;code&gt;.env&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;make preprocess&lt;/code&gt; to let Nova Pro convert it to structured markdown&lt;/li&gt;
&lt;li&gt;Review and fix any issues in the output&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;make sync&lt;/code&gt; to push it to S3 and re-index the Knowledge Base&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The new recipe is searchable within a couple of minutes. For recipes that are already in a clean text format, I sometimes skip the AI preprocessing entirely and just write the markdown file directly.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;The Agent - Strands SDK on AgentCore&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The agent is built with the &lt;strong&gt;Strands Agents SDK&lt;/strong&gt; and deployed to &lt;strong&gt;Bedrock AgentCore&lt;/strong&gt;. It's a surprisingly small amount of code - the core entrypoint is about 40 lines of Python.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Why Strands SDK&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;There are several Python frameworks for building LLM-powered agents - LangChain, LlamaIndex, AutoGen, CrewAI, and others. I chose Strands for a few reasons.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Minimal abstraction.&lt;/strong&gt; Strands doesn't try to wrap everything in its own object model. Tools are plain Python functions decorated with &lt;code&gt;@tool&lt;/code&gt;. The agent is constructed with a model, a system prompt, and a list of tools. There's no chain-of-thought pipeline to configure, no "memory module" to plug in, no retriever-adapter pattern. You write Python, not framework DSL. When I needed to debug why the agent was calling &lt;code&gt;search_recipes&lt;/code&gt; twice for the same query, I could read through the Strands source in an afternoon - it's a thin layer over the Bedrock Converse API, not a 50-module abstraction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Native streaming.&lt;/strong&gt; The agent exposes an async generator (&lt;code&gt;agent.stream_async()&lt;/code&gt;) that yields events as they happen - text deltas, tool invocations, and metadata. This maps directly to the SSE pattern the frontend needs. There's no callback system to wire up and no post-processing step to convert the response into a streamable format. Each &lt;code&gt;yield&lt;/code&gt; from the agent becomes an SSE &lt;code&gt;data:&lt;/code&gt; line.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First-class AgentCore integration.&lt;/strong&gt; Strands was built alongside AgentCore, so the deployment model is native. The &lt;code&gt;@app.entrypoint&lt;/code&gt; decorator, the &lt;code&gt;BedrockAgentCoreApp&lt;/code&gt; class, and the &lt;code&gt;AgentCoreMemorySessionManager&lt;/code&gt; all come from the SDK. There's no glue code needed to bridge the framework to the runtime. This also means the SDK's streaming protocol matches what AgentCore expects - you yield dicts and they become SSE events.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Predictable behavior.&lt;/strong&gt; The agent loop is straightforward: send prompt to LLM, if the LLM requests a tool call then execute it and feed the result back, repeat until the LLM produces a final response. The &lt;code&gt;max_iterations&lt;/code&gt; parameter caps cycles to prevent runaway loops. There's no autonomous planning step, no chain selection logic, and no implicit retries that change behavior in surprising ways. For a recipe assistant with two tools, this simplicity is exactly right.&lt;/p&gt;
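&lt;p&gt;That loop can be sketched in a dozen lines. This is an illustration of the pattern, not Strands internals; &lt;code&gt;call_llm&lt;/code&gt; and the dict shapes are hypothetical stand-ins:&lt;/p&gt;

```python
def run_agent_loop(call_llm, tools: dict, prompt: str,
                   max_iterations: int = 5) -> str:
    """The basic tool-use loop: ask the model, run any requested tool,
    feed the result back, stop when the model returns plain text.

    `call_llm(messages)` is a stand-in for the model call; it returns
    either {"text": ...} or {"tool": name, "args": {...}}.
    """
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_iterations):
        reply = call_llm(messages)
        if "text" in reply:
            return reply["text"]          # final response; loop ends
        result = tools[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": str(result)})
    raise RuntimeError("max_iterations reached without a final response")
```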

&lt;p&gt;&lt;strong&gt;Open source.&lt;/strong&gt; Strands is Apache 2.0 licensed, so there's no vendor lock-in concern beyond the Bedrock model provider integration (and even that is pluggable - Strands supports other providers).&lt;/p&gt;

&lt;p&gt;The trade-off is that Strands is newer and less battle-tested than tools like LangChain. The documentation is still catching up, and some features (like long-term memory and multi-agent coordination) are less mature. For a focused single-agent application like this, those gaps didn't matter. For a complex multi-agent orchestration system, you might want to evaluate more carefully.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Entrypoint&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.entrypoint&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Frontend sends {cognito_sub}_{conversation_uuid} as session_id.
&lt;/span&gt;    &lt;span class="c1"&gt;# AgentCore validates the JWT but strips the Authorization header,
&lt;/span&gt;    &lt;span class="c1"&gt;# so we derive user_id from the session_id prefix.
&lt;/span&gt;    &lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anonymous&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rsplit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;

    &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentCoreMemoryConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MEMORY_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MAX_RETRIES&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;AgentCoreMemorySessionManager&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;REGION&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;session_manager&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;search_recipes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;calculate_nutrition&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="n"&gt;session_manager&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_manager&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;last_tool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
                &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;stream_event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream_async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stream_event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunk&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;stream_event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
                    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;stream_event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;current_tool_use&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                        &lt;span class="n"&gt;tool_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stream_event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;current_tool_use&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;last_tool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                            &lt;span class="n"&gt;last_tool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt;
                            &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_use&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;MAX_RETRIES&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;raise&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each &lt;code&gt;yield&lt;/code&gt; becomes an SSE event that the frontend consumes in real time. The agent forwards both text chunks and tool-use notifications so the UI can show what's happening.&lt;/p&gt;
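&lt;p&gt;As a sketch of how each yielded dict could become an SSE frame on the wire (the &lt;code&gt;to_sse&lt;/code&gt; helper is illustrative, not the project's actual code):&lt;/p&gt;

```python
import json

def to_sse(event: dict) -> str:
    """Serialize one agent event as a Server-Sent Events frame.

    SSE frames are "data: <payload>" lines terminated by a blank line;
    the real proxy layer may format or batch frames differently.
    """
    return f"data: {json.dumps(event)}\n\n"
```

The frontend splits the stream on blank lines and `JSON.parse`s each `data:` payload back into the `chunk` / `tool_use` shapes shown above.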

&lt;h3&gt;
  
  
  &lt;strong&gt;LLM Configuration&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The agent supports multiple models, selected via the &lt;code&gt;ACTIVE_LLM&lt;/code&gt; environment variable at deploy time:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;ID&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;nova&lt;/code&gt; (default)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;us.amazon.nova-pro-v1:0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;General use, free tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;nova-lite&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;us.amazon.nova-lite-v1:0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Faster, cheaper&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;claude&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;us.anthropic.claude-sonnet-4-20250514-v1:0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Higher quality&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;claude-haiku&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;us.anthropic.claude-3-5-haiku-20241022-v1:0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fast and cheap&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
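&lt;p&gt;The table above maps directly to a small lookup at startup. A minimal sketch, assuming the agent resolves &lt;code&gt;ACTIVE_LLM&lt;/code&gt; roughly like this (the helper name and fallback behavior are illustrative):&lt;/p&gt;

```python
import os

# Model IDs from the table above; the default falls back to "nova".
MODEL_IDS = {
    "nova": "us.amazon.nova-pro-v1:0",
    "nova-lite": "us.amazon.nova-lite-v1:0",
    "claude": "us.anthropic.claude-sonnet-4-20250514-v1:0",
    "claude-haiku": "us.anthropic.claude-3-5-haiku-20241022-v1:0",
}

def resolve_model_id() -> str:
    # Unknown or unset ACTIVE_LLM values resolve to the default model.
    return MODEL_IDS.get(os.environ.get("ACTIVE_LLM", "nova"), MODEL_IDS["nova"])
```

Because the variable is read at deploy time, switching models is a redeploy, not a code change.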

&lt;h3&gt;
  
  
  &lt;strong&gt;Memory - Multi-Turn Conversations&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;One of the things that makes this feel like a real assistant rather than a search box is that the agent remembers the conversation. Without memory, every prompt is independent - the agent has no idea what you asked 10 seconds ago. You'd have to repeat context every time: "In the almond flour waffle recipe you just showed me, what are the macros?" becomes meaningless if the agent doesn't know which recipe it just showed you.&lt;/p&gt;

&lt;p&gt;Building this yourself means managing a conversation store (typically DynamoDB or Redis), writing serialization logic to pack and unpack conversation turns, handling TTLs and cleanup, scoping sessions by user and conversation, and injecting the right context window into each LLM call. It's not difficult code, but it's infrastructure and logic that has nothing to do with your agent's actual purpose.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AgentCore Memory&lt;/strong&gt; handles all of this as a managed service. You configure a memory ID and session scope, wrap your request in an &lt;code&gt;AgentCoreMemorySessionManager&lt;/code&gt; context manager, and the SDK takes care of the rest - loading prior turns on entry, saving the new turn on exit. The result is natural multi-turn conversations:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;You:&lt;/strong&gt; "What keto bread recipes do we have?" &lt;strong&gt;Agent:&lt;/strong&gt; &lt;em&gt;(searches KB, returns 3 recipes)&lt;/em&gt; &lt;strong&gt;You:&lt;/strong&gt; "What are the macros in the second one?" &lt;strong&gt;Agent:&lt;/strong&gt; &lt;em&gt;(knows which recipe you mean, calculates nutrition)&lt;/em&gt; &lt;strong&gt;You:&lt;/strong&gt; "How does that compare to the first one?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Each conversation gets a unique session ID (&lt;code&gt;{cognito_sub}_{conversation_uuid}&lt;/code&gt;) scoped by user. Memory events expire after 30 days. Clicking "New Chat" generates a fresh UUID, giving the agent a clean slate with no prior context. The total code to integrate memory is about 6 lines - the config object and the &lt;code&gt;with&lt;/code&gt; block.&lt;/p&gt;
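&lt;p&gt;The session-ID scheme described above is simple to sketch (hypothetical helper; the real app builds this in the frontend and passes it through the proxy):&lt;/p&gt;

```python
import uuid

def new_session_id(cognito_sub: str) -> str:
    """Build a per-conversation session ID: {cognito_sub}_{conversation_uuid}.

    A fresh UUID gives AgentCore Memory a clean slate, which is exactly
    what "New Chat" (and abort recovery) relies on.
    """
    return f"{cognito_sub}_{uuid.uuid4()}"
```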




&lt;h2&gt;
  
  
  &lt;strong&gt;Custom Tools - Recipe Search &amp;amp; Nutrition&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The agent has two tools it can call during a conversation.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;search_recipes&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Queries the Bedrock Knowledge Base using vector similarity search. Key implementation details:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Top 3 results&lt;/strong&gt; to avoid overwhelming the LLM context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Score threshold of 0.3&lt;/strong&gt; to discard irrelevant matches&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunk merging&lt;/strong&gt; - recipes split across multiple embedding chunks get reassembled into a single result. Without this, you'd get fragments of the same recipe appearing as separate results.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_recipes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search the recipe knowledge base. Always use this tool first
    when users ask about recipes, ingredients, or cooking.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
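&lt;p&gt;The chunk-merging step can be sketched as follows. This is a simplified model, assuming each retrieval result carries a source URI, text, and score; the actual Bedrock &lt;code&gt;Retrieve&lt;/code&gt; response shape is more nested:&lt;/p&gt;

```python
def merge_chunks(results, min_score=0.3, top_k=3):
    """Reassemble KB chunks that belong to the same recipe.

    `results` is a list of {"uri": ..., "text": ..., "score": ...} dicts.
    Chunks sharing a source URI are concatenated; each merged recipe
    keeps its best chunk score for ranking.
    """
    merged = {}
    for r in results:
        if r["score"] < min_score:
            continue  # discard irrelevant matches below the threshold
        entry = merged.setdefault(r["uri"], {"text": [], "score": 0.0})
        entry["text"].append(r["text"])
        entry["score"] = max(entry["score"], r["score"])
    ranked = sorted(merged.items(), key=lambda kv: kv[1]["score"], reverse=True)
    return [
        {"uri": uri, "text": "\n".join(e["text"]), "score": e["score"]}
        for uri, e in ranked[:top_k]
    ]
```

Without this step, two chunks of the same waffle recipe would show up as two separate "results" in the LLM context.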



&lt;h3&gt;
  
  
  &lt;strong&gt;calculate_nutrition - Real-Time USDA Lookup&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This is one of the most useful features for anyone tracking macros. You can ask the agent to calculate nutrition for &lt;em&gt;anything&lt;/em&gt; - a recipe from the collection, a custom ingredient list, or even "what are the macros in 2 cups of almond flour and 3 eggs?" The agent calls the &lt;strong&gt;USDA FoodData Central API&lt;/strong&gt; in real time and returns a detailed breakdown.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate_nutrition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ingredients&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Calculate nutrition for a list of ingredients using USDA data.
    Input: newline-separated ingredients with quantities.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The tool parses quantities and units, looks up each ingredient in the USDA database (Foundation + SR Legacy data types), converts from per-100g USDA values to actual recipe quantities, and returns a formatted markdown table with per-ingredient breakdowns and totals - calories, net carbs, total carbs, fat, protein, and fiber. The UI shows "Calculating nutrition..." while this runs, and the results come back with a &lt;code&gt;[USDA]&lt;/code&gt; marker so you know the data source.&lt;/p&gt;
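&lt;p&gt;The core conversion is straightforward: USDA reports nutrients per 100 g, so each ingredient's values are scaled by its actual weight. A minimal sketch of that step (the real tool also parses quantities/units and sums per-recipe totals):&lt;/p&gt;

```python
def scale_nutrients(per_100g: dict, grams: float) -> dict:
    """Convert USDA per-100g nutrient values to an actual ingredient weight."""
    factor = grams / 100.0
    return {name: round(value * factor, 1) for name, value in per_100g.items()}
```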

&lt;p&gt;This was especially valuable for our older family recipes that never had nutrition info. I can now ask "what are the macros in grandma's homemade brown bread?" and get real USDA-backed numbers instead of guesses.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;The Frontend - React 19 with SSE Streaming&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4e1df2wqeuc2l85zp3f1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4e1df2wqeuc2l85zp3f1.png" alt="Recipe lookup" width="800" height="823"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5xvp1wnl0bpw7ucnkok4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5xvp1wnl0bpw7ucnkok4.png" alt="USDA Lookup" width="800" height="494"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The frontend is a React 19 single-page application built with Vite - no TypeScript, plain JSX throughout. It POSTs to the CloudFront &lt;code&gt;/chat&lt;/code&gt; endpoint, which proxies through the Lambda Function URL to AgentCore and streams SSE back along the same path. The request includes the JWT in the &lt;code&gt;Authorization&lt;/code&gt; header and an &lt;code&gt;x-amz-content-sha256&lt;/code&gt; header (a SHA-256 hash of the request body, required by CloudFront OAC for POST requests). The frontend consumes the SSE events with &lt;code&gt;fetch&lt;/code&gt; and &lt;code&gt;ReadableStream&lt;/code&gt;, which gives more control than the &lt;code&gt;EventSource&lt;/code&gt; API (custom headers for auth, POST method, abort support). The &lt;code&gt;VITE_API_URL&lt;/code&gt; environment variable points to the CloudFront &lt;code&gt;/chat&lt;/code&gt; path.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;SSE Streaming&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The chat hook processes two types of SSE payloads:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tool_use&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Show "Searching recipes..." or "Calculating nutrition..."&lt;/span&gt;
    &lt;span class="nf"&gt;onMessage&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;tool_use&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tool_use&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Append text to the current message, render markdown&lt;/span&gt;
    &lt;span class="nf"&gt;onMessage&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;chunk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The frontend strips Claude's extended-thinking blocks from the stream before displaying text to the user.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Abort Handling - Harder Than It Sounds&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Streaming responses from an LLM can take a while, especially when the agent is searching the KB and then generating a long recipe with full instructions and nutrition. Users need to be able to cancel mid-stream. The send button swaps to a red stop button during streaming, and clicking it triggers &lt;code&gt;AbortController.abort()&lt;/code&gt; to close the HTTP connection.&lt;/p&gt;

&lt;p&gt;The tricky part is what happens to memory. When you abort, AgentCore may have already saved a partial conversation turn - half a recipe, a dangling tool call, or garbled markdown. If the next request loads that corrupted context, the agent gets confused and produces garbage.&lt;/p&gt;

&lt;p&gt;The fix: on abort, the client generates a new conversation UUID and stores it in &lt;code&gt;sessionStorage&lt;/code&gt;. The next request uses this fresh session ID, so AgentCore Memory loads a clean slate instead of the corrupted partial turn. The trade-off is that you lose conversation history from before the abort, but that's better than garbled responses.&lt;/p&gt;

&lt;p&gt;This same session-reset mechanism powers the "New Chat" button - it's the same operation, just intentional rather than error recovery.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;S3 Vectors - The $0.00 Vector Store&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This was the most surprising part of the project. Traditional RAG setups use &lt;strong&gt;OpenSearch Serverless&lt;/strong&gt; as the vector store, which costs a minimum of &lt;strong&gt;~$350/month&lt;/strong&gt; for 2 OCUs (OpenSearch Compute Units) - even with zero traffic. For a personal recipe project, that's a non-starter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;S3 Vectors&lt;/strong&gt; provides the same semantic search capability with truly pay-per-query pricing:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Vector Store&lt;/th&gt;
&lt;th&gt;Monthly Cost (dev)&lt;/th&gt;
&lt;th&gt;Monthly Cost (prod, ~10K queries)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OpenSearch Serverless&lt;/td&gt;
&lt;td&gt;~$350&lt;/td&gt;
&lt;td&gt;~$350+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3 Vectors&lt;/td&gt;
&lt;td&gt;~$0.00&lt;/td&gt;
&lt;td&gt;~$0.40&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;S3 Vectors pricing: &lt;strong&gt;$0.00004/query&lt;/strong&gt; + $0.0006/GB-month storage. At my development usage of roughly 200 Knowledge Base queries per half month, the cost rounds to zero.&lt;/p&gt;
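&lt;p&gt;The arithmetic behind the table is worth making explicit (rates as quoted above; the helper is just a worked example):&lt;/p&gt;

```python
# S3 Vectors rates quoted above.
QUERY_RATE = 0.00004   # $ per query
STORAGE_RATE = 0.0006  # $ per GB-month

def monthly_cost(queries: int, storage_gb: float) -> float:
    """Monthly S3 Vectors spend: per-query charges plus storage."""
    return queries * QUERY_RATE + storage_gb * STORAGE_RATE
```

At ~10K queries/month the query charge is 10,000 × $0.00004 = $0.40; at a few hundred dev queries it rounds to zero.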

&lt;p&gt;The vector index uses 1024 dimensions (matching Titan Embed V2), cosine distance similarity, and float32 precision. It's configured via Terraform:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3vectors_index"&lt;/span&gt; &lt;span class="s2"&gt;"recipes"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;vector_bucket_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3vectors_vector_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;recipes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
  &lt;span class="nx"&gt;index_name&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.project_name}-${var.environment}-recipes"&lt;/span&gt;

  &lt;span class="nx"&gt;metadata&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;dimension&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;
    &lt;span class="nx"&gt;distance_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"cosine"&lt;/span&gt;
    &lt;span class="nx"&gt;data_type&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"float32"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  &lt;strong&gt;Authentication - Cognito JWT&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Authentication uses &lt;strong&gt;Amazon Cognito&lt;/strong&gt; with a User Pool for signup/signin and an Identity Pool for temporary AWS credentials (used by the browser for Polly TTS). The auth flow bridges JWT-based frontend authentication to IAM-based AgentCore authentication through the Lambda proxy layer.&lt;/p&gt;

&lt;p&gt;The flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User signs in via the React auth screen (email + password, SRP auth)&lt;/li&gt;
&lt;li&gt;Cognito returns access + ID + refresh tokens&lt;/li&gt;
&lt;li&gt;Frontend sends POST to CloudFront &lt;code&gt;/chat&lt;/code&gt; with &lt;code&gt;Bearer&lt;/code&gt; JWT in the &lt;code&gt;Authorization&lt;/code&gt; header and an &lt;code&gt;x-amz-content-sha256&lt;/code&gt; header (SHA-256 hash of the request body)&lt;/li&gt;
&lt;li&gt;A CloudFront Function copies &lt;code&gt;Authorization&lt;/code&gt; to &lt;code&gt;X-Forwarded-Authorization&lt;/code&gt; before OAC replaces the &lt;code&gt;Authorization&lt;/code&gt; header with SigV4 credentials for the Lambda Function URL origin&lt;/li&gt;
&lt;li&gt;Lambda decodes the JWT from &lt;code&gt;X-Forwarded-Authorization&lt;/code&gt; to extract the user identity (&lt;code&gt;sub&lt;/code&gt; claim) for session scoping&lt;/li&gt;
&lt;li&gt;Lambda signs the request to AgentCore with SigV4 (IAM auth) using its execution role credentials&lt;/li&gt;
&lt;li&gt;For text-to-speech, the ID token is exchanged via the Identity Pool for temporary IAM credentials scoped to &lt;code&gt;polly:SynthesizeSpeech&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;
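&lt;p&gt;Step 3's &lt;code&gt;x-amz-content-sha256&lt;/code&gt; value is just the hex SHA-256 digest of the raw request body. Sketched in Python for clarity (the app computes this in the browser with the Web Crypto API):&lt;/p&gt;

```python
import hashlib

def content_sha256(body: bytes) -> str:
    """Hex SHA-256 of the request body, sent as x-amz-content-sha256."""
    return hashlib.sha256(body).hexdigest()
```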

&lt;p&gt;No Amplify - just &lt;code&gt;amazon-cognito-identity-js&lt;/code&gt; for the Cognito SDK, keeping the bundle small.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Real-Time Tool Visibility&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;When the agent calls a tool, there's a noticeable delay while the Knowledge Base search or USDA API call completes. Instead of showing generic typing dots, the UI shows what the agent is actually doing.&lt;/p&gt;

&lt;p&gt;The Strands SDK emits &lt;code&gt;current_tool_use&lt;/code&gt; events during streaming. The agent forwards these as &lt;code&gt;{"tool_use": "search_recipes"}&lt;/code&gt; SSE payloads. The frontend maps tool names to human-readable labels:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool Name&lt;/th&gt;
&lt;th&gt;UI Label&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;search_recipes&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;"Searching recipes..."&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;calculate_nutrition&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;"Calculating nutrition..."&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The label appears next to the animated typing dots and disappears as soon as text starts streaming. It's a small touch but makes a real difference in perceived responsiveness.&lt;/p&gt;
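&lt;p&gt;The mapping itself is a two-entry lookup. Shown here in Python for consistency with the other snippets (the real app does this in the React frontend; the fallback label is an assumption):&lt;/p&gt;

```python
# Tool-name -> UI label mapping from the table above.
TOOL_LABELS = {
    "search_recipes": "Searching recipes...",
    "calculate_nutrition": "Calculating nutrition...",
}

def label_for(tool_name: str) -> str:
    # Unknown tools get a generic label (hypothetical fallback).
    return TOOL_LABELS.get(tool_name, "Working...")
```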




&lt;h2&gt;
  
  
  &lt;strong&gt;Text-to-Speech Cooking Mode&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;When you're actually cooking, you don't want to keep looking at your phone with flour-covered hands. &lt;strong&gt;Cooking mode&lt;/strong&gt; uses Amazon Polly to automatically read each bot response aloud as it completes.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Toggle on/off with a speaker button in the header&lt;/li&gt;
&lt;li&gt;Per-message play/pause/restart controls on every bot response&lt;/li&gt;
&lt;li&gt;Uses Polly's generative voice via temporary Cognito Identity Pool credentials&lt;/li&gt;
&lt;li&gt;Streams audio directly from the browser - no Lambda or backend involvement&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Observability - Traces, Metrics, and Logs&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd7ixxhcct3snzqjpo6qx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd7ixxhcct3snzqjpo6qx.png" alt="Observability trace flow" width="800" height="1185"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Setting &lt;code&gt;observability.enabled: true&lt;/code&gt; in the AgentCore YAML config gives you structured logs and service-level metrics in CloudWatch without any code changes. But the real payoff comes from distributed tracing - seeing exactly what happens inside the agent during every request.&lt;/p&gt;

&lt;p&gt;Beyond AgentCore, the CloudFront and Lambda layers also contribute to the observability story. CloudFront access logs capture every request to both the frontend and the &lt;code&gt;/chat&lt;/code&gt; API, including client IP, response time, and cache status. Lambda CloudWatch logs provide request-level tracing through the proxy layer, including JWT validation results, AgentCore response status, and streaming duration. Together these give you end-to-end visibility from the browser through CloudFront, Lambda, and into AgentCore.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What AgentCore Provides by Default&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;With observability enabled, AgentCore automatically emits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Service-level metrics&lt;/strong&gt; - invocation count, error rate, latency (p50/p95/p99), token usage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured JSON logs&lt;/strong&gt; - every log statement includes &lt;code&gt;traceId&lt;/code&gt;, &lt;code&gt;spanId&lt;/code&gt;, &lt;code&gt;requestId&lt;/code&gt;, and &lt;code&gt;sessionId&lt;/code&gt; for correlation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory spans&lt;/strong&gt; - trace segments for AgentCore Memory read/write operations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These show up in the CloudWatch GenAI Observability dashboard and X-Ray without any code changes. But there's a catch - by default, these are the &lt;em&gt;only&lt;/em&gt; spans you get. The Bedrock LLM invocations, Knowledge Base queries, tool executions, and external HTTP calls are invisible in the trace.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Unlocking Full Distributed Tracing&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The Strands Agents SDK has comprehensive built-in OpenTelemetry instrumentation that creates a rich hierarchy of trace spans:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Span Type&lt;/th&gt;
&lt;th&gt;What It Captures&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Agent&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Top-level span for the entire invocation, aggregate token usage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cycle&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Each event loop iteration (reasoning cycle)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LLM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Individual Bedrock model invocations with prompts, completions, and token counts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tool&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tool execution with name, inputs, outputs, and timing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;But these spans only get exported if a tracer provider is configured. The required piece is the &lt;strong&gt;AWS Distro for OpenTelemetry&lt;/strong&gt; (&lt;code&gt;aws-opentelemetry-distro&lt;/code&gt;). Adding this single package to &lt;code&gt;requirements.txt&lt;/code&gt; does three things at once:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Configures a global tracer provider&lt;/strong&gt; so the Strands SDK actually exports its spans&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-instruments botocore/boto3&lt;/strong&gt; capturing Bedrock &lt;code&gt;InvokeModel&lt;/code&gt; and KB &lt;code&gt;Retrieve&lt;/code&gt; calls as child spans&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-instruments the&lt;/strong&gt; &lt;code&gt;requests&lt;/code&gt; &lt;strong&gt;library&lt;/strong&gt; capturing external HTTP calls (like the USDA API) as child spans&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;AgentCore Runtime automatically runs the &lt;code&gt;opentelemetry-instrument&lt;/code&gt; wrapper when this package is present - no code changes needed beyond adding the dependency.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# agent/requirements.txt
strands-agents[otel]&amp;gt;=0.1.0
aws-opentelemetry-distro&amp;gt;=0.10.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;What the Traces Look Like&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;After adding the ADOT distro, a single recipe search request produces a trace like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AgentCore Runtime Request
  └── Agent Span (Strands)
      ├── Cycle 1
      │   ├── LLM Invoke (bedrock converse)
      │   ├── Tool: search_recipes
      │   │   └── bedrock-agent-runtime.Retrieve
      │   └── LLM Invoke (bedrock converse)
      └── Memory Write (AgentCore)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A more complex request involving nutrition calculation shows the external API call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AgentCore Runtime Request
  └── Agent Span (Strands)
      ├── Cycle 1
      │   ├── LLM Invoke (bedrock converse)
      │   ├── Tool: search_recipes
      │   │   └── bedrock-agent-runtime.Retrieve
      │   └── LLM Invoke (bedrock converse)
      ├── Cycle 2
      │   ├── LLM Invoke (bedrock converse)
      │   ├── Tool: calculate_nutrition
      │   │   └── HTTP GET api.nal.usda.gov
      │   └── LLM Invoke (bedrock converse)
      └── Memory Write (AgentCore)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each span includes timing data, so you can immediately see where latency lives. In practice, the KB retrieval and LLM inference dominate - the USDA API calls are fast by comparison.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Custom Trace Attributes&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The Strands SDK supports custom trace attributes on the Agent constructor, which get attached to every span in the request. This is useful for filtering traces by user or session in CloudWatch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;search_recipes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;calculate_nutrition&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;trace_attributes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session.id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user.id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  &lt;strong&gt;Cost Breakdown&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This is the part that surprised me the most. Here are the actual AWS costs for the first half of February 2026:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bedrock (LLM)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.00&lt;/td&gt;
&lt;td&gt;Free tier - Nova Pro, Titan Embed V2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bedrock AgentCore&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.06&lt;/td&gt;
&lt;td&gt;~6 vCPU-hours, consumption-based&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;S3 (storage + vectors)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.02&lt;/td&gt;
&lt;td&gt;Recipes, frontend, embeddings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CloudFront&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.00&lt;/td&gt;
&lt;td&gt;Free tier (1 TB transfer/month)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lambda&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.00&lt;/td&gt;
&lt;td&gt;Free tier (1M requests/month)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;WAF&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.00&lt;/td&gt;
&lt;td&gt;Free tier covers dev usage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cognito&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.00&lt;/td&gt;
&lt;td&gt;Free tier (&amp;lt; 50K MAU)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CloudWatch&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.00&lt;/td&gt;
&lt;td&gt;Free tier covers logs + metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Project total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$0.08&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Half month&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The entire project costs &lt;strong&gt;$0.08 for a half month&lt;/strong&gt; of development usage. The Terraform configuration provisions zero compute resources - no EC2 instances, no ECS tasks, no ECR repositories. Every service is either covered by the free tier or has pay-per-use pricing with no idle cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Why So Cheap?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Four architectural decisions keep costs near zero:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;S3 Vectors over OpenSearch Serverless&lt;/strong&gt; - avoids the ~$350/month minimum that OpenSearch Serverless charges even when idle&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AgentCore over self-managed compute&lt;/strong&gt; - no idle costs, no infrastructure to manage. An equivalent ECS Fargate setup would cost $30-50/month minimum even with zero traffic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CloudFront + Lambda + WAF&lt;/strong&gt; - all three are covered by free tier at development usage levels. CloudFront gives you CDN, HTTPS, and WAF integration without adding meaningful cost&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bedrock free tier&lt;/strong&gt; - Nova Pro's free tier (50K input token requests, 300K output tokens/month) covers development usage comfortably&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Projected Costs at Scale&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Monthly Est.&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Light (1 user, ~50 queries/day, Nova Pro)&lt;/td&gt;
&lt;td&gt;~$6.50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Medium (10 users, ~500 queries/day, Nova Pro)&lt;/td&gt;
&lt;td&gt;~$44&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Heavy (100 users, ~5K queries/day, Claude Sonnet 4)&lt;/td&gt;
&lt;td&gt;~$995&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Model choice is the dominant cost factor. Switching from Claude Sonnet 4 ($3.00/M input tokens) to Nova Pro ($0.80/M) at the same volume cuts the input-token cost by roughly 73%. The CloudFront + WAF layer adds roughly $5-10/month at medium to heavy scale (WAF web ACL base fee + per-request charges), which is negligible compared to LLM costs.&lt;/p&gt;
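&lt;p&gt;To make that concrete, here is a back-of-the-envelope sketch of the input-token math. The per-million prices are the rates quoted above; the monthly token volume is an assumed figure for illustration, not a measurement from this project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Back-of-the-envelope LLM cost comparison (input tokens only).
# Prices are USD per million input tokens, as quoted in the article;
# the monthly volume is an assumption for illustration.
PRICE_PER_M_INPUT = {"claude-sonnet-4": 3.00, "nova-pro": 0.80}

def monthly_input_cost(model, input_tokens_per_month):
    """Estimated monthly input-token cost in USD."""
    return PRICE_PER_M_INPUT[model] * input_tokens_per_month / 1_000_000

volume = 200_000_000  # assumption: 200M input tokens/month at heavy usage
claude = monthly_input_cost("claude-sonnet-4", volume)  # ~600.0
nova = monthly_input_cost("nova-pro", volume)           # ~160.0
savings = 1 - nova / claude                             # ~0.73
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Same volume, same workload - only the per-token rate changes, and the LLM line item drops by almost three quarters.&lt;/p&gt;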




&lt;h2&gt;
  
  
  &lt;strong&gt;Key Learnings&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;S3 Vectors changes the RAG cost equation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Before S3 Vectors, the minimum cost of entry for a RAG application was ~$350/month for OpenSearch Serverless. That made it impractical for personal projects, prototypes, or low-traffic production workloads. S3 Vectors eliminates the idle cost entirely. This is a big deal for anyone building RAG applications that don't need millisecond latency.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;AgentCore is the simplest way to deploy an agent&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;No Dockerfiles to maintain, no ECS task definitions, no load balancers. You write a Python function decorated with &lt;code&gt;@app.entrypoint&lt;/code&gt;, run &lt;code&gt;make deploy-agent&lt;/code&gt;, and you have a managed HTTP endpoint with JWT auth, auto-scaling, and telemetry. The trade-off is less control over the runtime environment, but for most agent workloads that's a fine trade.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Chunk merging is essential for Knowledge Base search&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Bedrock Knowledge Base splits documents into chunks for embedding. When a recipe spans multiple chunks, the default behavior returns each chunk as a separate search result - so you might get the ingredients from one chunk and the instructions from another, but never the complete recipe. Grouping chunks by source document and merging them back together was critical for usable results.&lt;/p&gt;
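&lt;p&gt;The merging idea fits in a few lines. This is a simplified sketch with hypothetical result shapes, not the actual Bedrock Knowledge Base response format: group retrieved chunks by their source document, restore document order, and stitch each group back together:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of chunk merging. The dict keys ("source_uri", "chunk_index",
# "text") are illustrative stand-ins for the chunk metadata.
from collections import defaultdict

def merge_chunks(results):
    """Group search hits by source document and merge them in order."""
    by_doc = defaultdict(list)
    for r in results:
        by_doc[r["source_uri"]].append(r)
    merged = []
    for uri, chunks in by_doc.items():
        chunks.sort(key=lambda c: c["chunk_index"])  # restore document order
        merged.append({"source_uri": uri,
                       "text": "\n\n".join(c["text"] for c in chunks)})
    return merged

hits = [
    {"source_uri": "s3://recipes/pie.md", "chunk_index": 1, "text": "Instructions..."},
    {"source_uri": "s3://recipes/pie.md", "chunk_index": 0, "text": "Ingredients..."},
]
merged = merge_chunks(hits)  # one result containing the whole recipe
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the real source metadata as the grouping key, a query that matches three chunks of one recipe comes back as a single, complete recipe instead of three fragments.&lt;/p&gt;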

&lt;h3&gt;
  
  
  &lt;strong&gt;Stream everything&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The difference between waiting 8 seconds for a complete response and seeing the first words appear after a few seconds is enormous for user experience. The CloudFront to Lambda to AgentCore streaming chain delivers text chunks to the browser as they're generated, with incremental markdown rendering that makes the agent feel responsive even when it's doing multi-step tool calls behind the scenes. Adding tool-use indicators ("Searching recipes...") takes this further - the user knows the agent is working, not stalled.&lt;/p&gt;
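&lt;p&gt;As a minimal sketch of the pattern (the event shapes here are hypothetical stand-ins for the AgentCore response stream, not its actual API), the relay is just a generator that yields each piece as it arrives instead of buffering the full response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of streaming relay. agent_events stands in for the upstream
# response stream; event dicts are assumed shapes for illustration.
def stream_response(agent_events):
    """Yield display-ready strings as events arrive."""
    for event in agent_events:
        if event["type"] == "tool_use":
            # Surface progress so the user knows the agent is working
            yield f"[{event['name']}...]"
        elif event["type"] == "text":
            yield event["text"]  # incremental markdown, rendered as it lands

events = [
    {"type": "tool_use", "name": "Searching recipes"},
    {"type": "text", "text": "Here are three pie recipes"},
]
chunks = list(stream_response(events))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The generator never holds more than one event, so the first text chunk reaches the browser as soon as the model emits it.&lt;/p&gt;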

&lt;h3&gt;
  
  
  &lt;strong&gt;Multimodal AI is a game-changer for digitizing family recipes&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;I was genuinely surprised at how well Nova Pro handled our oldest, most damaged recipes. Handwritten cards with coffee stains, faded ink, and cramped cursive that I could barely read myself - the model parsed them into structured ingredients and instructions with maybe a 90% accuracy rate. The remaining 10% needed minor corrections (a misread quantity here, a garbled ingredient name there), but the alternative was typing 600+ recipes by hand. This alone saved weeks of work and made the entire project feasible.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Understanding Strands SDK event loop and guardrails&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The Strands Agents SDK event loop runs until the model emits &lt;code&gt;end_turn&lt;/code&gt; (done generating) or &lt;code&gt;max_tokens&lt;/code&gt; (response truncated). There's no built-in iteration cap - the model decides when to stop. For a recipe assistant with two tools this works fine, but for safety I added a &lt;code&gt;BeforeToolCallEvent&lt;/code&gt; hook that cancels tool execution after 10 calls per request, preventing runaway loops:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MaxToolCallsHook&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_calls&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_calls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;max_calls&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;register_hooks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BeforeToolCallEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_on_before_tool_call&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_on_before_tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_count&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cancel_tool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Maximum tool calls reached.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The hooks system is one of the cleanest parts of the Strands SDK - you implement &lt;code&gt;HookProvider&lt;/code&gt; with a &lt;code&gt;register_hooks&lt;/code&gt; method, and the registry gives you typed events for every stage of the agent lifecycle: &lt;code&gt;BeforeModelCallEvent&lt;/code&gt;, &lt;code&gt;AfterModelCallEvent&lt;/code&gt;, &lt;code&gt;BeforeToolCallEvent&lt;/code&gt;, &lt;code&gt;AfterToolCallEvent&lt;/code&gt;, and more. The &lt;code&gt;cancel_tool&lt;/code&gt; field on &lt;code&gt;BeforeToolCallEvent&lt;/code&gt; is particularly useful - it short-circuits the tool call and feeds the cancel message back to the model as a tool result, prompting it to wrap up with what it has.&lt;/p&gt;

&lt;p&gt;On the token side, &lt;code&gt;max_tokens=8192&lt;/code&gt; on the &lt;code&gt;BedrockModel&lt;/code&gt; prevents runaway generation. If the model hits this limit, the SDK raises a &lt;code&gt;MaxTokensReachedException&lt;/code&gt; rather than silently truncating. Understanding these SDK internals - hooks, token limits, and stop reasons - is important for building agents that behave predictably.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;What's Next&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The long-term vision for this project includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Meal planning&lt;/strong&gt; - generate weekly meal plans from the recipe collection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shopping lists&lt;/strong&gt; - aggregate ingredients, normalize quantities, deduplicate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Store availability&lt;/strong&gt; - check prices at Loblaws, Walmart, Farm Boy, Metro, Amazon.ca, Costco.ca&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom domain with ACM certificate&lt;/strong&gt; - use a branded domain name with an AWS Certificate Manager TLS certificate instead of the default CloudFront distribution domain&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voice input&lt;/strong&gt; - Web Speech API for hands-free queries while cooking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-platform&lt;/strong&gt; - PWA improvements for iPad/Android, and eventually an Alexa skill&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI/CD&lt;/strong&gt; - GitHub Actions for automated testing and deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feaic6sqt1ml7x17l48ca.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feaic6sqt1ml7x17l48ca.png" alt="Future Plans" width="800" height="1589"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Try It Yourself&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;You can clone the GitHub repo and deploy this in your own AWS account. The infrastructure is fully defined in Terraform.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Prerequisites&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;AWS account&lt;/li&gt;
&lt;li&gt;AWS CLI, AgentCore CLI&lt;/li&gt;
&lt;li&gt;Terraform &amp;gt;= 1.14&lt;/li&gt;
&lt;li&gt;Python 3.13+ with &lt;code&gt;uv&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Node.js 20+ with npm&lt;/li&gt;
&lt;li&gt;A USDA FoodData Central API key (free - &lt;a href="https://fdc.nal.usda.gov/api-key-signup" rel="noopener noreferrer"&gt;sign up here&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Quick Start&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Bootstrap requires two passes because Terraform needs the AgentCore runtime ARN, but the agent can't be deployed until the infrastructure (KB, Cognito, S3) exists. See the &lt;a href="https://github.com/RDarrylR/serverless-family-recipe-assistant/blob/main/README.md" rel="noopener noreferrer"&gt;README&lt;/a&gt; for full configuration details.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/RDarrylR/serverless-family-recipe-assistant.git
&lt;span class="nb"&gt;cd &lt;/span&gt;serverless-family-recipe-assistant

&lt;span class="c"&gt;# 1. Install dependencies&lt;/span&gt;
make init               &lt;span class="c"&gt;# Install Python deps&lt;/span&gt;
make install-frontend   &lt;span class="c"&gt;# Install frontend deps&lt;/span&gt;

&lt;span class="c"&gt;# 2. First Terraform pass (infrastructure without agent)&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;infrastructure
&lt;span class="nb"&gt;cp &lt;/span&gt;terraform.tfvars.example terraform.tfvars
&lt;span class="c"&gt;# Edit terraform.tfvars: set alert_email and aws_profile&lt;/span&gt;
terraform init &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; terraform apply
&lt;span class="nb"&gt;cd&lt;/span&gt; ..

&lt;span class="c"&gt;# 3. Configure and deploy agent&lt;/span&gt;
aws xray update-trace-segment-destination &lt;span class="nt"&gt;--destination&lt;/span&gt; CloudWatchLogs  &lt;span class="c"&gt;# one-time&lt;/span&gt;
make agent-config       &lt;span class="c"&gt;# Generate AgentCore YAML from Terraform outputs&lt;/span&gt;
make setup-env          &lt;span class="c"&gt;# Generate .env and frontend/.env from Terraform outputs&lt;/span&gt;
&lt;span class="c"&gt;# Edit .env: set USDA_API_KEY, RECIPE_SOURCE_DIR&lt;/span&gt;
make deploy-agent       &lt;span class="c"&gt;# Deploy agent to AgentCore&lt;/span&gt;
&lt;span class="c"&gt;# Edit .env: set AGENT_RUNTIME_ID (from "make agent-status") and&lt;/span&gt;
&lt;span class="c"&gt;#   MEMORY_ID (from "grep memory_id agent/.bedrock_agentcore.yaml")&lt;/span&gt;

&lt;span class="c"&gt;# 4. Second Terraform pass (wire up Lambda to AgentCore)&lt;/span&gt;
&lt;span class="c"&gt;# Edit terraform.tfvars: set agent_runtime_arn, add CloudFront URL to cognito_callback_urls&lt;/span&gt;
make apply              &lt;span class="c"&gt;# Re-apply Terraform (wires Lambda to AgentCore)&lt;/span&gt;
make setup-env          &lt;span class="c"&gt;# Regenerate .env files (preserves manual values)&lt;/span&gt;

&lt;span class="c"&gt;# 5. Process recipes and deploy&lt;/span&gt;
make preprocess         &lt;span class="c"&gt;# Convert raw recipes to markdown&lt;/span&gt;
make &lt;span class="nb"&gt;sync&lt;/span&gt;               &lt;span class="c"&gt;# Upload to S3 + index in Knowledge Base&lt;/span&gt;
make deploy-frontend    &lt;span class="c"&gt;# Build React app + deploy to S3 + invalidate CloudFront&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;CLEANUP (IMPORTANT!!)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you deploy this yourself, please understand that some of the included resources will cost you a small amount of real money for as long as they exist. Please don't forget about them.&lt;/p&gt;

&lt;p&gt;Please &lt;strong&gt;MAKE SURE TO DELETE&lt;/strong&gt; the infrastructure if you are no longer using it. Running &lt;code&gt;terraform destroy&lt;/code&gt; from the &lt;code&gt;infrastructure/&lt;/code&gt; directory will handle this, or you can delete resources manually in the AWS console.&lt;/p&gt;




&lt;p&gt;For more articles from me please visit my blog at &lt;a href="https://darryl-ruggles.cloud/" rel="noopener noreferrer"&gt;&lt;strong&gt;Darryl's World of Cloud&lt;/strong&gt;&lt;/a&gt; or find me on &lt;a href="https://bsky.app/profile/darryl-ruggles.cloud" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt;, &lt;a href="https://x.com/RDarrylR" rel="noopener noreferrer"&gt;&lt;strong&gt;X&lt;/strong&gt;&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/darryl-ruggles/" rel="noopener noreferrer"&gt;&lt;strong&gt;LinkedIn&lt;/strong&gt;&lt;/a&gt;, &lt;a href="https://medium.com/@RDarrylR" rel="noopener noreferrer"&gt;&lt;strong&gt;Medium&lt;/strong&gt;&lt;/a&gt;, &lt;a href="https://dev.to/rdarrylr"&gt;&lt;strong&gt;Dev.to&lt;/strong&gt;&lt;/a&gt;, or the &lt;a href="https://builder.aws.com/community/@darrylr" rel="noopener noreferrer"&gt;&lt;strong&gt;AWS Community&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For tons of great serverless content and discussions please join the Believe In Serverless community we have put together at this link: &lt;a href="https://www.believeinserverless.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Believe In Serverless Community&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>serverless</category>
      <category>agentcore</category>
      <category>strands</category>
      <category>agents</category>
    </item>
    <item>
      <title>AWS Lambda Durable Functions - Build a Loan Approval Workflow with Checkpoints, Callbacks, and a React Frontend</title>
      <dc:creator>Darryl Ruggles</dc:creator>
      <pubDate>Sun, 01 Mar 2026 16:10:53 +0000</pubDate>
      <link>https://forem.com/aws-builders/aws-lambda-durable-functions-build-a-loan-approval-workflow-with-checkpoints-callbacks-and-a-5b5n</link>
      <guid>https://forem.com/aws-builders/aws-lambda-durable-functions-build-a-loan-approval-workflow-with-checkpoints-callbacks-and-a-5b5n</guid>
      <description>&lt;p&gt;I have been using AWS Step Functions for workflow orchestration for a long time now. I used it in my &lt;a href="https://darryl-ruggles.cloud/serverless-data-processor-using-aws-lambda-step-functions-and-fargate-on-ecs-with-rust" rel="noopener noreferrer"&gt;Serverless Data Processor&lt;/a&gt; project to coordinate batch data processing and in the &lt;a href="https://darryl-ruggles.cloud/serverless-pizza-ordering" rel="noopener noreferrer"&gt;Serverless Cloud Pizzeria&lt;/a&gt; to manage pizza order progress with task tokens. Step Functions is a solid service that does a lot of things well, especially when you want visual observability and native AWS service integrations.&lt;/p&gt;

&lt;p&gt;When AWS announced Lambda Durable Functions at re:Invent 2025, I was immediately curious about how it compared and whether it could handle the kind of workflows I typically build. After spending some time reading the docs, community articles, and building a demo project, I wanted to share what I learned and walk through a working example.&lt;/p&gt;

&lt;p&gt;The full source for this project can be found here on GitHub: &lt;a href="https://github.com/RDarrylR/lambda-durable-demo" rel="noopener noreferrer"&gt;https://github.com/RDarrylR/lambda-durable-demo&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem with Long-Running Lambda Workflows
&lt;/h2&gt;

&lt;p&gt;If you have spent any time building serverless applications on AWS you have likely run into the Lambda 15-minute timeout limit. For many use cases this is fine. But as soon as you need to do things like wait for a human to approve something, call an external service that takes a while, or string together a pipeline with multiple steps that could take varying amounts of time you start running into challenges.&lt;/p&gt;

&lt;p&gt;Before durable functions the typical options were:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Build your own state machine&lt;/strong&gt; using DynamoDB and SQS to track progress and resume work across multiple Lambda invocations. This works but involves a lot of glue code that has nothing to do with your actual business logic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use Step Functions&lt;/strong&gt; to define your workflow as a state machine in Amazon States Language (ASL). Step Functions handles the orchestration, retries, and state management for you. This is a great option and one I have used many times. The tradeoff is that you are writing ASL state machine definitions rather than code, and local testing can be more involved. I really do not like fighting with ASL.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Lambda Durable Functions give you a third option: keep writing normal Python or Node.js and let Lambda handle the checkpointing and replay mechanics for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Are Lambda Durable Functions?
&lt;/h2&gt;

&lt;p&gt;The core idea behind durable functions is based on four concepts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Checkpoint&lt;/strong&gt;: Every time you complete a step the result is persisted automatically&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Replay&lt;/strong&gt;: If the function gets interrupted it re-runs your handler from the top but skips any step that already has a cached result&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Suspend&lt;/strong&gt;: Wait operations pause the execution entirely, with zero compute charges while the clock ticks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resume&lt;/strong&gt;: A new Lambda invocation picks up where the last one left off with all context restored&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The mental model is this: &lt;strong&gt;your code runs many times, but each step only executes once&lt;/strong&gt;. Once you internalize that, everything else clicks.&lt;/p&gt;

&lt;p&gt;The service launched in December 2025 with support for Python 3.13 and 3.14, and Node.js 22 and 24. The SDKs are open source on GitHub. Total execution time can be up to one year spread across many invocations, even though each individual invocation is still subject to the standard 15-minute Lambda timeout.&lt;/p&gt;
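&lt;p&gt;The checkpoint-and-replay behavior is easy to picture with a toy version. This is not the real SDK - just a plain-Python illustration of why a replayed handler can safely re-run from the top:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Toy illustration of checkpoint-and-replay, not the real SDK.
checkpoints = {}  # stand-in for the durable checkpoint store

def durable_step(fn):
    """Toy decorator: run each named step once, replay from cache after."""
    def wrapper(*args):
        if fn.__name__ in checkpoints:
            return checkpoints[fn.__name__]  # replay path: skip the work
        result = fn(*args)                   # first run: execute and persist
        checkpoints[fn.__name__] = result
        return result
    return wrapper

@durable_step
def validate(app):
    return {"status": "validated", "id": app["id"]}

first = validate({"id": "loan-1"})
replayed = validate({"id": "loan-1"})  # served from the checkpoint, not re-run
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The real service persists checkpoints durably and scopes them to each execution, but the mental model is the same: the handler may run many times, each step's body runs once.&lt;/p&gt;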

&lt;h2&gt;
  
  
  The Demo Project - Loan Approval Workflow
&lt;/h2&gt;

&lt;p&gt;To really understand how this works I built a full-stack loan approval workflow with a React frontend. This demonstrates checkpoints, parallel execution, two different callback patterns, DynamoDB progress tracking, and real-time frontend updates.&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecture
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn2m4vsxmtgh5ja62fa8b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn2m4vsxmtgh5ja62fa8b.png" alt="Architecture" width="800" height="399"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The project has four AWS resources defined in a SAM template:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LoanProgressTable&lt;/strong&gt; - DynamoDB table for tracking workflow progress that the frontend polls&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LoanWorkflowFunction&lt;/strong&gt; - The durable function that runs the entire loan approval workflow&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FraudCheckFunction&lt;/strong&gt; - A separate Lambda that simulates an external fraud check service and demonstrates the callback pattern&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LoanApi + LoanApiFunction&lt;/strong&gt; - API Gateway HttpApi with a Lambda handler that exposes &lt;code&gt;POST /apply&lt;/code&gt;, &lt;code&gt;GET /status/{id}&lt;/code&gt;, and &lt;code&gt;POST /approve/{id}&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Workflow
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6x741jmjb43yzktbhzpp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6x741jmjb43yzktbhzpp.png" alt="Workflow" width="800" height="315"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are three hardcoded demo profiles that produce predictable outcomes so you can demonstrate all the paths:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Profile&lt;/th&gt;
&lt;th&gt;SIN (last 4)&lt;/th&gt;
&lt;th&gt;Default Amount&lt;/th&gt;
&lt;th&gt;Outcome&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Alice Johnson&lt;/td&gt;
&lt;td&gt;1111&lt;/td&gt;
&lt;td&gt;$150,000&lt;/td&gt;
&lt;td&gt;Always approved (triggers manager approval for amounts &amp;gt;= $100K)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bob Martinez&lt;/td&gt;
&lt;td&gt;2222&lt;/td&gt;
&lt;td&gt;$50,000&lt;/td&gt;
&lt;td&gt;Always denied (credit score too low)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Charlie Wilson&lt;/td&gt;
&lt;td&gt;3333&lt;/td&gt;
&lt;td&gt;$25,000&lt;/td&gt;
&lt;td&gt;Approved if &amp;lt;= $25,000, denied if above&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  The Code - Walking Through the Durable Workflow
&lt;/h2&gt;

&lt;p&gt;Let's look at the key parts of the code. I will focus on the durable function itself since that is where all the interesting stuff happens.&lt;/p&gt;

&lt;h3&gt;
  
  
  Making a Function Durable
&lt;/h3&gt;

&lt;p&gt;The first thing you need to know is that making a Lambda function durable requires two things: decorators in your code and a &lt;code&gt;DurableConfig&lt;/code&gt; block in your SAM template. In your code, you use the &lt;code&gt;@durable_execution&lt;/code&gt; decorator on the handler and the &lt;code&gt;@durable_step&lt;/code&gt; decorator on each step function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_durable_execution_sdk_python&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;DurableContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;StepContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;durable_execution&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;durable_step&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@durable_step&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validate_application&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;step_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;StepContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;application&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Validate the loan application fields.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;required&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;applicant_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ssn_last4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;annual_income&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;loan_amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;loan_purpose&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;missing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;required&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;application&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;missing&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Missing required fields: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;missing&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# ... validation logic ...
&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;application&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;validated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;estimated_dti&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dti_estimate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="c1"&gt;# ... more fields ...
&lt;/span&gt;    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is just a normal Python function with a decorator. It receives a &lt;code&gt;StepContext&lt;/code&gt; as the first argument and returns a result. That result gets checkpointed automatically. If the Lambda replays later this step will return the cached result instead of running the validation logic again.&lt;/p&gt;

&lt;p&gt;In the SAM template you add &lt;code&gt;DurableConfig&lt;/code&gt; to the function resource:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;LoanWorkflowFunction&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::Serverless::Function&lt;/span&gt;
  &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;FunctionName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LoanWorkflowFunction&lt;/span&gt;
    &lt;span class="na"&gt;CodeUri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;src/&lt;/span&gt;
    &lt;span class="na"&gt;Handler&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;loan_demo.lambda_handler&lt;/span&gt;
    &lt;span class="na"&gt;Timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;300&lt;/span&gt;
    &lt;span class="na"&gt;AutoPublishAlias&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;live&lt;/span&gt;
    &lt;span class="na"&gt;DurableConfig&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;ExecutionTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3600&lt;/span&gt;           &lt;span class="c1"&gt;# 1 hour total workflow time&lt;/span&gt;
      &lt;span class="na"&gt;RetentionPeriodInDays&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;        &lt;span class="c1"&gt;# Keep checkpoint history 3 days&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two important things to notice here. &lt;code&gt;DurableConfig&lt;/code&gt; is what enables the feature, and it cannot be added to existing functions; you have to create the function with it from the start. And &lt;code&gt;AutoPublishAlias: live&lt;/code&gt; gives us a qualified ARN. You should always invoke durable functions via a version or alias, never &lt;code&gt;$LATEST&lt;/code&gt;, because suspended executions need to replay against the same code that started them.&lt;/p&gt;
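&lt;p&gt;With the alias in place, kicking off the workflow is a normal asynchronous invoke against the qualified name. This is a sketch using boto3; the &lt;code&gt;start_loan_workflow&lt;/code&gt; helper and the payload shape are my own, but the &lt;code&gt;FunctionName:alias&lt;/code&gt; form and &lt;code&gt;InvocationType="Event"&lt;/code&gt; are standard Lambda API usage.&lt;/p&gt;

```python
import json


def qualified_name(function_name: str, alias: str = "live") -> str:
    # Target a published alias, never $LATEST: suspended executions must
    # replay against the exact code version that started them.
    return f"{function_name}:{alias}"


def start_loan_workflow(lambda_client, application: dict) -> None:
    # Hypothetical kickoff helper; "LoanWorkflowFunction" matches the SAM
    # template above. Async invoke so the caller does not block on the
    # long-running workflow.
    lambda_client.invoke(
        FunctionName=qualified_name("LoanWorkflowFunction"),
        InvocationType="Event",
        Payload=json.dumps(application),
    )
```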

&lt;h3&gt;
  
  
  The Main Handler - Orchestrating the Workflow
&lt;/h3&gt;

&lt;p&gt;Here is the core of the handler that ties everything together:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@durable_execution&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lambda_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DurableContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_progress_table&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;application_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Step 1: Validate Application
&lt;/span&gt;        &lt;span class="n"&gt;validated&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;validate_application&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 2: Parallel Credit Bureau Checks
&lt;/span&gt;        &lt;span class="n"&gt;bureaus&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;equifax&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transunion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;experian&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;credit_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parallel&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nf"&gt;pull_credit_report&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;validated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ssn_last4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;bureaus&lt;/span&gt;
        &lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;credit_reports&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;credit_results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_results&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 3: Risk Assessment
&lt;/span&gt;        &lt;span class="n"&gt;risk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;calculate_risk_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;credit_reports&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;validated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ssn_last4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;validated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;loan_amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="c1"&gt;# Denied path
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;risk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;decision&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;denied&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;denied&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reason&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Risk tier: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;risk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;risk_tier&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 4: Manager Approval (if &amp;gt;= $100,000)
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;validated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;loan_amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;100000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;approval_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_for_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;submit_manager_approval&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;manager-approval&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;WaitForCallbackConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_minutes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;approval_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;approved&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;denied&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reason&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Manager denied&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 5: External Fraud Check (Callback)
&lt;/span&gt;        &lt;span class="n"&gt;fraud_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_for_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;submit_fraud_check&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fraud-check&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;WaitForCallbackConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_minutes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 6: Generate Offer and Disburse
&lt;/span&gt;        &lt;span class="n"&gt;offer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;generate_loan_offer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;validated&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;risk&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;disbursement&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;disburse_funds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;offer&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;approved&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;disbursement_ref&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;disbursement&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;disbursement_ref&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I want to call out a few things that I found really interesting while building this.&lt;/p&gt;

&lt;h3&gt;
  
  
  Parallel Execution
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;context.parallel()&lt;/code&gt; call runs the three credit bureau checks concurrently, and each one is checkpointed independently. If two bureaus succeed and the third fails on the first invocation, only the failed one re-executes when Lambda replays; the other two return their cached results instantly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;credit_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parallel&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nf"&gt;pull_credit_report&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;validated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ssn_last4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;bureaus&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;credit_reports&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;credit_results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_results&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I have done parallel execution in Step Functions before using the Parallel state, and it works well. But here it feels more natural: a list comprehension that produces concurrent work, with no separate branch definitions needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Two Callback Patterns
&lt;/h3&gt;

&lt;p&gt;The demo uses &lt;code&gt;context.wait_for_callback()&lt;/code&gt; in two places, and they demonstrate two different real-world patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Manager Approval - Human-in-the-Loop&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For loans of $100K or more, the workflow suspends and waits for a manager to approve. The setup function stores the callback ID in DynamoDB so the React frontend can read it and show an approval modal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;submit_manager_approval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;callback_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_ctx&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Store callback_id in DynamoDB so the frontend can send the approval.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;tbl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_progress_table&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;tbl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update_item&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;validated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
        &lt;span class="n"&gt;UpdateExpression&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SET callback_id = :cid&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;ExpressionAttributeValues&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:cid&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;callback_id&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;approval_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_for_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;submit_manager_approval&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;manager-approval&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;WaitForCallbackConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_minutes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the manager clicks approve in the React UI the API Lambda reads the callback ID from DynamoDB and calls &lt;code&gt;send_durable_execution_callback_success&lt;/code&gt; to resume the workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;lambda_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_durable_execution_callback_success&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;CallbackId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;callback_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;approved&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;approved&lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Lambda that runs the durable workflow is completely idle during this time, with zero compute cost. The manager could take 5 minutes or 5 hours, but you only pay when the workflow actually resumes.&lt;/p&gt;
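&lt;p&gt;For completeness, the API-side approval handler can be sketched like this. The function name and shape are assumptions based on the snippets above: it reads the stored callback ID for the application and resumes the suspended execution with the manager's decision.&lt;/p&gt;

```python
import json


def handle_manager_decision(table, lambda_client, application_id: str, approved: bool) -> None:
    # Hypothetical API handler: look up the callback ID that
    # submit_manager_approval stored for this application...
    item = table.get_item(Key={"application_id": application_id})["Item"]
    # ...then resume the suspended durable execution with the decision.
    lambda_client.send_durable_execution_callback_success(
        CallbackId=item["callback_id"],
        Result=json.dumps({"approved": approved}),
    )
```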

&lt;p&gt;I have done something similar with Step Functions task tokens in my pizza ordering project, where the token was passed to a Fargate container to call back when processing was complete. The concept is the same, but with durable functions the callback is part of the same Python file rather than requiring coordination between a separate state machine definition and external compute.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fraud Check - External Service Callback&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The second callback demonstrates integrating with an external service. The workflow invokes a separate Lambda asynchronously and passes it the callback ID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;submit_fraud_check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;callback_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_ctx&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Invoke the external fraud check Lambda, passing the callback_id.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;fraud_lambda&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lambda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;fraud_lambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;FunctionName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FRAUD_CHECK_FUNCTION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;InvocationType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Event&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Payload&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;callback_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;callback_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;validated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;applicant_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;validated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;applicant_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fraud check Lambda does its processing (simulated with a 5-second delay in the demo) and then sends the callback:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# fraud_check.py
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lambda_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;callback_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;callback_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Simulate processing
&lt;/span&gt;    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fraud_check&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;passed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;risk_indicators&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;checked_by&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FraudCheckService-v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# Resume the suspended durable execution
&lt;/span&gt;    &lt;span class="n"&gt;lambda_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_durable_execution_callback_success&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;CallbackId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;callback_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern works for any external system - a microservice, a webhook handler, a third-party API that calls you back. The important thing is that the calling system has the callback ID and the IAM permission to call &lt;code&gt;SendDurableExecutionCallbackSuccess&lt;/code&gt; on the durable function.&lt;/p&gt;
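&lt;p&gt;That last point is worth showing. A minimal SAM policy statement for the fraud check function might look like this - a sketch only, with illustrative resource names, where the action mirrors the &lt;code&gt;SendDurableExecutionCallbackSuccess&lt;/code&gt; API above:&lt;/p&gt;

```yaml
# Sketch: grant the fraud check Lambda permission to resume the
# suspended durable execution. Function names and ARN are illustrative.
FraudCheckFunction:
  Type: AWS::Serverless::Function
  Properties:
    CodeUri: src/
    Handler: fraud_check.lambda_handler
    Policies:
      - Statement:
          - Effect: Allow
            Action: lambda:SendDurableExecutionCallbackSuccess
            Resource: !Sub arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:LoanWorkflowFunction:*
```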

&lt;h3&gt;
  
  
  Real-Time Progress Tracking
&lt;/h3&gt;

&lt;p&gt;One thing I wanted in this demo was the ability to see the workflow progressing in real time from the frontend. The durable workflow writes progress entries to DynamoDB at each step:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;log_progress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;application_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;info&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Append a log entry and update status in DynamoDB.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;log_entry&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;step&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update_item&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;application_id&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;UpdateExpression&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SET #logs = list_append(if_not_exists(#logs, :empty_list), :new_log), &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;current_step = :step, #status = :status, updated_at = :ts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="c1"&gt;# ...
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The React frontend polls &lt;code&gt;GET /status/{applicationId}&lt;/code&gt; every couple of seconds and updates the UI as each step completes. There is nothing durable-function-specific about this; it is just polling a DynamoDB table. But it makes the demo much more compelling because you can watch the workflow progress through each checkpoint.&lt;/p&gt;
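The polling loop itself is simple. Here is a minimal Python sketch of the idea (the terminal statuses and the shape of the status item are illustrative assumptions, not the demo's exact contract):

```python
import time

# Assumed terminal states for illustration; the demo's real set may differ.
TERMINAL_STATUSES = {"APPROVED", "REJECTED", "FAILED"}

def poll_status(fetch_status, interval=2.0, max_attempts=30):
    """Call fetch_status() until the workflow reports a terminal state."""
    for _ in range(max_attempts):
        item = fetch_status()  # e.g. a GET to /status/{applicationId}
        if item.get("status") in TERMINAL_STATUSES:
            return item
        time.sleep(interval)
    raise TimeoutError("workflow did not reach a terminal state in time")
```

The frontend does the same thing with `fetch` and `setInterval`; the logic is identical.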

&lt;h3&gt;
  
  
  Handling Replays
&lt;/h3&gt;

&lt;p&gt;One subtle issue I had to deal with was replay detection for the progress logging. Remember that when the durable function resumes after a callback, the handler re-executes from the top and completed steps return their cached results. But the &lt;code&gt;log_progress&lt;/code&gt; calls between steps are not inside steps; they are regular Python code that runs again on every replay.&lt;/p&gt;

&lt;p&gt;I solved this with a counter-based approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;collections&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Counter&lt;/span&gt;
&lt;span class="n"&gt;existing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_item&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;application_id&lt;/span&gt;&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Item&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
&lt;span class="n"&gt;prior_counts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;step&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;entry&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;logs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]))&lt;/span&gt;
&lt;span class="n"&gt;call_counts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;info&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;call_counts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;call_counts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;prior_counts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[REPLAY] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;level&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;replay&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="nf"&gt;log_progress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;application_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This compares how many times each step was logged in previous invocations with how many times we have called &lt;code&gt;log&lt;/code&gt; in the current one. If we have already logged that many entries for a step, this call must be a replay, and we tag it accordingly. This is a good example of the kind of thing you need to think about with the replay model that you don't have to worry about with Step Functions.&lt;/p&gt;
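The replay-detection logic can be exercised in isolation. This is my own condensed sketch of the same counting idea, with `prior_logs` standing in for the entries already stored in DynamoDB:

```python
from collections import Counter

def make_replay_aware_logger(prior_logs):
    """Tag log calls that re-run steps already logged in a previous invocation."""
    # How many entries each step produced in earlier invocations.
    remaining = Counter(entry["step"] for entry in prior_logs)
    entries = []

    def log(step, message):
        if remaining[step]:
            # This call repeats a previously logged step, so it is a replay.
            remaining[step] -= 1
            message = f"[REPLAY] {message}"
        entries.append({"step": step, "message": message})

    return log, entries
```

Once the replayed entries for a step are exhausted, subsequent calls for that step are treated as genuinely new.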

&lt;h2&gt;
  
  
  The SAM Template
&lt;/h2&gt;

&lt;p&gt;The full SAM template sets up everything: a DynamoDB table, three Lambda functions, API Gateway with CORS, and all the IAM permissions. Here is the key section for the durable workflow function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;LoanWorkflowFunction&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::Serverless::Function&lt;/span&gt;
  &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;FunctionName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LoanWorkflowFunction&lt;/span&gt;
    &lt;span class="na"&gt;CodeUri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;src/&lt;/span&gt;
    &lt;span class="na"&gt;Handler&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;loan_demo.lambda_handler&lt;/span&gt;
    &lt;span class="na"&gt;Timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;300&lt;/span&gt;
    &lt;span class="na"&gt;AutoPublishAlias&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;live&lt;/span&gt;
    &lt;span class="na"&gt;DurableConfig&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;ExecutionTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;300&lt;/span&gt;
      &lt;span class="na"&gt;RetentionPeriodInDays&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
    &lt;span class="na"&gt;Environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;Variables&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;PROGRESS_TABLE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;LoanProgressTable&lt;/span&gt;
        &lt;span class="na"&gt;FRAUD_CHECK_FUNCTION&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;FraudCheckFunction&lt;/span&gt;
    &lt;span class="na"&gt;Policies&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;AWSLambdaBasicExecutionRole&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Statement&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Allow&lt;/span&gt;
            &lt;span class="na"&gt;Action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;lambda:ManageDurableState&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;lambda:GetDurableExecution&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;lambda:ListDurableExecutions&lt;/span&gt;
            &lt;span class="na"&gt;Resource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Sub&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:LoanWorkflowFunction*"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;DynamoDBCrudPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;TableName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;LoanProgressTable&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Statement&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Allow&lt;/span&gt;
            &lt;span class="na"&gt;Action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;lambda:InvokeFunction&lt;/span&gt;
            &lt;span class="na"&gt;Resource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;FraudCheckFunction.Arn&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the &lt;code&gt;lambda:ManageDurableState&lt;/code&gt; permission - the function needs this to manage its own checkpoint data. The fraud check function needs a separate permission, &lt;code&gt;lambda:SendDurableExecutionCallbackSuccess&lt;/code&gt;, to send the callback that resumes the workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Durable Functions vs Step Functions - When Each Wins
&lt;/h2&gt;

&lt;p&gt;After building this project and spending time with both services, I have a much clearer picture of when each one makes sense. Here is how I think about it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Choose Durable Functions When:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Your workflow is code-centric business logic.&lt;/strong&gt; If the interesting part is the Python code between steps (validation rules, scoring algorithms, data transformations), durable functions keep everything in one place.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You need third-party libraries.&lt;/strong&gt; In this demo I used AWS Lambda Powertools for logging, tracing, and metrics. In a real-world scenario you might need pandas for data processing, the Stripe SDK for payments, or an AI SDK for model calls. With durable functions you just add them to &lt;code&gt;requirements.txt&lt;/code&gt;. In Step Functions each dependency would need its own Lambda function.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You have complex branching and looping.&lt;/strong&gt; An AI agent that chains LLM calls, branches on model output, and retries on rate limits is just a while loop in Python. Expressing that as an ASL state machine means fighting the declarative model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want local testing.&lt;/strong&gt; The durable execution testing SDK simulates the checkpoint engine in-process with pytest. No Docker containers, no local Lambda emulator, no AWS credentials needed.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Choose Step Functions When:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You are orchestrating across native AWS services.&lt;/strong&gt; If your workflow is Glue crawler → Athena query → S3 export → SNS notification with zero custom logic between steps, Step Functions calls those services directly without Lambda. You can't beat zero-compute orchestration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visual observability matters.&lt;/strong&gt; When the ETL pipeline fails at 2 AM, the on-call engineer opens the Step Functions console and sees one red box. They click it and see the error. No log searching.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You have compliance requirements.&lt;/strong&gt; SOC 2 auditors can review a visual state machine. They can see every step, every permission, every error path without reading Python code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The workflow is mostly waiting.&lt;/strong&gt; A notification workflow that waits 30 days, checks something, and sends an email (five state transitions over a month) costs essentially nothing in Step Functions. No Lambda overhead during the waits.&lt;/li&gt;
&lt;/ul&gt;
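To put a rough number on that last point: at the Step Functions Standard Workflows list price of $0.025 per 1,000 state transitions (verify against current pricing for your region), the month-long notification workflow is effectively free:

```python
# Step Functions Standard Workflows list price (USD); check current pricing.
PRICE_PER_1000_TRANSITIONS = 0.025

transitions = 5  # the notification workflow's five state transitions
cost = transitions * PRICE_PER_1000_TRANSITIONS / 1000
print(f"${cost:.6f} per execution")  # $0.000125 per execution
```

Even a thousand of these executions per month would cost about twelve cents.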

&lt;h3&gt;
  
  
  The Hybrid Pattern
&lt;/h3&gt;

&lt;p&gt;It is worth mentioning that the best architecture often uses both: Step Functions for the macro-orchestration (routing, cross-service coordination, audit trail) and durable functions for the micro-orchestration (complex business logic inside individual steps). Step Functions calls a Lambda; that Lambda happens to be a durable function with its own internal workflow. Best of both worlds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Things to Watch Out For
&lt;/h2&gt;

&lt;p&gt;Here are a few things I ran into or learned about while building this that are worth being aware of.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Replay demands determinism.&lt;/strong&gt; Since your code runs multiple times, you cannot branch on things like random numbers, timestamps, or any mutable external state outside of steps. Everything non-deterministic must go inside a &lt;code&gt;@durable_step&lt;/code&gt;. In my credit bureau check I generate scores using a seeded random number generator based on the SSN; the seed ensures the same score every time for the same input, which keeps the replay deterministic.&lt;/p&gt;
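As an illustration of the pattern, a deterministic mock score might look like this (a sketch of my approach; hashing the SSN into a seed is one way to derive it):

```python
import hashlib
import random

def mock_credit_score(ssn):
    """Same SSN always yields the same score, keeping replays deterministic."""
    # Derive a stable seed from the input instead of using global randomness.
    seed = int(hashlib.sha256(ssn.encode()).hexdigest(), 16)
    return random.Random(seed).randint(300, 850)
```

Calling this with the same SSN on the original run and on every replay returns an identical score, so any branch that depends on it takes the same path each time.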

&lt;p&gt;&lt;strong&gt;The 15-minute invocation limit still applies.&lt;/strong&gt; Each individual replay invocation has the standard Lambda timeout. Durable functions do not change that. What changes is that the total execution can span many invocations over a period of up to a year. Use waits between long steps to reset the window.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Version pinning is essential.&lt;/strong&gt; If your code changes while an execution is suspended for three days waiting for a manager approval, the replay will run different code than the original. The &lt;code&gt;AutoPublishAlias: live&lt;/code&gt; in the SAM template handles this by always creating a new version on deploy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability is still maturing.&lt;/strong&gt; Logging across replays can be confusing: you see the same log lines from both the original execution and the replay. The Durable Executions tab in the Lambda console is helpful, but there is no jump-to-logs feature yet. I recommend investing in structured logging from day one with something like AWS Lambda Powertools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is a new service.&lt;/strong&gt; It launched in December 2025. You cannot convert existing functions, and the SDKs are evolving. Pin your SDK versions and bundle them with your deployment package.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Out
&lt;/h2&gt;

&lt;p&gt;If you want to try this out for yourself the full source code is available on GitHub: &lt;a href="https://github.com/RDarrylR/lambda-durable-demo" rel="noopener noreferrer"&gt;https://github.com/RDarrylR/lambda-durable-demo&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone the repo&lt;/span&gt;
git clone https://github.com/RDarrylR/lambda-durable-demo
&lt;span class="nb"&gt;cd &lt;/span&gt;lambda-durable-demo

&lt;span class="c"&gt;# Build and deploy the backend&lt;/span&gt;
sam build
sam deploy &lt;span class="nt"&gt;--guided&lt;/span&gt;   &lt;span class="c"&gt;# first time&lt;/span&gt;

&lt;span class="c"&gt;# Note the LoanApiUrl output, then setup the frontend&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;frontend
npm &lt;span class="nb"&gt;install
echo&lt;/span&gt; &lt;span class="s2"&gt;"VITE_API_URL=https://YOUR-API-ID.execute-api.us-east-1.amazonaws.com"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; .env
npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The frontend runs at &lt;code&gt;http://localhost:5173&lt;/code&gt;. Click one of the demo profiles (Alice, Bob, or Charlie), submit the application, and watch the workflow progress in real time. For Alice's $150K loan you will see the manager approval modal pop up; approve it and watch the rest of the workflow complete through the fraud check and disbursement.&lt;/p&gt;

&lt;p&gt;I have lots of plans to expand on this in the future, including adding automated tests with the durable execution testing SDK and exploring the hybrid pattern with Step Functions.&lt;/p&gt;

&lt;p&gt;For more articles from me please visit my blog at &lt;a href="https://darryl-ruggles.cloud/" rel="noopener noreferrer"&gt;Darryl's World of Cloud&lt;/a&gt; or find me on &lt;a href="https://bsky.app/profile/darryl-ruggles.cloud" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt;, &lt;a href="https://x.com/RDarrylR" rel="noopener noreferrer"&gt;X&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/darryl-ruggles/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://medium.com/@RDarrylR" rel="noopener noreferrer"&gt;Medium&lt;/a&gt;, &lt;a href="https://dev.to/rdarrylr"&gt;Dev.to&lt;/a&gt;, or the &lt;a href="https://community.aws/@darrylr" rel="noopener noreferrer"&gt;AWS Community&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For tons of great serverless content and discussions please join the Believe In Serverless community we have put together at this link: &lt;a href="https://www.believeinserverless.com/" rel="noopener noreferrer"&gt;Believe In Serverless Community&lt;/a&gt;&lt;/p&gt;

</description>
      <category>lambda</category>
      <category>aws</category>
      <category>serverless</category>
      <category>stepfunctions</category>
    </item>
    <item>
      <title>Building a Serverless Sales Analytics Platform with AI Insights for Under $10/Month</title>
      <dc:creator>Darryl Ruggles</dc:creator>
      <pubDate>Sat, 29 Nov 2025 15:34:26 +0000</pubDate>
      <link>https://forem.com/aws-builders/building-a-serverless-sales-analytics-platform-with-ai-insights-for-under-10month-49g9</link>
      <guid>https://forem.com/aws-builders/building-a-serverless-sales-analytics-platform-with-ai-insights-for-under-10month-49g9</guid>
      <description>&lt;p&gt;I have a number of projects I’ve been working on that are not finished yet but this is one I wanted to complete before I finished packing and got on the plane to attend &lt;strong&gt;AWS Re:Invent&lt;/strong&gt; in Las Vegas this year. I'm hoping to pick up new techniques and meet many other people who build event-driven architectures every day to learn from them.&lt;/p&gt;

&lt;p&gt;I see so many great examples of using the managed and serverless services that cloud providers like AWS offer. Building a complete solution that costs less than $10 a month to run is common with these kinds of architectures. You can weigh the requirements and budget for any given project, choose from the many tools that are available with just an API call, and get charged only for how much you use them.&lt;/p&gt;

&lt;p&gt;You can try this project out for yourself by checking out the code in my GitHub repo here → &lt;a href="https://github.com/RDarrylR/serverless-smurf-etl-and-analytics" rel="noopener noreferrer"&gt;GitHub Repo&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Challenge
&lt;/h2&gt;

&lt;p&gt;Smurf Memorabilia Inc. is a fictional retail chain with multiple store locations, and it needs a way to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Collect daily sales data from each store location&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Transform and store that data efficiently&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Generate AI-powered business insights&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Visualize results in dashboards&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key requirements: &lt;strong&gt;low cost&lt;/strong&gt;, &lt;strong&gt;minimal operational overhead&lt;/strong&gt;, and &lt;strong&gt;pay only for what you use&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Stores will upload their sales data each day in an agreed format. The data will be processed and analyzed, analytics will be updated, and AI-based recommendations will be generated. Key people will receive daily email or SMS summaries of what is happening.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: 100% Serverless Architecture
&lt;/h2&gt;

&lt;p&gt;My solution involves an event-driven ETL platform using managed AWS services. There are no servers to patch, no capacity to plan, and no minimum fees. You pay only when data flows through the system.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fajnuljgme4fkyxhe7v66.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fajnuljgme4fkyxhe7v66.png" alt="High-level architecture" width="800" height="926"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Services Used
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Service&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Role&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Pricing Model&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AWS Lambda&lt;/td&gt;
&lt;td&gt;All compute (17 functions)&lt;/td&gt;
&lt;td&gt;Per invocation + duration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3&lt;/td&gt;
&lt;td&gt;Object storage&lt;/td&gt;
&lt;td&gt;Per GB stored + requests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DynamoDB&lt;/td&gt;
&lt;td&gt;Metrics database&lt;/td&gt;
&lt;td&gt;Per read/write unit (on-demand)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Step Functions&lt;/td&gt;
&lt;td&gt;Workflow orchestration&lt;/td&gt;
&lt;td&gt;Per state transition&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EventBridge&lt;/td&gt;
&lt;td&gt;Event routing&lt;/td&gt;
&lt;td&gt;Free tier covers most use cases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bedrock&lt;/td&gt;
&lt;td&gt;AI analysis (Nova Lite)&lt;/td&gt;
&lt;td&gt;Per token processed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API Gateway&lt;/td&gt;
&lt;td&gt;REST API&lt;/td&gt;
&lt;td&gt;Per request&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SNS&lt;/td&gt;
&lt;td&gt;Notifications&lt;/td&gt;
&lt;td&gt;Per message&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These are just a few of the managed/serverless offerings from AWS. You can piece together as many of them as you need to build your architecture, and they scale automatically from zero to whatever capacity you require.&lt;/p&gt;




&lt;h2&gt;
  
  
  Smart Data Storage with Apache Parquet
&lt;/h2&gt;

&lt;p&gt;One of the key architectural decisions was converting the raw uploaded JSON sales data into &lt;strong&gt;Apache Parquet&lt;/strong&gt; format. This columnar storage format delivers significant benefits:&lt;/p&gt;

&lt;h3&gt;
  
  
  Huge Compression
&lt;/h3&gt;

&lt;p&gt;Our 30-day dataset comparison:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Raw JSON uploads&lt;/strong&gt;: 53.1 MB&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Parquet files&lt;/strong&gt;: 4.7 MB&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My examples achieve an &lt;strong&gt;11x reduction&lt;/strong&gt; in size using the default Parquet compression codec, and it can be changed to an even higher-compression codec if needed. This results in significant storage savings and faster query performance.&lt;/p&gt;
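The ratio is easy to check from the two measurements above:

```python
raw_mb = 53.1      # raw JSON uploads over 30 days
parquet_mb = 4.7   # the same data stored as Parquet
ratio = raw_mb / parquet_mb
print(f"{ratio:.1f}x smaller")  # 11.3x smaller
```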

&lt;h3&gt;
  
  
  Why Parquet?
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Columnar Storage&lt;/strong&gt;: Only reads the columns you need, not entire rows&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Built-in Compression&lt;/strong&gt;: Uses efficient encoding (dictionary, run-length, delta)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Schema Enforcement&lt;/strong&gt;: Explicit types prevent data quality issues&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Ecosystem Support&lt;/strong&gt;: Works with Athena, Spark, Pandas, and most analytics tools&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Type-Safe Schema
&lt;/h3&gt;

&lt;p&gt;We define an explicit PyArrow schema to ensure data quality. We want to make sure we keep track of which Smurf loot is popular every day and follow the trends.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;PARQUET_SCHEMA&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transaction_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transaction_timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;item_sku&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;item_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quantity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;int32&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unit_price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decimal128&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;line_total&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decimal128&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;discount_amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decimal128&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payment_method&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This schema ensures that decimal precision is maintained (critical for financial data) and timestamps are properly typed for time-series analysis.&lt;/p&gt;
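&lt;p&gt;Why the decimal columns matter is easy to demonstrate with the standard library alone. A minimal sketch (the &lt;code&gt;line_total&lt;/code&gt; helper below is illustrative, not part of the project):&lt;/p&gt;

```python
from decimal import Decimal, ROUND_HALF_UP

def line_total(quantity, unit_price, discount):
    """Compute a line total with exact decimal(10, 2) semantics,
    matching the unit_price/line_total/discount_amount columns above."""
    total = quantity * Decimal(unit_price) - Decimal(discount)
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Binary floats drift on money math; Decimal does not.
assert 0.1 + 0.2 != 0.3
assert line_total(3, "1.10", "0.30") == Decimal("3.00")
```
&lt;p&gt;Parquet's decimal128 type preserves exactly this behaviour on disk, so downstream queries never see float rounding noise in financial fields.&lt;/p&gt;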




&lt;h2&gt;
  
  
  Hive-Style Partitioning for Efficient Queries
&lt;/h2&gt;

&lt;p&gt;Raw uploads arrive with flat filenames like &lt;code&gt;store_0001_2025-11-27.json&lt;/code&gt;. We transform these into a &lt;strong&gt;Hive-style partition structure&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;s&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="err"&gt;://bucket/processed/&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;year=&lt;/span&gt;&lt;span class="mi"&gt;2025&lt;/span&gt;&lt;span class="err"&gt;/&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;└──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;month=&lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="err"&gt;/&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;day=&lt;/span&gt;&lt;span class="mi"&gt;27&lt;/span&gt;&lt;span class="err"&gt;/&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;store_id=&lt;/span&gt;&lt;span class="mi"&gt;0001&lt;/span&gt;&lt;span class="err"&gt;/data.parquet&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;store_id=&lt;/span&gt;&lt;span class="mi"&gt;0002&lt;/span&gt;&lt;span class="err"&gt;/data.parquet&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;└──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="err"&gt;└──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;day=&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="err"&gt;/&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;           &lt;/span&gt;&lt;span class="err"&gt;└──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why This Structure Matters
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Partition Pruning&lt;/strong&gt;: When you query "all sales for November 2025", tools like Amazon Athena only scan files in &lt;code&gt;year=2025/month=11/&lt;/code&gt; - not the entire dataset. This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Faster queries&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lower costs (Athena charges per TB scanned)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Better organization&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
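&lt;p&gt;The pruning logic itself is just prefix arithmetic. A small sketch (the helper name and the zero-padding convention are my assumptions, not the project's code):&lt;/p&gt;

```python
def partition_prefix(year, month=None, day=None):
    """Build the narrowest S3 prefix for a date filter - the same trick
    Athena's partition pruning uses with Hive-style layouts.
    Assumes two-digit zero-padded month/day partition values."""
    prefix = f"processed/year={year}/"
    if month is not None:
        prefix += f"month={month:02d}/"
        if day is not None:
            prefix += f"day={day:02d}/"
    return prefix

# "All sales for November 2025" touches only one month's worth of objects:
assert partition_prefix(2025, 11) == "processed/year=2025/month=11/"
```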

&lt;p&gt;&lt;strong&gt;The Transformation Code&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Parse: store_0001_2025-11-27.json
&lt;/span&gt;&lt;span class="n"&gt;store_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;month&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;day&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parse_filename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Output: year=2025/month=11/day=27/store_id=0001/data.parquet
&lt;/span&gt;&lt;span class="n"&gt;output_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;processed/year=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/month=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;month&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/day=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;day&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/store_id=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;store_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/data.parquet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This simple transformation enables sophisticated analytics without complex ETL pipelines.&lt;/p&gt;
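&lt;p&gt;The &lt;code&gt;parse_filename&lt;/code&gt; helper isn't shown in the snippet above; a minimal sketch of what it could look like:&lt;/p&gt;

```python
import re

# Matches flat upload names like store_0001_2025-11-27.json
FILENAME_RE = re.compile(r"store_(\d+)_(\d{4})-(\d{2})-(\d{2})\.json$")

def parse_filename(filename):
    """Extract (store_id, year, month, day) from an upload filename."""
    m = FILENAME_RE.search(filename)
    if m is None:
        raise ValueError(f"unexpected filename: {filename}")
    return m.group(1), m.group(2), m.group(3), m.group(4)

assert parse_filename("store_0001_2025-11-27.json") == ("0001", "2025", "11", "27")
```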




&lt;h2&gt;
  
  
  Two Analytics Options (Web-Based and Business Intelligence)
&lt;/h2&gt;

&lt;p&gt;I wanted to show multiple approaches to analyzing the sales data - after all, keeping track of those three-apple-tall blue creatures and all the ways their fans want to remember them deserves the best tooling. The first is a simpler web version, built in ReactJS, that runs in your browser. The second is a prototype set of Amazon Quick Suite dashboards. Depending on your audience, one of these approaches will likely work (or you could build something else entirely).&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 1: React Dashboard (Developer-Friendly)
&lt;/h3&gt;

&lt;p&gt;The project includes a custom ReactJS application that queries the API directly:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgfvzzzl38sgttrkauhkx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgfvzzzl38sgttrkauhkx.png" alt="ReactJS web-based analytics" width="800" height="908"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Web-based analytics approach is likely best for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Custom visualizations&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Embedding in existing applications&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Full control over the user experience&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No additional licensing costs&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The React dashboard provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Real-time metrics display&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;File upload interface with drag-and-drop&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Historical trend charts&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;AI-generated insights and recommendations&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F23um5ifbujhyfp8vtf6l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F23um5ifbujhyfp8vtf6l.png" alt="Top selling products in web-based view" width="800" height="603"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flwc43tuv0w44a9qy4lnt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flwc43tuv0w44a9qy4lnt.png" alt="Ai-based recommendations in web-based display" width="800" height="586"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 2: Amazon Quick Suite (Business-Friendly)
&lt;/h3&gt;

&lt;p&gt;This approach offers a managed Business Intelligence (BI) service that imports data from S3:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9dljkiecsi5ecskoxcip.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9dljkiecsi5ecskoxcip.png" alt="Quick Suite Analytics" width="800" height="696"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Quick Suite approach is likely best for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Business users who need self-service analytics&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ad-hoc exploration without writing code&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sharing dashboards with stakeholders&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Built-in visualizations (no frontend development)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The current project exports five datasets to S3 in newline-delimited JSON format:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Store summaries (daily metrics per store)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Top products (best sellers)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Anomalies (AI-detected unusual patterns)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Trends (week-over-week analysis)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Recommendations (AI-generated action items)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
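&lt;p&gt;Newline-delimited JSON is trivial to produce. A hedged sketch of an exporter (the field names are illustrative; the real datasets carry more columns):&lt;/p&gt;

```python
import json

def to_ndjson(records):
    """Serialize records as newline-delimited JSON: one JSON object
    per line, trailing newline - the layout SPICE ingests from S3."""
    return "".join(json.dumps(r, default=str) + "\n" for r in records)

body = to_ndjson([
    {"store_id": "0001", "date": "2025-11-27", "revenue": "1843.50"},
    {"store_id": "0002", "date": "2025-11-27", "revenue": "2011.25"},
])
assert body.count("\n") == 2  # one line per record
```
&lt;p&gt;Each of the five datasets gets written this way to its own S3 key, which Quick Suite then points at as a data source.&lt;/p&gt;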

&lt;p&gt;Quick Suite's SPICE engine imports this data for fast, interactive dashboards.&lt;/p&gt;

&lt;h3&gt;
  
  
  Choosing Which Analytics Approach to Use
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Factor&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;React Dashboard&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Quick Suite&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cost&lt;/td&gt;
&lt;td&gt;Included (API calls only)&lt;/td&gt;
&lt;td&gt;$24/month per author, $3/month per reader&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Setup&lt;/td&gt;
&lt;td&gt;Requires development&lt;/td&gt;
&lt;td&gt;Point-and-click&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Customization&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;td&gt;Template-based&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User Type&lt;/td&gt;
&lt;td&gt;Developers&lt;/td&gt;
&lt;td&gt;Business analysts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embedding&lt;/td&gt;
&lt;td&gt;Full control&lt;/td&gt;
&lt;td&gt;Quick Suite embedding&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Many organizations could use both: ReactJS for customer-facing features, Quick Suite for internal analytics.&lt;/p&gt;




&lt;h2&gt;
  
  
  Event-Driven Processing
&lt;/h2&gt;

&lt;p&gt;The platform uses an event-driven architecture where each component reacts to events rather than polling for work. I always try to use this type of architecture unless the use-case really doesn’t fit it. AWS Step Functions are used to drive the data upload processing as well as the recommendation and analytics flow handling.&lt;/p&gt;

&lt;h3&gt;
  
  
  Upload Processing Flow
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmgannb0z13ru4e8itmg9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmgannb0z13ru4e8itmg9.png" alt="Step function for upload processing" width="706" height="1048"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Store uploads JSON file to S3 (via presigned URL)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;S3 emits &lt;code&gt;Object Created&lt;/code&gt; event&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;EventBridge routes event to Step Functions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Step Functions orchestrates the processing pipeline:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;* Validate schema

* Convert to Parquet

* Calculate metrics

* Store in DynamoDB

* Check if all stores reported
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Daily Analysis Trigger
&lt;/h3&gt;

&lt;p&gt;When the last store uploads for a day, the system automatically triggers a smurfy, comprehensive analysis:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbb9hlf2shms6ohk3933n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbb9hlf2shms6ohk3933n.png" alt="Handle daily analysis flow" width="800" height="2313"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The analysis runs exactly when the data is ready. But what if a store fails to report? A scheduled EventBridge rule runs at 11 PM local time as a fallback, ensuring you always get a daily report - even with partial data. The scheduler checks if analysis already ran for that day and skips if so.&lt;/p&gt;
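&lt;p&gt;The "skip if already ran" guard boils down to a first-writer-wins check. Here's a stand-in sketch using an in-memory set - in the real system this would be something like a DynamoDB conditional write (&lt;code&gt;attribute_not_exists&lt;/code&gt;):&lt;/p&gt;

```python
def try_claim_daily_run(completed_days, day):
    """Return True if this invocation wins the right to run the daily
    analysis for `day`. Stands in for a conditional write, so the 11 PM
    fallback never duplicates an analysis the last-store trigger already ran."""
    if day in completed_days:
        return False
    completed_days.add(day)
    return True

runs = set()
assert try_claim_daily_run(runs, "2025-11-27") is True   # event-driven trigger wins
assert try_claim_daily_run(runs, "2025-11-27") is False  # fallback scheduler skips
```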

&lt;p&gt;If invalid data is uploaded, key stakeholders receive email or SNS notifications so they can follow up with users. If the processing flow fails on the first attempt, built-in retry and backoff mechanisms kick in.&lt;/p&gt;
&lt;h3&gt;
  
  
  Daily Email Reports
&lt;/h3&gt;

&lt;p&gt;Once analysis completes, the platform automatically sends a daily summary email via SNS containing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Total revenue across all stores&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Top performing store of the day&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;AI-detected anomalies and unusual patterns&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Business recommendations from Bedrock&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Stakeholders receive insights in their inbox without logging into any dashboard.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbzvlbfwtojjq0g1c1kie.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbzvlbfwtojjq0g1c1kie.png" alt="Daily email of status" width="800" height="1003"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  AI-Powered Insights with Amazon Bedrock
&lt;/h2&gt;

&lt;p&gt;The solution uses Amazon Bedrock with the Nova Lite model (configurable to whatever model you want) to generate business intelligence:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Anomaly Detection&lt;/strong&gt;: Identifies stores with unusual revenue patterns&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Trend Analysis&lt;/strong&gt;: Compares current performance to historical baselines&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Recommendations&lt;/strong&gt;: Generates actionable business advice&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bedrock is pay-per-token with no minimum commitment - so it’s perfect for batch processing workloads.&lt;/p&gt;
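&lt;p&gt;For reference, a Converse-API request for this kind of analysis can be assembled like this. This is a sketch: the model ID, prompt wording, and inference settings are illustrative stand-ins, not the project's exact values:&lt;/p&gt;

```python
def build_insights_request(daily_metrics):
    """Assemble a Bedrock Converse-API request asking Nova Lite for
    anomalies, trends, and recommendations over the day's metrics."""
    prompt = (
        "You are a retail analyst. Given these daily per-store metrics as JSON, "
        "list anomalies, week-over-week trends, and three recommendations.\n"
        f"{daily_metrics}"
    )
    return {
        "modelId": "amazon.nova-lite-v1:0",  # assumption: configurable in the real stack
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": 1024, "temperature": 0.2},
    }

req = build_insights_request({"store_0001": {"revenue": 1843.50}})
# The actual call would be: boto3.client("bedrock-runtime").converse(**req)
```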


&lt;h2&gt;
  
  
  The Cost Breakdown
&lt;/h2&gt;

&lt;p&gt;Here's what this platform actually costs for a typical month (e.g., 11 stores × 30 days = 330 file uploads):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Service&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Monthly Cost&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Notes&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Lambda&lt;/td&gt;
&lt;td&gt;~$2.00&lt;/td&gt;
&lt;td&gt;17 functions, ~1000 invocations each&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Step Functions&lt;/td&gt;
&lt;td&gt;~$0.50&lt;/td&gt;
&lt;td&gt;360 workflow executions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DynamoDB&lt;/td&gt;
&lt;td&gt;~$1.00&lt;/td&gt;
&lt;td&gt;On-demand mode, ~1000 ops&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3&lt;/td&gt;
&lt;td&gt;~$0.01&lt;/td&gt;
&lt;td&gt;~60 MB stored&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bedrock&lt;/td&gt;
&lt;td&gt;~$5.00&lt;/td&gt;
&lt;td&gt;Nova Lite, 30 daily analyses&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EventBridge&lt;/td&gt;
&lt;td&gt;~$0.00&lt;/td&gt;
&lt;td&gt;Free tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SNS&lt;/td&gt;
&lt;td&gt;~$0.10&lt;/td&gt;
&lt;td&gt;Email notifications&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CloudWatch Alarms&lt;/td&gt;
&lt;td&gt;~$0.00&lt;/td&gt;
&lt;td&gt;7 alarms (first 10 free)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$8.61&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
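&lt;p&gt;As a sanity check, the per-service estimates above do sum to the quoted total:&lt;/p&gt;

```python
# Per-service monthly estimates from the table above, in USD.
monthly = {
    "Lambda": 2.00, "Step Functions": 0.50, "DynamoDB": 1.00, "S3": 0.01,
    "Bedrock": 5.00, "EventBridge": 0.00, "SNS": 0.10, "CloudWatch Alarms": 0.00,
}
total = round(sum(monthly.values()), 2)
assert total == 8.61
```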

&lt;p&gt;Add Quick Suite (if needed) for $24/month per author to build dashboards, or just $3/month per reader for view-only access.&lt;/p&gt;
&lt;h3&gt;
  
  
  Why is this all so cheap?
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ARM64 Architecture&lt;/strong&gt;: Lambda on Graviton2 is ~20% cheaper than x86&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Parquet Compression&lt;/strong&gt;: ~11x less storage than JSON&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;On-Demand DynamoDB&lt;/strong&gt;: Pay only for actual read/write operations&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Event-Driven&lt;/strong&gt;: No idle compute costs&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;


&lt;h2&gt;
  
  
  Infrastructure as Code (IaC)
&lt;/h2&gt;

&lt;p&gt;I’m a big advocate of using IaC for everything. My favourite tools for this are Terraform, the Serverless Application Model (SAM), and the Cloud Development Kit (CDK). In this case there is VPC provisioning and a lot of resources to manage, so I chose my go-to tool, Terraform. One command deploys everything:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform apply
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here are some key snippets from the infrastructure code:&lt;/p&gt;

&lt;h3&gt;
  
  
  Lambda Functions (ARM64 for Cost Savings)
&lt;/h3&gt;

&lt;p&gt;Lambda is the best place to host your business logic when execution times are short. All 17 Lambda functions use the ARM64 architecture (Graviton2) for ~20% cost savings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;resource&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"aws_lambda_function"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"process_upload"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;filename&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;data.archive_file.process_upload_zip.output_path&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;function_name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"process_upload"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;role&lt;/span&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;aws_iam_role.lambda_role.arn&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;handler&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"process_upload.lambda_handler"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;runtime&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"python3.13"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;architectures&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"arm64"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;timeout&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;memory_size&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="err"&gt;layers&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;local.powertools_layer_arn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;local.pandas_layer_arn&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="err"&gt;tracing_config&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;mode&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Active"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="err"&gt;environment&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;variables&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;merge(local.powertools_env_vars&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;S&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="err"&gt;_BUCKET&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;aws_s&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="err"&gt;_bucket.upload_bucket.id&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;PROCESSED_PREFIX&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;var.processed_prefix&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  DynamoDB (Pay-Per-Request)
&lt;/h3&gt;

&lt;p&gt;DynamoDB is my favourite database to use with AWS. It is truly serverless, and tables are ready to use in seconds. It offers on-demand billing, which means zero compute cost when idle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;resource&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"aws_dynamodb_table"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sales_data"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;name&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SalesData"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;billing_mode&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PAY_PER_REQUEST"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;hash_key&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PK"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;range_key&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SK"&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="err"&gt;attribute&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PK"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"S"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="err"&gt;attribute&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SK"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"S"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;GSI&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;querying&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;by&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;date&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;across&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;all&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;stores&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;global_secondary_index&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;name&lt;/span&gt;&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GSI1"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;hash_key&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GSI1PK"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;range_key&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GSI1SK"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;projection_type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ALL"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  EventBridge (S3 to Step Functions)
&lt;/h3&gt;

&lt;p&gt;EventBridge is my favourite AWS service. It offers rules for reacting to events, pipes for bridging data across AWS services, and a nice scheduler. Here I’m using a simple rule that routes S3 uploads to the processing workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;resource&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"aws_cloudwatch_event_rule"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"s3_upload"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;name&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"capture-s3-uploads"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;description&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Capture all S3 object uploads"&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="err"&gt;event_pattern&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;jsonencode(&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;source&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"aws.s3"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;detail-type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Object Created"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;detail&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;bucket&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="err"&gt;name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;aws_s&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="err"&gt;_bucket.upload_bucket.id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;object&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="err"&gt;key&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;prefix&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;var.upload_prefix&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;resource&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"aws_cloudwatch_event_target"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"step_function"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;rule&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;aws_cloudwatch_event_rule.s&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="err"&gt;_upload.name&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;target_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"UploadProcessorStepFunction"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;arn&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;aws_sfn_state_machine.upload_processor.arn&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;role_arn&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;aws_iam_role.eventbridge_step_function_role.arn&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
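&lt;p&gt;For reference, here is roughly the shape of an “Object Created” event this rule matches, along with a naive Python stand-in for EventBridge’s pattern matching (the bucket name, object key, and prefix below are hypothetical examples, not from the real deployment):&lt;/p&gt;

```python
# A trimmed-down "Object Created" event as EventBridge delivers it from S3.
# Bucket name and key are hypothetical examples.
sample_event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {
        "bucket": {"name": "my-upload-bucket"},
        "object": {"key": "uploads/2026/report.csv"},
    },
}

def rule_matches(event, bucket_name="my-upload-bucket", upload_prefix="uploads/"):
    """Naive re-implementation of the rule's event pattern: source,
    detail-type, and bucket name must match exactly; the object key
    must start with the configured prefix."""
    return (
        event.get("source") == "aws.s3"
        and event.get("detail-type") == "Object Created"
        and event["detail"]["bucket"]["name"] == bucket_name
        and event["detail"]["object"]["key"].startswith(upload_prefix)
    )

print(rule_matches(sample_event))  # True for a matching upload
```

Only events that pass all of these checks would trigger the target Step Functions state machine; everything else never leaves EventBridge.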



&lt;h3&gt;
  
  
  Step Functions (Workflow Orchestration)
&lt;/h3&gt;

&lt;p&gt;In many cases you want to tightly control and track the flow of processing in your app. AWS Step Functions state machines are defined as JSON templates with Lambda ARNs injected:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;resource&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"aws_sfn_state_machine"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"upload_processor"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;name&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"upload-processor"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;role_arn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;aws_iam_role.step_function_role.arn&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="err"&gt;definition&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;templatefile(&lt;/span&gt;&lt;span class="s2"&gt;"${path.module}/../backend/state-machines/upload-processor.json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;process_upload_lambda_arn&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;aws_lambda_function.process_upload.arn&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;calculate_metrics_lambda_arn&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;aws_lambda_function.calculate_metrics.arn&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;write_metrics_lambda_arn&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;aws_lambda_function.write_metrics.arn&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;check_all_stores_lambda_arn&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;aws_lambda_function.check_all_stores.arn&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;sns_alerts_topic_arn&lt;/span&gt;&lt;span class="w"&gt;             &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;aws_sns_topic.sales_alerts.arn&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;daily_analysis_state_machine_arn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;aws_sfn_state_machine.daily_analysis.arn&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
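&lt;p&gt;Terraform’s &lt;code&gt;templatefile&lt;/code&gt; fills in the &lt;code&gt;${...}&lt;/code&gt; placeholders in the JSON definition. Python’s &lt;code&gt;string.Template&lt;/code&gt; happens to use the same placeholder syntax, so here is a minimal sketch of what that substitution step does (the one-state template and the ARN are hypothetical; the real upload-processor.json has many states):&lt;/p&gt;

```python
from string import Template
import json

# Hypothetical one-state definition with a ${...} placeholder, mirroring
# the placeholders templatefile() fills in the real upload-processor.json.
template = Template("""{
  "StartAt": "ProcessUpload",
  "States": {
    "ProcessUpload": {
      "Type": "Task",
      "Resource": "${process_upload_lambda_arn}",
      "End": true
    }
  }
}""")

# Terraform passes the real Lambda ARN; this one is a made-up example.
definition = template.substitute(
    process_upload_lambda_arn="arn:aws:lambda:us-east-1:123456789012:function:process-upload"
)
print(json.loads(definition)["States"]["ProcessUpload"]["Resource"])  # prints the injected ARN
```

Keeping the state machine as a template file means the workflow definition lives with the backend code while the ARNs stay an infrastructure concern.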



&lt;h3&gt;
  
  
  S3 Bucket (Secure by Default)
&lt;/h3&gt;

&lt;p&gt;S3 is at the core of data storage for so many apps today. My setup has public access blocked, encryption enabled, and EventBridge notifications turned on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;resource&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"aws_s3_bucket_public_access_block"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"upload_bucket_public_access_block"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;bucket&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;aws_s&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="err"&gt;_bucket.upload_bucket.id&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="err"&gt;block_public_acls&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;block_public_policy&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;ignore_public_acls&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;restrict_public_buckets&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;resource&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"aws_s3_bucket_notification"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bucket_notification"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;bucket&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;aws_s&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="err"&gt;_bucket.upload_bucket.id&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;eventbridge&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The complete infrastructure includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;17 Lambda functions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;2 Step Functions state machines&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;API Gateway with 5 endpoints&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DynamoDB table with GSI&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;S3 bucket with security policies&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;EventBridge rules&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SNS topics&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;IAM roles with least-privilege policies&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Setting all this up involves no clicking through console pages and leaves no room for manual configuration drift.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Serverless doesn't mean simple&lt;/strong&gt; - it means you focus on business logic instead of infrastructure.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Parquet is worth the conversion&lt;/strong&gt; - the great compression pays for itself in storage and query costs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hive partitioning enables scale&lt;/strong&gt; - organize data for how it will be queried, not how it arrives.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Event-driven beats polling&lt;/strong&gt; - let AWS route events instead of writing schedulers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pay-as-you-go works&lt;/strong&gt; - for variable workloads, managed services beat reserved capacity.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Offer analytics options&lt;/strong&gt; - different users have different needs; support both custom dashboards and BI tools.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
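&lt;p&gt;To make takeaway 3 concrete, here is a minimal sketch of building a Hive-partitioned S3 key so query engines like Athena can prune partitions by date (the prefix and filename are hypothetical examples):&lt;/p&gt;

```python
from datetime import datetime, timezone

def hive_partitioned_key(prefix, event_time, filename):
    """Build an S3 key using Hive-style partitions (year=/month=/day=)
    so date-filtered queries only scan the matching partitions."""
    return (
        f"{prefix}/year={event_time.year}/month={event_time.month:02d}/"
        f"day={event_time.day:02d}/{filename}"
    )

ts = datetime(2026, 4, 3, tzinfo=timezone.utc)
print(hive_partitioned_key("sales", ts, "metrics.parquet"))
# sales/year=2026/month=04/day=03/metrics.parquet
```

The key is organized by query time, not arrival time: a dashboard asking for one day touches one partition instead of the whole bucket.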




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;The complete source code for my solution is available on GitHub, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Terraform infrastructure definitions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;17 Lambda functions (Python 3.13)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;React frontend application&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sample data generator&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Quick Suite setup scripts&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Deploy your own instance and start processing data in under 30 minutes.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with AWS Lambda, Step Functions, S3, DynamoDB, EventBridge, Bedrock, API Gateway, SNS, and optionally Quick Suite.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  CLEANUP (IMPORTANT!!)
&lt;/h2&gt;

&lt;p&gt;If you do end up deploying this yourself, please understand that some of the included resources will cost you a small amount of real money. Please don’t forget about them.&lt;/p&gt;

&lt;p&gt;Please &lt;strong&gt;MAKE SURE TO DELETE&lt;/strong&gt; the stack if you are no longer using it. Running &lt;code&gt;terraform destroy&lt;/code&gt; takes care of this, or you can delete the resources in the AWS console.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Try the setup in your AWS account&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;You can clone the &lt;a href="https://github.com/RDarrylR/serverless-smurf-etl-and-analytics" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt; and try this out in your own AWS account. The README.md file describes any changes you need to make for it to work in your AWS account.&lt;/p&gt;

&lt;p&gt;Please let me know if you have any suggestions or problems trying out this example project.&lt;/p&gt;

&lt;p&gt;For more articles from me please visit my blog at &lt;a href="https://darryl-ruggles.cloud/" rel="noopener noreferrer"&gt;&lt;strong&gt;Darryl's World of Cloud&lt;/strong&gt;&lt;/a&gt; or find me on &lt;a href="https://bsky.app/profile/darryl-ruggles.cloud" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt;, &lt;a href="https://x.com/RDarrylR" rel="noopener noreferrer"&gt;&lt;strong&gt;X&lt;/strong&gt;&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/darryl-ruggles/" rel="noopener noreferrer"&gt;&lt;strong&gt;LinkedIn&lt;/strong&gt;&lt;/a&gt;, &lt;a href="https://medium.com/@RDarrylR" rel="noopener noreferrer"&gt;&lt;strong&gt;Medium&lt;/strong&gt;&lt;/a&gt;, &lt;a href="https://dev.to/rdarrylr"&gt;&lt;strong&gt;Dev.to&lt;/strong&gt;&lt;/a&gt;, or the &lt;a href="https://builder.aws.com/community/@darrylr" rel="noopener noreferrer"&gt;&lt;strong&gt;AWS Community&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For tons of great serverless content and discussions please join the Believe In Serverless community we have put together at this link: &lt;a href="https://www.believeinserverless.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Believe In Serverless Community&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>serverless</category>
      <category>terraform</category>
      <category>lambda</category>
      <category>stepfunctions</category>
    </item>
    <item>
      <title>Aurora DSQL - Build A Serverless Multi-Region E-Commerce Platform</title>
      <dc:creator>Darryl Ruggles</dc:creator>
      <pubDate>Tue, 18 Nov 2025 05:11:45 +0000</pubDate>
      <link>https://forem.com/aws-builders/aurora-dsql-build-a-serverless-multi-region-e-commerce-platform-i62</link>
      <guid>https://forem.com/aws-builders/aurora-dsql-build-a-serverless-multi-region-e-commerce-platform-i62</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;I’ve always been a big fan of managed and truly serverless services offered by public cloud providers like AWS. I want to be able to prototype and build applications with as little infrastructure handling and management as possible. My time should be spent focusing on the business logic of the problem at hand.&lt;/p&gt;

&lt;p&gt;I really like using services on AWS like Lambda, API Gateway, the Simple Queue Service (SQS), Simple Notification Service (SNS), and many others. For a database platform I have almost always used DynamoDB. DynamoDB tables can be provisioned and ready to use in seconds. The service is highly performant at any scale and you just pay for what you use. I don’t have to pay $100’s per month for something I may only use once a week.&lt;/p&gt;

&lt;p&gt;Recently there has been a resurgence of interest in SQL-based databases. Most of us learned about databases through SQL, but I’ve typically avoided relational databases whenever possible due to all the setup and management required and how long it takes before you can start using them after creation. I know the interface to and API for DynamoDB can be rather cryptic and difficult to get used to, but it’s a very serverless offering - so right up my alley.&lt;/p&gt;

&lt;p&gt;When AWS announced Aurora DSQL at re:Invent 2024, it really struck a chord with me and seemed to give me another big option to take advantage of. I was honestly quite disappointed when I started reading about how much of the functionality I was used to with SQL databases is not supported in DSQL. As time has passed I have come to understand why the DSQL team made many of the choices they did to get the performance and consistency they wanted. I have read a lot of articles and watched videos from Marc Brooker (&lt;a href="https://brooker.co.za/blog/" rel="noopener noreferrer"&gt;Marc’s Blog&lt;/a&gt;) and others (&lt;a href="https://aws-news.com/?types=blog&amp;amp;searchTerm=dsql" rel="noopener noreferrer"&gt;AWS DSQL Blogs&lt;/a&gt;) and appreciate the work that went into DSQL and its innovative design.&lt;/p&gt;

&lt;p&gt;Aurora DSQL is a multi-region distributed SQL database that provisions in under 60 seconds and bills only for actual usage. No instances to size, no standby replicas to pay for when you're not using them, and multi-region replication is built in when you need it. Currently, multi-region support only allows pairs of AWS regions in the same general part of the world, but the team is working on supporting pairs of regions much farther apart (say, one in the US and one in Europe) as well as possible support for features like Change Data Capture (CDC), where changes are streamed via an interface. Aurora DSQL ensures that all reads and writes to any Regional endpoint are strongly consistent and durable. Doing that while staying very fast and scalable is extremely tough, and they have accomplished it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Kabob Store
&lt;/h2&gt;

&lt;p&gt;I wanted to build a demo project (&lt;a href="https://github.com/RDarrylR/dsql-kabob-store" rel="noopener noreferrer"&gt;GitHub repo here&lt;/a&gt;) that I could expand on in later blogs and code repos. I have chosen to build “The Kabob Store” to start working with Aurora DSQL along with other AWS services. Who doesn’t like kabobs and tasty baklava anyway? This e-commerce platform is the start of my future kabob empire, but for now it’s a practical test: a fully functional e-commerce platform with menu browsing, cart management, order placement, and order history. It uses Aurora DSQL for data storage and the Elastic Container Service (ECS) with Fargate for compute, and demonstrates whether DSQL can replace DynamoDB as the default choice for serverless applications that need relational data.&lt;/p&gt;

&lt;p&gt;In the past I typically focused on using serverless compute via AWS Lambda for most projects. I think most people have come to the realization that there are many ways to solve problems, and sticking to the same one for everything is not the best approach. I have spent a lot of time working with containers over the years - be it in local Kubernetes installs set up via kubeadm, cloud provider Kubernetes clusters like the Elastic Kubernetes Service (EKS) on AWS, or the Elastic Container Service (ECS) on AWS. I have seen that these work really well for many use cases.&lt;/p&gt;

&lt;p&gt;I see the job of a solution architect as taking the requirements given for any problem and the boundaries set to go and sort through the vast set of available tools and platforms and build a solution that best meets the goals and budget. This doesn’t always mean using your favourite approaches and tools. For me this was almost always to use AWS serverless tools and event-driven architectures in the past. In the last few years I have been spending a lot more time mixing in things like container-based solutions, simple VM setups, and almost any approach that gets the job done.&lt;/p&gt;

&lt;p&gt;I think one of the keys to allowing this flexibility is structuring business logic code so that it doesn’t know (or care) much about where it’s running and doesn’t directly interact with most of the surrounding infrastructure. If you can set up your projects this way, it should be quite easy to move from running in AWS Lambda to running in Fargate on ECS to running directly on some VM.&lt;/p&gt;

&lt;p&gt;The Kabob Store I present here is a full stack solution that includes a ReactJS front end. It’s not using all the latest front end tech - more a plain React Single Page App (SPA). I am more of a backend developer but did teach myself ReactJS a number of years ago and have built a few front end apps when needed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffrk8dj91x3hyei5ideds.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffrk8dj91x3hyei5ideds.png" alt="Kebob Store Order Page" width="800" height="581"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Kabob Store Architecture
&lt;/h2&gt;

&lt;p&gt;The Kabob Store uses containers on ECS Fargate rather than Lambda functions. This deserves explanation since I typically default to Lambda for serverless compute.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   React     │────▶│     ALB     │────▶│   FastAPI   │
│   Frontend  │     │   (Route)   │     │   Backend   │
└─────────────┘     └─────────────┘     └─────────────┘
       │                                        │
       └──────── ECS Fargate ──────────────────┤
                                                │
                                         ┌──────▼──────┐
                                         │Aurora DSQL  │
                                         │(Multi-Region)│
                                         └─────────────┘
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;h2&gt;
  
  
  Why Containers Instead of Lambda?
&lt;/h2&gt;

&lt;p&gt;For this project, containers provide flexibility. The FastAPI application runs in a container that could deploy to multiple runtimes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fufprszs6thf1jirp4m1h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fufprszs6thf1jirp4m1h.png" alt="Runtime choice" width="800" height="410"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The business logic doesn't care about the runtime. With minimal adapter code, the same application can deploy across all these platforms. This matters because project requirements change:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Development/staging:&lt;/strong&gt; Fargate's simplicity wins (no servers to manage)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Production at scale:&lt;/strong&gt; ECS on EC2 becomes more cost-effective (Fargate pricing is roughly 20-30% higher than equivalent EC2)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Lambda:&lt;/strong&gt; works well for this workload, but has a 15-minute timeout limit and specific deployment constraints&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For the Kabob Store, I chose Fargate for operational simplicity during development. If traffic scales significantly, migrating to ECS on EC2 workers requires no code changes, just Terraform adjustments to swap the Fargate launch type for the EC2 launch type and add an Auto Scaling Group.&lt;/p&gt;
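&lt;p&gt;A rough back-of-the-envelope comparison for one always-on 2 vCPU / 8 GB task makes the trade-off tangible. All prices below are illustrative assumptions (roughly us-east-1 on-demand rates at the time of writing), so check current AWS pricing before relying on them:&lt;/p&gt;

```python
# Assumed prices -- illustrative only, verify against current AWS pricing.
FARGATE_VCPU_HR = 0.04048   # per vCPU-hour, on-demand
FARGATE_GB_HR = 0.004445    # per GB-hour, on-demand
EC2_M5_LARGE_HR = 0.096     # m5.large (2 vCPU / 8 GB), on-demand

HOURS_PER_MONTH = 730

# One always-on task sized 2 vCPU / 8 GB, roughly an m5.large equivalent.
fargate_monthly = (2 * FARGATE_VCPU_HR + 8 * FARGATE_GB_HR) * HOURS_PER_MONTH
ec2_monthly = EC2_M5_LARGE_HR * HOURS_PER_MONTH

print(f"Fargate: ${fargate_monthly:.2f}/mo, EC2: ${ec2_monthly:.2f}/mo")
print(f"Fargate premium: {fargate_monthly / ec2_monthly - 1:.0%}")
```

Under these assumed rates the premium lands in the 20-30% range mentioned above, which is why EC2-backed ECS starts to win once utilization is high and steady.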

&lt;p&gt;The principle: write business logic that's portable across runtimes. Choose the runtime based on current requirements, not because the code is locked into it.&lt;/p&gt;
&lt;h2&gt;
  
  
  Backend: FastAPI Without the ORM
&lt;/h2&gt;

&lt;p&gt;The backend uses FastAPI with direct psycopg2 queries instead of an ORM. This keeps the business logic focused and portable. In the future I will move to using an ORM but for now I just wanted to keep it simple.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Direct psycopg2 with parameterized queries
&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    INSERT INTO orders (id, customer_name, customer_email, items, total_amount)
    VALUES (%s::UUID, %s, %s, %s::JSONB, %s)
    RETURNING *
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;items_json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchone&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code is runtime-agnostic. It works in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A Fargate container (current deployment)&lt;/li&gt;
&lt;li&gt;A Lambda function using container images&lt;/li&gt;
&lt;li&gt;An EC2-based ECS service&lt;/li&gt;
&lt;li&gt;A Kubernetes pod in EKS&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The application doesn't use Fargate-specific features or Lambda-specific event handlers. The container listens on a port and handles HTTP requests. Where it runs is an infrastructure decision, not a code decision.&lt;/p&gt;
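&lt;p&gt;A stdlib-only sketch of that principle: a WSGI handler is just a callable, so the same function runs unchanged under any server that speaks the protocol. The real backend uses FastAPI (ASGI), but the idea is identical:&lt;/p&gt;

```python
def app(environ, start_response):
    """Runtime-agnostic handler: it knows nothing about Fargate, Lambda,
    or EC2 -- it only fulfills the WSGI callable contract."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"kabob store: ok"]

# Exercise the callable directly, exactly as any WSGI server would.
# To actually serve it: wsgiref.simple_server.make_server("", 8080, app)
statuses = []
body = app({}, lambda status, headers: statuses.append(status))
print(statuses[0], body[0].decode())  # 200 OK kabob store: ok
```

Because nothing in the handler references the host environment, "where it runs" stays a deployment-time decision rather than a code change.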

&lt;h2&gt;
  
  
  The Security Layer Cake
&lt;/h2&gt;

&lt;p&gt;With great power comes great responsibility. An e-commerce platform needs good security, so I implemented some basic best practices but it’s just a start with much more to be done. As I evolve this project I will add more. For now we’re using Pydantic input validation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: Pydantic Validation with Custom Validators
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class OrderCreate(BaseModel):
    customer_name: str = Field(..., min_length=2, max_length=100)
    customer_email: EmailStr  # Pydantic's built-in email validation
    items: List[OrderItemCreate]

    @validator('customer_email')
    def validate_email_not_disposable(cls, v):
        disposable_domains = ['tempmail.com', 'throwaway.email', '10minutemail.com']
        domain = v.split('@')[1].lower()
        if domain in disposable_domains:
            raise ValueError('Disposable email addresses are not allowed')
        return v

    @validator('customer_name')
    def validate_name(cls, v):
        if not re.match(r"^[a-zA-Z\s\-']+$", v):
            raise ValueError('Name contains invalid characters')
        return v
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Layer 2: Client-Side Validation
&lt;/h3&gt;

&lt;p&gt;The frontend validates inputs before submission, providing immediate user feedback:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Name validation - letters, spaces, hyphens, apostrophes only
if (!/^[a-zA-Z\s\-']+$/.test(customerData.name)) {
  errors.name = 'Name can only contain letters, spaces, hyphens, and apostrophes';
}

// Email validation with TLD requirement
const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
if (!emailRegex.test(customerData.email)) {
  errors.email = 'Please enter a valid email address';
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Layer 3: Request Middleware
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@app.middleware("http")
async def validate_request(request: Request, call_next):
    suspicious_patterns = [
        '../',           # Path traversal
        '&amp;lt;script',       # XSS attempts
        'DROP TABLE',    # SQL injection
        '\x00',          # Null byte injection
    ]

    path = str(request.url)
    for pattern in suspicious_patterns:
        if pattern.lower() in path.lower():
            return JSONResponse(status_code=400,
                               content={"detail": "Invalid request"})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Layer 4: Parameterized Queries
&lt;/h3&gt;

&lt;p&gt;All SQL queries use parameterization to prevent SQL injection:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Never do this (even with validation)
query = f"INSERT INTO orders VALUES ('{order_id}', '{name}'...)"

# Always do this
cursor.execute(
    "INSERT INTO orders VALUES (%s::UUID, %s, %s, %s, %s)",
    (order_id, name, email, items_json, total)
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Infrastructure as Code (With Terraform)
&lt;/h2&gt;

&lt;p&gt;I am a very big proponent of using Infrastructure as Code (IaC), and my go-to tool for this is Terraform. Defining all your resources in Terraform stacks makes it super easy to set up (and tear down) everything wherever you need. AWS services have very good Terraform support, and even things like multi-region DSQL clusters can be set up via Terraform. Here I’m using the official terraform-aws-modules/rds-aurora DSQL module. The entire infrastructure is defined in Terraform, making it reproducible and versionable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;module "dsql_primary" {
  source  = "terraform-aws-modules/rds-aurora/aws//modules/dsql"
  version = "~&amp;gt; 9.0"

  deletion_protection_enabled = false
  witness_region              = "us-west-2"
  create_cluster_peering      = true
  clusters                    = [module.dsql_secondary.arn]

  tags = {
    Name        = "${var.project_name}-dsql-primary"
    Environment = var.environment
  }
}

module "dsql_secondary" {
  source  = "terraform-aws-modules/rds-aurora/aws//modules/dsql"
  version = "~&amp;gt; 9.0"

  providers = {
    aws = aws.secondary  # us-east-2
  }

  deletion_protection_enabled = false
  witness_region              = "us-west-2"
  create_cluster_peering      = true
  clusters                    = [module.dsql_primary.arn]

  tags = {
    Name        = "${var.project_name}-dsql-secondary"
    Environment = var.environment
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The dsql module handles cluster peering automatically, creating a multi-region DSQL setup with strong consistency across regions. A single terraform apply creates the multi-region DSQL clusters (primary in us-east-1, secondary in us-east-2, witness in us-west-2).&lt;/p&gt;

&lt;p&gt;My current store implementation involves setting up a VPC, subnets, and all the other infrastructure needed to run the Elastic Container Service. The app stack is not running in multiple AWS regions, though, so it doesn’t really take advantage of the DSQL database being multi-region. In future versions I will implement true multi-region support, with duplicate application stacks in each region and Route53 failover routing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqqo2mqfiupppxguss5q8.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqqo2mqfiupppxguss5q8.jpg" alt="Terraform Apply Output" width="800" height="211"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Multi-Region DSQL Configuration (Not Really Utilized Yet)
&lt;/h2&gt;

&lt;p&gt;The infrastructure creates DSQL clusters in multiple US regions (us-east-1 primary, us-east-2 secondary) with us-west-2 configured as the witness region using the official Terraform module. This provides data replication and disaster recovery capabilities within the US. Note that the witness region is just a configuration setting for maintaining quorum - there's no actual DSQL cluster in us-west-2, only in us-east-1 and us-east-2. However, the current application always connects to the primary cluster in us-east-1, regardless of where the user is located.&lt;/p&gt;

&lt;p&gt;Aurora DSQL's multi-region setup is conceptually similar to DynamoDB Global Tables - both replicate data across multiple AWS regions with strong consistency and automatic failover. The key difference: DSQL gives you SQL with PostgreSQL compatibility, while Global Tables use DynamoDB's NoSQL model.&lt;/p&gt;

&lt;p&gt;Important limitation: DSQL multi-region clusters are currently restricted to geographic groupings. You can link clusters within the US (us-east-1, us-east-2, us-west-2), within Europe (eu-west-1, eu-west-2, eu-west-3), or within Asia Pacific (ap-northeast-1, ap-northeast-2, ap-northeast-3), but not across continents. For true global data synchronization across continents, DynamoDB Global Tables remains the better choice.&lt;/p&gt;
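&lt;p&gt;Since a cross-continent pairing may not surface until apply time, a tiny pre-flight check can encode the groupings above. This helper is my own sketch, not an AWS API, and the region lists reflect the groupings mentioned here (they may change as AWS expands DSQL):&lt;/p&gt;

```python
# Hypothetical pre-flight check (my own helper, not an AWS API): verifies
# that a proposed set of DSQL cluster + witness regions stays within one
# of the geographic groupings supported at the time of writing.
DSQL_REGION_GROUPS = {
    "US": {"us-east-1", "us-east-2", "us-west-2"},
    "Europe": {"eu-west-1", "eu-west-2", "eu-west-3"},
    "Asia Pacific": {"ap-northeast-1", "ap-northeast-2", "ap-northeast-3"},
}

def dsql_group_for(regions):
    """Return the group name if all regions share one grouping, else None."""
    wanted = set(regions)
    for group, members in DSQL_REGION_GROUPS.items():
        if wanted.issubset(members):
            return group
    return None  # cross-continent mix: not currently linkable

print(dsql_group_for(["us-east-1", "us-east-2", "us-west-2"]))  # US
print(dsql_group_for(["us-east-1", "eu-west-1"]))               # None
```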

&lt;p&gt;Aurora DSQL's multi-region feature shines when you have a multi-region application within the same geographic area that can route users to their nearest cluster. In that kind of setup (with clusters placed in, say, us-east-1 and us-west-2), East Coast US users could connect to us-east-1 while West Coast users connect to us-west-2, both accessing the same strongly consistent data with lower latency. The witness region maintains quorum for strong consistency.&lt;/p&gt;
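&lt;p&gt;The routing idea can be sketched in a few lines. This is my own illustration, not AWS tooling: the cluster endpoint hostnames are made up, and a real deployment would substitute the endpoints the Terraform module outputs for each cluster.&lt;/p&gt;

```python
import os

# Hypothetical cluster endpoints, one per linked DSQL cluster. Real values
# would come from the Terraform module outputs; these hostnames are made up.
CLUSTER_ENDPOINTS = {
    "us-east-1": "abc123example.dsql.us-east-1.on.aws",
    "us-west-2": "def456example.dsql.us-west-2.on.aws",
}

def nearest_dsql_endpoint(default_region="us-east-1"):
    """Pick the cluster in the app's own region, falling back to the primary."""
    region = os.environ.get("AWS_REGION", default_region)
    return CLUSTER_ENDPOINTS.get(region, CLUSTER_ENDPOINTS[default_region])
```

Each regional application stack reads its own region from the standard AWS_REGION environment variable, so the same container image works unchanged in every region.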

&lt;p&gt;For this initial demo application with a single-region deployment (all ECS tasks in us-east-1), the multi-region clusters provide excellent data protection and fast disaster recovery within the US, but we're not leveraging the performance benefits of local reads. A future version could deploy the application stack in multiple US regions with Route53 routing users to their nearest endpoint, fully utilizing DSQL's regional multi-region capabilities.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8e49wgf83hedv2fbdpod.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8e49wgf83hedv2fbdpod.jpg" alt="DSQL Regional cluster" width="800" height="330"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next?
&lt;/h2&gt;

&lt;p&gt;The Kabob Store is just the beginning. Here's what's on the roadmap:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Authentication: Adding AWS Cognito for user accounts and login&lt;/li&gt;
&lt;li&gt;Observability: Full OpenTelemetry observability across the platform&lt;/li&gt;
&lt;li&gt;Store Dashboard: Real-time order management interface for store staff, with Server-Sent Events or WebSockets for instant order notifications&lt;/li&gt;
&lt;li&gt;Payments: Integrating Stripe for actual transactions&lt;/li&gt;
&lt;li&gt;AI Ordering Agent: Conversational ordering interface using Amazon Bedrock AgentCore and Strands framework&lt;/li&gt;
&lt;li&gt;Analytics: Building a QuickSight dashboard for business metrics&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;If you’re going to set up the Kabob Store demo code for yourself, you will need the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS Account with admin permissions&lt;/li&gt;
&lt;li&gt;Terraform &amp;gt;= 1.5.0&lt;/li&gt;
&lt;li&gt;Docker for container builds&lt;/li&gt;
&lt;li&gt;AWS CLI configured&lt;/li&gt;
&lt;li&gt;~$2-3/day budget for testing&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;The entire project is open source. You can deploy your own Kabob Store:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Clone the repo
git clone https://github.com/RDarrylR/kabob-store

# Deploy infrastructure
cd infrastructure
terraform init
terraform apply

# Build and push container images to ECR
# Then update ECS services to deploy
# See README.md for detailed deployment steps

# Visit your ALB URL and start ordering kabobs!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Aurora DSQL offers most of what I've been looking for: SQL with DynamoDB-like operational characteristics. It provisions in seconds, scales automatically, and bills only for usage. For the first time, I can choose SQL for a new project without accepting always-on infrastructure costs or extended provisioning times.&lt;/p&gt;

&lt;p&gt;The container-based approach provides similar flexibility. The same application code can run on Lambda, Fargate, ECS on EC2, or EKS. I can choose whichever runtime fits the current requirements and cost profile. During development, Fargate eliminates server management. At scale, ECS on EC2 reduces costs. If requirements change, the code doesn't need to.&lt;/p&gt;

&lt;p&gt;The Kabob Store demonstrates a straightforward architecture: runtime-portable business logic, parameterized SQL queries, explicit transaction boundaries, multi-layer validation, and scoped IAM permissions. The entire stack deploys with terraform apply and produces a multi-region e-commerce platform with data redundancy across US regions. When requirements change (more traffic, different cost targets, specific compliance needs), the code can move to different infrastructure without rewriting the business logic.&lt;/p&gt;

&lt;p&gt;For my projects, the decision tree has expanded. As a solution architect I always want to have as many tools to choose from. DynamoDB remains the right choice when its data model fits naturally. Lambda remains the default for event-driven workloads. But when I need SQL with serverless economics, or containers that can move between runtimes, these are now viable options. The Kabob Store proves they work in practice.&lt;/p&gt;

&lt;h2&gt;
  
  
  CLEANUP (IMPORTANT!!)
&lt;/h2&gt;

&lt;p&gt;If you do end up deploying the Kabob Store yourself, please understand that some of the included resources will cost you real money. For a short period it won’t be much, but running the VPC and NAT Gateway will incur daily charges. Please don’t forget about it.&lt;/p&gt;

&lt;p&gt;Please MAKE SURE TO DELETE the stack if you are no longer using it. Running terraform destroy takes care of this, or you can delete the resources in the AWS Console.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try the setup in your AWS account
&lt;/h2&gt;

&lt;p&gt;You can clone the GitHub repo and try this out in your own AWS account. The README.md file describes any changes you need to make for it to work in your account.&lt;/p&gt;

&lt;p&gt;Please let me know if you have any suggestions or problems trying out this example project.&lt;/p&gt;

&lt;p&gt;For more articles from me please visit my blog at &lt;a href="https://darryl-ruggles.cloud/" rel="noopener noreferrer"&gt;Darryl's World of Cloud&lt;/a&gt; or find me on &lt;a href="https://bsky.app/profile/darryl-ruggles.cloud" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt;, &lt;a href="https://x.com/RDarrylR" rel="noopener noreferrer"&gt;X&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/darryl-ruggles/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://medium.com/@RDarrylR" rel="noopener noreferrer"&gt;Medium&lt;/a&gt;, &lt;a href="https://dev.to/rdarrylr"&gt;Dev.to&lt;/a&gt;, or the &lt;a href="https://builder.aws.com/community/@darrylr" rel="noopener noreferrer"&gt;AWS Community&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For tons of great serverless content and discussions please join the Believe In Serverless community we have put together at this link: &lt;a href="https://www.believeinserverless.com/" rel="noopener noreferrer"&gt;Believe In Serverless Community&lt;/a&gt;&lt;/p&gt;

</description>
      <category>serverless</category>
      <category>aurora</category>
      <category>terraform</category>
      <category>aws</category>
    </item>
    <item>
      <title>My AWS Summit Toronto experience and GenAI</title>
      <dc:creator>Darryl Ruggles</dc:creator>
      <pubDate>Sat, 06 Sep 2025 13:31:32 +0000</pubDate>
      <link>https://forem.com/aws-builders/my-aws-summit-toronto-experience-and-genai-27o2</link>
      <guid>https://forem.com/aws-builders/my-aws-summit-toronto-experience-and-genai-27o2</guid>
      <description>&lt;p&gt;I have been experimenting and working with AWS for many years now. Way back when, I started out by purchasing a training course put out by Yan Cui (featuring his cat) and have learned so much since then. I was fascinated with the idea of the scale and flexibility possible using cloud services and the things you could do with serverless services like AWS Lambda and S3.&lt;/p&gt;

&lt;p&gt;As my journey has progressed over the years I have interacted with so many amazing people and learned so much from them. I have joined a number of communities and regularly participate in online sessions, webinars, and other events to learn whatever I can. I have spent the most time in the last couple of years hanging out at the &lt;a href="https://www.believeinserverless.com/" rel="noopener noreferrer"&gt;Believe In Serverless Community&lt;/a&gt;. It is a great place to discuss and share ideas on serverless approaches in the cloud and there are tons of great people hanging out there.&lt;/p&gt;

&lt;p&gt;One thing I haven’t been able to do though is meet most of the people in the AWS (and related) communities in person. This past week I was able to attend the AWS Summit in Toronto, Canada and had a great experience meeting so many people and learning new things. The experience has really energized me to do more in the future. I have attended similar conferences in the past like Java One but am very glad to have had this experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  The AWS Summit Toronto
&lt;/h2&gt;

&lt;p&gt;Toronto is about a 4-hour drive from Ottawa where I live, but it’s the closest AWS Summit location to me. I will be attending AWS Re:Invent for the first time this year, but when I saw the opportunity to take part in something earlier I decided it was a no-brainer. I’m glad I did, as I met many local people from the AWS community, many of whom may not be able to be in Las Vegas in December.&lt;/p&gt;

&lt;p&gt;I have been super busy with a project at work the last couple of weeks and it was a great time to be able to head out and spend some time not thinking about my work projects. I was able to stay in Toronto the day before and after the summit and took these days to relax, do some workouts, and start to feel normal again.&lt;/p&gt;

&lt;p&gt;I ended up picking up my badge for the summit the day before to save time on summit day as I had a full plan of talks to attend. I have never been in the Metro Toronto Convention Centre before and kind of got lost finding my way to the badge pickup location. I did eventually find it and it all felt a lot more real seeing the huge setup that was there for the AWS Partners Summit. I didn’t attend that this year but maybe next time as our company is part of the program.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F93gpz6g2o3t2csp3t4ds.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F93gpz6g2o3t2csp3t4ds.jpeg" alt="AWS Summit Toronto Badge" width="800" height="1417"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I didn’t sleep super well the night before the summit as I didn’t want to be late and miss anything. I had a much better journey on summit day and used the pathways above the rail tracks in the downtown area to go directly to the Convention Centre. I was inside and waiting about 45 minutes before the first session I wanted to attend.&lt;/p&gt;

&lt;p&gt;As I was researching the session catalog in the days before the summit it was clear there would be a real focus on GenAI topics. This is really no surprise as we’ve seen this from AWS (and the tech world in general) for a long time now. Many people don’t like this but it’s the reality of where we are at today and the sheer amount of interest (and promise) in this space. I was able to pick out 7 sessions I wanted to attend that didn’t overlap. Three of these were GenAI related, one was the keynote, one was related to my favourite subject of serverless computing on AWS and there were two others. I think I had a good mix of what interests me today.&lt;/p&gt;

&lt;h2&gt;
  
  
  It’s all about community and personal interactions
&lt;/h2&gt;

&lt;p&gt;The real highlights for me from the AWS Summit Toronto were meeting and seeing presentations from AWS people who I have learned so much from over the years and meeting so many AWS community members.&lt;/p&gt;

&lt;p&gt;Tracy Wang from AWS organized a mixer at the end of the summit day at a nearby hotel where I got to finally meet so many people from the local community. Many of these people live rather close to me but I had not met them in person. It’s just not the same meeting someone on Zoom or other online video platforms as seeing them in person. Meeting AWS Heroes like Cyril Bandolo and Andrew Brown was a real honour. I have spoken with them online before but in person is so much better. I also met so many other really interesting people who seem very passionate about AWS. I will probably miss mentioning some of them but meeting people like Cyril, Andrew, Rola Dali, Bansi Delwadia, Hubert Chan, Farah Abdirahman, Tracy Wang, Aaron Brighton, Varun Kumar Manik and more was a real honour and inspires me.&lt;/p&gt;

&lt;p&gt;For my work with Ciena, we have many colleagues in locations around the world, including many cities in the US as well as our offices in Gurugram (Delhi), India. I have had the pleasure of travelling to India a number of times to see our offices there and meet almost all the people I have worked with. These experiences have greatly enhanced my interactions with these colleagues during all the other times, when we’re on opposite sides of the world working together on projects. Once you have seen people in person, with their mannerisms and how they respond, everything afterwards is so much easier.&lt;/p&gt;

&lt;p&gt;Even with the people I work with at our offices in Ottawa, Canada, so much has changed in the last number of years. I have been at this a long time, and in the earlier part of my career I learned so much from the more experienced people in my company just by hanging out at the office and in the coffee rooms. I would not be the architect, developer, and overall person I am now without those experiences. It has to be difficult for people entering our field today. Real human interactions, like meeting people in person, are really needed to build good relationships IMO. People from the community coming together at events like the Toronto Summit is very important, and we really need to try to do more of this.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmb03tp1tpt39txbxvgcc.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmb03tp1tpt39txbxvgcc.jpeg" alt="The Community Mixer" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Highlights from the sessions I attended
&lt;/h2&gt;

&lt;p&gt;AWS is really making an all-out effort to gain traction in the industry in the Generative AI space. Just a few years ago I was working on a more traditional AI/ML team in an MLOps role using AWS and Kubernetes. Back then Natural Language Processing (NLP) was just one subfield we were working on, and it didn’t seem to hold much more promise than the others. Of course that all changed thanks to the development of transformers and the ability to train huge language models. The rest is history and the world has been transformed by GenAI.&lt;/p&gt;

&lt;h3&gt;
  
  
  DAT302 (Building cost-effective RAG-based apps with Amazon Aurora)
&lt;/h3&gt;

&lt;p&gt;Many companies are starting out with GenAI by using Retrieval Augmented Generation (RAG) approaches that ground generic models in their own specific data. Using RAG with generic Large Language Models makes it possible for companies to get real wins with GenAI for minimal cost and effort. Even so, RAG along with embedding models, vector databases, and more can really add up in costs if not optimized. The session DAT302 (Building cost-effective RAG-based apps with Amazon Aurora) presented by Shayon Sanyal and Aditya Samant included a number of good tips on optimizing spend with these approaches.&lt;/p&gt;

&lt;p&gt;Some ideas included using multiple levels of models to reduce costs, like first passing raw data to a smaller, cheaper model to summarize and then passing only this smaller result to the more expensive and powerful models. There were also tips on specific Aurora database settings that work best with RAG.&lt;/p&gt;
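&lt;p&gt;The tiering idea can be sketched as a simple routing rule. This is my own illustration of the concept, not something presented at the session: the model names and the token threshold are made-up placeholders.&lt;/p&gt;

```python
# Two-tier model routing sketch. The model IDs and the token threshold
# are made-up placeholders for illustration only.
CHEAP_MODEL = "small-summarizer"   # hypothetical inexpensive model
STRONG_MODEL = "large-reasoner"    # hypothetical expensive model
SUMMARIZE_OVER_TOKENS = 2000

def plan_calls(prompt_tokens):
    """Return the sequence of models to invoke for a prompt of this size."""
    if prompt_tokens > SUMMARIZE_OVER_TOKENS:
        # Long context: compress with the cheap model first, then send
        # only the much smaller summary to the expensive model.
        return [CHEAP_MODEL, STRONG_MODEL]
    return [STRONG_MODEL]
```

Since the strong model typically costs far more per input token, shrinking the context before it reaches that model is where the savings come from.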

&lt;h3&gt;
  
  
  SVS201-R (Building Production-ready Agentic AI Architectures with AWS Serverless)
&lt;/h3&gt;

&lt;p&gt;This is honestly one of the two sessions I was most looking forward to. Anton Aleksandrov and Heeki Park are excellent presenters and I have watched and read so much of their content over the years. We had a great discussion about the approaches you can use for building Agents using tools like the Amazon Bedrock, the Strands SDK, AWS Lambda, the Elastic Container Service (ECS), Step Functions, and of course the new Bedrock AgentCore suite of tools.&lt;/p&gt;

&lt;p&gt;Anton and Heeki discussed how agents communicate with each other, require memory to work together in an Agentic fashion and how Agents are not really all that different than micro-services. The audience got lots of high level ideas on how best to approach agents and all the tooling AWS offers to help. Seeing these guys in person and chatting with Anton were highlights for me.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fka4xh81uvs506hkbjfo6.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fka4xh81uvs506hkbjfo6.jpeg" alt="Serverless Agents with Anton and Heeki" width="800" height="369"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Keynote with Andrew Kent Warfield
&lt;/h2&gt;

&lt;p&gt;I know some people find keynotes at conferences like this boring, but I was glad to be able to be in the room for this one. It was great to hear how someone who grew up in Canada has become a key part of AWS. As with most of the rest of the conference there was a heavy focus in the keynote on the various GenAI products AWS has been working on, but there were also some nice stories about the role AWS has played with many Canadian companies in driving innovation here.&lt;/p&gt;

&lt;h3&gt;
  
  
  SEC302 (Building a Secure EKS Framework For Financial Services)
&lt;/h3&gt;

&lt;p&gt;I have been setting up and administrating Kubernetes clusters and using the Elastic Kubernetes Service (EKS) on AWS for a number of years now and we run multiple clusters in production. I wanted to go to this session to understand what the recommended approaches are to make sure everything is secure. Raj Bagwe and Preetam Rebello did a great job walking through many best practices to ensure security in your installs.&lt;/p&gt;

&lt;p&gt;Items like using private connectivity, enabling control plane logs, using encryption everywhere possible, setting up clusters with the least privileges possible, using EKS Pod Identities, Network Policies, OPA, and other tools, and looking at service meshes like Istio were among the many great tips presented. In my case it was great to see that we are already in really good shape, as we pretty much use all the approaches discussed. Once the slides are available I will go over everything in detail to make sure, and I provided lots of comments and feedback during this session.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyp4r6xmm4zvxdtyswmfc.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyp4r6xmm4zvxdtyswmfc.jpeg" alt="EKS Security Tips" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  ANT401 (Architectural Patterns for Near Real-time analytics on AWS)
&lt;/h3&gt;

&lt;p&gt;In the GenAI and telecom equipment space where my work is focused, we have a number of use cases for ingesting large amounts of network data and then using it to build applications. I have worked on multiple projects with this kind of approach and have used Kinesis streams as well as custom solutions.&lt;/p&gt;

&lt;p&gt;Mazrim Mehrtens and Vinodh Sadayamuthu discussed some best practices they recommend to AWS customers, including the use of Kinesis or Amazon Managed Streaming for Apache Kafka (MSK) for ingestion, and then tools like the Managed Service for Apache Flink for highly performant data processing. We have looked at using Flink in the past, but based on this presentation I plan to take another look. There were some great tips and I am really looking forward to reviewing the slides and the example GitHub repos that were discussed.&lt;/p&gt;

&lt;p&gt;In the GenAI space, one real use case I see here is building models that can predict or detect anomalies by running inference on real-time data, along with the constant retraining and updating of those models. Being able to use some of the techniques discussed by Mazrim and Vinodh could really help.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7w8tiyl1bv6dnau8vb7c.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7w8tiyl1bv6dnau8vb7c.jpeg" alt="Streaming tips" width="800" height="604"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  AIM301 (Deploy, secure, and observe Production-grade AI Agents with AgentCore)
&lt;/h3&gt;

&lt;p&gt;This is the other of the two sessions I was most looking forward to. The announcement of the preview for AgentCore at the AWS Summit New York was a super interesting development for me. There are so many tools and libraries out there for working with GenAI but there really are a smaller number that seem ready to be able to help deploy real Agentic AI apps at scale with proper security and reliability.&lt;/p&gt;

&lt;p&gt;Fei Yuan and Satveer Khurpa did a great job describing some key components of AgentCore in detail. Fei especially did a great job white-boarding some of the key components and flow of the AgentCore Identity piece. In order for Agentic AI apps to be production ready I believe security has to be front and centre. I did find the details of how the Identity module works to be rather complicated but got a really good idea of how it works from this session. I also found out they have an AgentCore Discord server so I joined that after the session and am really looking forward to interacting with Fei and other people working on AgentCore.&lt;/p&gt;

&lt;p&gt;On the way back from Toronto on the train I was watching more videos on AgentCore and am planning on my next project migrating some existing code to run with AgentCore. I really think this set of tools (along with the Strands SDK) will play a big part for the future of companies deploying Agentic Apps in production.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F432g8l3zwz227hjymh14.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F432g8l3zwz227hjymh14.jpeg" alt="Agentcore in production" width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  SVS203 (Unlock Productivity: Tools and Techniques for Serverless Developers)
&lt;/h3&gt;

&lt;p&gt;At heart I am a serverless-first person, and you can't think about serverless on AWS without people like Julian Wood. I was not able to attend the full session as it overlapped with the AgentCore one, but I was lucky enough to hear Julian Wood go over some great improvements in the serverless space on AWS. Chris McPeek and Janak Agarwal were also part of this session and demonstrated many of the other super useful features AWS has been putting out in the serverless space.&lt;/p&gt;

&lt;p&gt;At the end of the session, Julian showed a great demo of using Q to build a serverless app for him including SAM templates, running locally via SAM and also building a front-end for an app in seconds. It is truly amazing (and somewhat scary) what GenAI tools like Q can do and they are only going to get better.&lt;/p&gt;

&lt;p&gt;A highlight for me was getting to chat with Julian for a few minutes after the session. I have attended so many streams where he presents all the latest in serverless and has all the key people from AWS with him.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fve8hd78u3110wpmbf0aa.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fve8hd78u3110wpmbf0aa.jpeg" alt="Julian Wood on Serverless" width="800" height="369"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Tips for others on attending summits and conferences
&lt;/h2&gt;

&lt;p&gt;Some tips for people attending this summit (most of which likely apply to other conferences as well):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you’re travelling, book your (refundable) hotel well in advance. I’m kicking myself, as I had a number of decently priced options back in June when I started planning but didn’t book, and got stuck with a rather expensive (and non-refundable) booking, maybe due to the Toronto International Film Festival happening at the same time.&lt;/li&gt;
&lt;li&gt;Scout out the summit location ahead of time if possible. I’m glad I did this the day before on badge pickup day and knew much better where to go on summit day.&lt;/li&gt;
&lt;li&gt;Bring your own refreshments if you can. There were some provided as part of the summit but others were available at a rather high cost.&lt;/li&gt;
&lt;li&gt;Plan out which sessions you want to see and make sure to show up at the room 30 minutes before as all the ones I attended were full by the time the door opened to let us in.&lt;/li&gt;
&lt;li&gt;Ask questions during the sessions, share your insights, and introduce yourself to the presenters after if there is time and opportunity. I really got a lot out of personally saying hi to so many people that I have attended online sessions before and read content from.&lt;/li&gt;
&lt;li&gt;Attend the keynote! It may sound boring but keynotes at conferences like this are always great to see and you get a much better sense of the overall purpose and direction of the company and domain.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;When I first thought about attending the AWS Summit Toronto I wasn't really sure if it would be worth it. Since the summit itself was free I wasn't expecting too much compared to something like Re:Invent, but I was wrong on that front. I gained so much in knowledge and interactions on this trip and hope to do it again in the future. Now I'm looking forward to Re:Invent even more, where it will be everything I experienced, times three or four. I'm also invigorated to do more and learn more.&lt;/p&gt;

&lt;p&gt;For more articles from me please visit my blog at &lt;a href="https://darryl-ruggles.cloud/" rel="noopener noreferrer"&gt;Darryl's World of Cloud&lt;/a&gt; or find me on &lt;a href="https://x.com/RDarrylR" rel="noopener noreferrer"&gt;X&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/darryl-ruggles/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://medium.com/@RDarrylR" rel="noopener noreferrer"&gt;Medium&lt;/a&gt;, &lt;a href="https://dev.to/rdarrylr"&gt;Dev.to&lt;/a&gt;, or the &lt;a href="https://community.aws/@darrylr" rel="noopener noreferrer"&gt;AWS Community&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For tons of great serverless content and discussions please join the Believe In Serverless community we have put together at this link: &lt;a href="https://www.believeinserverless.com/" rel="noopener noreferrer"&gt;Believe In Serverless Community&lt;/a&gt;&lt;/p&gt;

</description>
      <category>community</category>
      <category>aws</category>
      <category>genai</category>
      <category>agentcore</category>
    </item>
  </channel>
</rss>
