Forem: Darryl Ruggles

Live Canary Deployments with AWS SAM, the New WebSocket API Resource, and Lambda Durable Functions

Darryl Ruggles — Sat, 16 May 2026 16:40:31 +0000

It's great to see new serverless offerings all the time. Two recent serverless features pair unusually well: Lambda Durable Functions and SAM's new first-class WebSocket API resource. On May 5, 2026, SAM gained AWS::Serverless::WebSocketApi, a resource type that turns the verbose dance of ApiGatewayV2::Api + Stage + Route + Integration + Lambda::Permission into a few lines of YAML.

Lambda Durable Functions also reached an important maturity point. They went GA in December 2025, have been supported in SAM CLI since v1.150.0, and now have an AWS best-practices post. They let a single Lambda handler run a multi-step workflow that can pause for hours (or days) waiting on a human callback without burning compute and costing you money. I covered the durable mechanics in detail in AWS Lambda Durable Functions: Build a Loan Approval Workflow; this post focuses on what changes when you pair them with a real-time UI.

A quick disambiguation up front: "canary deployments with SAM" has traditionally meant SAM's DeploymentPreference feature - CodeDeploy traffic-shifting on a Lambda alias, with built-in Canary10Percent5Minutes and Linear10PercentEvery1Minute types. This post isn't about that. It's about a custom canary pipeline built on Durable Functions, where the operator watches live baseline-vs-canary metrics and intervenes in real time. The two patterns are complementary - you could (and arguably should) use DeploymentPreference on the orchestrator itself once you ship this to production.

These features are made for each other. WebSocket APIs without a long-running backend tend to collapse into chat-app demos. Durable Functions without a real-time UI mean operators stare at CloudWatch waiting for a workflow to finish. Put them together and you get a workflow the user can both observe and steer in real time, with checkpoint-and-replay handling failures behind the scenes.

I built the most universally relatable version of that pairing I could think of: a canary deployment pipeline. Submit a build artifact, watch it move through smoke tests, staging deploy, integration tests, and a canary rollout, then watch live baseline-vs-canary metrics during a configurable observation window. Promote to 100%, roll back, or extend the window - and the bidirectional WebSocket means the "roll back NOW" button is always one click away while the canary is still serving traffic. Every developer reading this has lived this exact moment. The orchestrator runs Python 3.14 on arm64 with Powertools for AWS Lambda, IAM is scoped with SAM policy templates where they fit and targeted inline policies where the templates are too broad, and the whole stack deploys with a single sam deploy.

The full source - SAM template, Python handlers, React+Vite frontend, Makefile, and architecture diagram - is on GitHub: live-canary-deploys-with-sam-the-new-websocket-api-and-durable-functions.

Architecture

The flow is:

Browser submits the build artifact to POST /deploy on the HTTP API. The Start Deploy Lambda persists deployment metadata to DynamoDB and asynchronously kicks off the durable orchestrator, returning the deploymentId synchronously.
Browser opens a WebSocket to wss://...execute-api..., fires the $connect route, and sends a subscribe frame naming the deploymentId.
The orchestrator runs durable steps: smoke -> staging deploy -> integration tests -> canary deploy. Each step is checkpointed; failures replay from the last checkpoint, not from scratch. Each stage streams its log lines over WebSocket so the pipeline tracker in the UI animates as the workflow progresses.
Observation window: the orchestrator pauses on a durable wait_for_callback and asynchronously invokes a separate MetricsEmitter Lambda. The emitter streams synthetic baseline-vs-canary metrics (error rate, p50, p99, RPS) every 2 seconds via the Progress Publisher. The browser renders a side-by-side metrics dashboard with sparklines, threshold deltas, and a countdown timer.
The operator decides - Promote / Roll back / Extend - by clicking a button that sends an intervene WebSocket frame. The Intervene Lambda completes the durable callback via lambda:SendDurableExecutionCallbackSuccess and the orchestrator resumes from exactly where it stopped. If the operator does nothing, the metrics emitter completes the callback at end-of-window with an auto decision based on the configured thresholds.
Promote or rollback: the orchestrator runs the corresponding step (each emitting its own log lines to the UI), writes the final state to DynamoDB, and emits a completed event so the UI shows the final result.

A single DynamoDB table (PK = DEPLOY#<id>, SK = META | CONN#<connectionId> plus a connectionId GSI) holds both the deployment state and the WebSocket subscriptions. To keep it simple, CloudFront and S3 hosting are deliberately absent: Vite serves the frontend on localhost:5173, the browser calls the AWS-hosted endpoints directly, and there's one fewer layer of infrastructure to learn.

The pipeline stages are intentionally simulated. Each step function (step_smoke_tests, step_deploy_staging, etc.) sleeps briefly and emits realistic-looking log lines. In a real deployment you'd replace each body with calls to CodeDeploy / ECS service updates / Lambda alias shifting / your CI provider's API. The shape stays identical; only the leaves change.

Here is a shot of the sample app i built.

SAM in 30 seconds

If you've never used SAM before, the elevator pitch is: it's a CloudFormation macro, not a separate tool. Add Transform: AWS::Serverless-2016-10-31 to a CloudFormation template and you can write resources like AWS::Serverless::Function that expand at deploy time into full CloudFormation - a Lambda function plus its IAM execution role, log group, event source mappings, version, alias, and so on. Anything you can write in vanilla CloudFormation still works. I am a big fan of using Terraform for IaC but SAM is my second favorite - especially when mostly dealing with AWS serverless resources.

The SAM CLI layers on developer tooling that vanilla CloudFormation lacks: local invoke, the sam sync --watch inner loop, log tailing, sample event generation, and pipeline scaffolding.

Transform: AWS::Serverless-2016-10-31

Globals:
  Function:
    Runtime: python3.14
    Architectures: [arm64]
    Tracing: Active
    LoggingConfig:
      LogFormat: JSON
      ApplicationLogLevel: INFO
      SystemLogLevel: WARN
    Environment:
      Variables:
        POWERTOOLS_SERVICE_NAME: deploy-pipeline
        TABLE_NAME: !Ref DeploymentTable

The Globals: block is one of SAM's nicest small features: every AWS::Serverless::Function in the template inherits these defaults. No copy-paste of Runtime: python3.14 across nine handlers. New runtimes (Python 3.14 was added in November 2025), structured JSON logging, table names - changing the global block changes them all.

For deeper SAM background and the broader IaC landscape, the AWS SAM developer guide is a good starting point. The Powertools setup used throughout this project - logger, tracer, metrics decorators in the right order, structured JSON logging, EMF metrics - is the same pattern I covered in Powertools for AWS Lambda Best Practices. The rest of this post focuses on what's new.

The new WebSocket API resource

Before May 2026, defining a WebSocket API in CloudFormation looked like this for the simplest possible case:

# ~70 lines, redacted for brevity
MyApi: { Type: AWS::ApiGatewayV2::Api, Properties: {...} }
ConnectRoute: { Type: AWS::ApiGatewayV2::Route, Properties: {...} }
ConnectIntegration: { Type: AWS::ApiGatewayV2::Integration, Properties: {...} }
ConnectPermission: { Type: AWS::Lambda::Permission, Properties: {...} }
DisconnectRoute: { ... }
DisconnectIntegration: { ... }
DisconnectPermission: { ... }
DefaultRoute: { ... }
DefaultIntegration: { ... }
DefaultPermission: { ... }
Stage: { Type: AWS::ApiGatewayV2::Stage, Properties: {...} }
Deployment: { Type: AWS::ApiGatewayV2::Deployment, Properties: {...} }

Every basic Lambda-backed route needed three resources: Route, Integration, and Permission. Integration URIs needed manually constructed arn:aws:apigateway:.../invocations strings. Forgetting the Lambda::Permission resource on any route was an easy mistake to make: the route would exist, the integration would exist, the connection would even succeed, and route invocations would fail at runtime in a way that looks like an integration problem rather than a missing permission.

The new AWS::Serverless::WebSocketApi collapses all of that into:

DeployWebSocketApi:
  Type: AWS::Serverless::WebSocketApi
  Properties:
    StageName: dev
    RouteSelectionExpression: "$request.body.action"
    Routes:
      "$connect":
        FunctionArn: !GetAtt ConnectFunction.Arn
      "$disconnect":
        FunctionArn: !GetAtt DisconnectFunction.Arn
      subscribe:
        FunctionArn: !GetAtt SubscribeFunction.Arn
      intervene:
        FunctionArn: !GetAtt InterveneFunction.Arn

That's the entire WebSocket API. Behind the scenes SAM still generates the same set of CloudFormation resources - one Api, one Stage, and per-route Route + Integration + Lambda::Permission - so anything you could do in raw CloudFormation you can still do here. You're trading verbosity for a smaller blast radius for typos.

A few things worth noting from the resource reference:

RouteSelectionExpression is required. The conventional value is $request.body.action, which means: parse incoming messages as JSON and dispatch on the action field. The frontend sends { "action": "subscribe", "deploymentId": "..." } and API Gateway routes it to the subscribe integration.
The route property is FunctionArn, not Function. Easy to get wrong if you're paraphrasing the resource by analogy with other SAM event sources. Mistakes here fail at deploy time, not runtime.
There is no documented !GetAtt MyApi.ApiEndpoint. !Ref MyWebSocketApi returns the API ID, and !GetAtt MyWebSocketApi.Stage returns the generated stage's logical reference. To get the wss endpoint you construct it: wss://${MyApi}.execute-api.${AWS::Region}.amazonaws.com/${StageName}.

Authorization is `$connect`-only, and `$connect` auth alone isn't enough

The SAM Auth block applies only to $connect. Valid AuthType values are NONE, AWS_IAM, and CUSTOM for a Lambda authorizer.

I left this as NONE in the demo for simplicity, but for this specific design that's a real leak vector worth naming. Any connected client can send {"action":"subscribe","deploymentId":"..."} and start receiving the live metrics and log stream for an arbitrary deployment ID. Worse, intervene is a control-plane action that promotes or rolls back a deploy.

$connect authentication proves who opened the socket. It doesn't prove that the caller is allowed to subscribe to a given deployment or send intervene for it.

Production deployments need:

A Lambda authorizer on $connect that validates the caller. Browsers can't set arbitrary headers on the WebSocket upgrade, so the conventional carriers are a query-string token or the Sec-WebSocket-Protocol subprotocol header. Both can leak into request logs, devtools, reverse-proxy logs, and support captures.
A short-lived, audience-scoped WebSocket ticket minted by your backend after the user authenticates through your normal flow, instead of a long-lived bearer token in any of those carriers. The $connect authorizer should bind the validated claims to the connectionId so route handlers can look them up later.
Origin validation in the $connect authorizer. WebSocket APIs don't enforce CORS the same way fetch does, so the Origin header is your own perimeter check. Reject anything not from your expected frontend origin list.
Per-message authorization in every route handler. Store the authenticated principal alongside each connection record, and in subscribe and intervene, check the principal against the deployment owner, team, or environment before honoring the action. Reject unauthorized subscribe, promote, rollback, or extend messages with a 403-equivalent and log it.
AccessLogSettings on the stage, with a format that doesn't log raw tokens.
DefaultRouteSettings.ThrottlingBurstLimit and DefaultRouteSettings.ThrottlingRateLimit so a misbehaving client can't burn your connection-message budget.

Why not just SSE?

The first question I asked myself was "do I actually need WebSockets?" Server-Sent Events would handle one-way progress streaming with much less infrastructure - a Lambda Function URL with RESPONSE_STREAM invoke mode, no connection store, no fan-out publisher. (Function URLs are public endpoints unless protected with IAM auth or your own application-level authorization layer; that's a separate piece of work either way.)

But this workflow needs a return channel, and a low-latency one. The operator's "roll back NOW" decision has to land while the canary is still serving traffic - if a customer-impacting regression appeared at second 35 of a 60-second observation window, you don't want to wait until the window ends and hope the auto-decision picks rollback. You want a button that triggers an immediate callback completion. SSE is one-way. You'd end up with SSE for metrics streaming and HTTP for the intervention, two different protocols, two different idle-connection lifecycles, two sets of error-handling. WebSockets give you one bidirectional channel for both.

The same logic applies to any workflow with bidirectional, low-latency interaction over long time horizons: collaborative editors, multi-agent systems where the human can re-prompt, interactive ML training where the operator can adjust hyperparameters mid-run, IoT control planes. Each of those becomes a tractable SAM application with WebSocketApi + Durable Functions; without one or the other, you're either gluing together more services or polling.

Sending messages back from Lambda

The other half of the WebSocket story is how a Lambda function pushes a frame to a connection. API Gateway exposes a small management API - POST /@connections/{id} - and boto3 has a dedicated client for it:

import boto3

# domain and stage come from event['requestContext'] inside any route handler,
# or are constructed from the WEBSOCKET_API_ID env var elsewhere.
client = boto3.client(
    "apigatewaymanagementapi",
    endpoint_url=f"https://{api_id}.execute-api.{region}.amazonaws.com/{stage}",
)
client.post_to_connection(ConnectionId=connection_id, Data=b'{"hello":"world"}')

The IAM action is execute-api:ManageConnections and the resource ARN format is arn:aws:execute-api:{region}:{account}:{api-id}/{stage}/POST/@connections/*. In the SAM template I scope this tightly:

- Version: '2012-10-17'
  Statement:
    - Effect: Allow
      Action: execute-api:ManageConnections
      Resource: !Sub 'arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${DeployWebSocketApi}/dev/POST/@connections/*'

The Lambda needs both the apigatewaymanagementapi boto3 client (which is just an HTTP client pointed at a constructed endpoint URL) and an IAM policy granting execute-api:ManageConnections against the specific WebSocket API + stage. Wildcard either piece and you'll regret it during pen-test time.

The Progress Publisher Lambda also has to handle the case where a browser closed the tab between the moment the orchestrator decided to publish and the moment the publisher actually POSTed. API Gateway returns HTTP 410 Gone; boto3 surfaces this as GoneException. The publisher catches it and prunes the dead connection from DynamoDB:

for connection_id in subscribers:
    try:
        client.post_to_connection(ConnectionId=connection_id, Data=frame)
        sent += 1
    except client.exceptions.GoneException:
        dead.append(connection_id)
if dead:
    remove_subscriptions(dead)

Lambda Durable Functions

Standard Lambda has one execution environment per request and a 15-minute hard timeout. Step Functions handles long-running orchestration but at the cost of writing your business logic in Amazon States Language and paying per state transition. Durable Functions split the difference: you write the workflow as plain Python code in a Lambda handler, and the Lambda runtime handles the checkpoint-and-replay underneath. If you want the full mental model of how checkpoints, replays, and callbacks work, my loan approval workflow post walks through it from scratch with a different example.

Each call to context.step() is a checkpoint. If the underlying Lambda crashes, the runtime kills the environment for capacity, or your replay hits the 15-minute wall-clock, the runtime simply re-invokes the handler with the same execution ID. Recorded step results are returned without re-executing the body. From the developer's perspective, the function "just resumes" - even if "resume" means the wait was 23 hours and 59 minutes.

The replay model also means the orchestration body should stay deterministic. Generate UUIDs, read the clock, call external APIs, query DynamoDB, write to a file, or emit side effects inside @durable_step functions rather than inline in the orchestration path. Otherwise a replay can compute a different value (a new UUID, a different current timestamp, a now-different DynamoDB row) and take a different branch than the original execution, which defeats the point of checkpoint-and-replay. The orchestration body should read like a recipe of step calls plus branching on their recorded results, nothing else.

In SAM, you opt in by adding a DurableConfig block to an AWS::Serverless::Function:

DeployOrchestratorFunction:
  Type: AWS::Serverless::Function
  Properties:
    Handler: handlers.deploy_orchestrator.handler
    Timeout: 900             # per-replay wall clock; the durable timeout is what matters
    AutoPublishAlias: live
    DurableConfig:
      ExecutionTimeout: 86400        # 24 hours, integer seconds (max 31622400 = 366 days)
      RetentionPeriodInDays: 30

ExecutionTimeout is the outer hard limit on how long a single durable execution can live, in plain integer seconds (not ISO 8601 - early blog posts disagreed on this). RetentionPeriodInDays controls how long the execution history is kept after completion (1-90, default 14).

Supported runtimes today are python3.13, python3.14, nodejs22.x, nodejs24.x, java17, java21, and java25, plus container images.

IAM for durable execution

Two execution-role actions are required (these go on the orchestrator's role, not the caller's):

Statement:
  - Effect: Allow
    Action:
      - lambda:CheckpointDurableExecution
      - lambda:GetDurableExecutionState
    Resource: !Sub '${DeployOrchestratorFunction.Arn}:*'

AWS publishes a managed policy (AWSLambdaBasicDurableExecutionRolePolicy) that bundles these with CloudWatch Logs basics, but you can construct it inline with two lines. The :* suffix matches all published versions, since durable execution requires a published version (not $LATEST).

For services that need to complete a callback - the Intervene Lambda and the Metrics Emitter in this project - the actions are different:

- Effect: Allow
  Action:
    - lambda:SendDurableExecutionCallbackSuccess
    - lambda:SendDurableExecutionCallbackFailure
    - lambda:SendDurableExecutionCallbackHeartbeat
  Resource: !Sub '${DeployOrchestratorFunction.Arn}:*'

The heartbeat action is the pattern that makes the metrics emitter behave nicely when the operator interrupts; more on that below.

The Python SDK

The durable SDK is a separate PyPI package, not part of the runtime or Powertools. It's installed via requirements.txt:

aws-durable-execution-sdk-python>=0.1.0

The handler signature changes slightly: context is no longer a LambdaContext but a DurableContext, and the function is decorated with @durable_execution. Steps are decorated with @durable_step. Heads-up: a few of the config classes (Duration, WaitForCallbackConfig, etc.) aren't re-exported from the top-level package - import them from aws_durable_execution_sdk_python.config:

from aws_durable_execution_sdk_python import (
    DurableContext, durable_execution, durable_step,
)
from aws_durable_execution_sdk_python.config import Duration, WaitForCallbackConfig
# WaitForCallbackContext is a runtime-protocol type used to annotate the
# submitter callable; it lives in .types, not .config.
from aws_durable_execution_sdk_python.types import WaitForCallbackContext

@durable_step
def step_smoke_tests(_ctx, deployment_id, artifact):
    # Each "stage" is a durable step. The body emits log lines through the
    # Progress Publisher and sleeps to simulate real work; in a real deploy
    # this would call CodeDeploy / ECS / your CI provider.
    publish_log(deployment_id, "smoke", f"pulling artifact {artifact[:12]}")
    publish_log(deployment_id, "smoke", "/health 200 ok")
    return {"passed": True}

@durable_execution
def handler(event, context: DurableContext):
    deployment_id = event["deploymentId"]
    context.step(step_smoke_tests(deployment_id, event["artifact"]))
    context.step(step_deploy_staging(...))
    context.step(step_integration_tests(...))
    context.step(step_deploy_canary(...))
    decision = context.wait_for_callback(register_observation, ...)
    if decision == "promote":
        context.step(step_promote(...))
    else:
        context.step(step_rollback(...))
    context.step(step_finalize(...))

Each @durable_step-decorated function takes a StepContext as its first argument; context.step(step_smoke_tests(deployment_id, artifact)) calls the wrapper to produce a curried Callable[[StepContext], T], and the durable runtime then calls that with a real StepContext. Once you see the pattern, all the parallel/map/wait helpers follow it consistently.

The killer feature: callbacks

This is the single most useful primitive Durable Functions adds to the Lambda model. We have callback with Step Functions but it's great to have these now with Lambda!context.wait_for_callback() registers an external completion point and pauses the workflow without paying for compute while waiting:

def register_observation(callback_id: str, _ctx) -> None:
    # Persist the callback ID so the Intervene Lambda can find it.
    store_callback_id(deployment_id, callback_id)
    # Kick off the metrics emitter as an async fire-and-forget invoke.
    # It will stream metrics_tick events to subscribers and complete this
    # callback at end-of-window with an auto decision (or stop early if
    # the operator beats it to the punch).
    invoke_metrics_emitter(deployment_id, callback_id, observation_seconds)
    publish(deployment_id, {"type": "stage", "stage": "observation", "status": "running"})

decision_payload = context.wait_for_callback(
    register_observation,
    name="canary-observation",
    config=WaitForCallbackConfig(
        timeout=Duration.from_seconds(observation_seconds + 120)
    ),
)

Duration is a frozen dataclass with seconds: int and a family of factory classmethods (from_seconds, from_minutes, from_hours, from_days). The keyword form Duration(seconds=N) works too because it's the underlying field, but the factory methods are the canonical pattern shown in the SDK examples and read better for non-trivial durations: Duration.from_hours(24) is unambiguous; Duration(seconds=86400) makes a reader do mental math.

The submitter's signature is (callback_id: str, ctx: WaitForCallbackContext) -> None. The runtime calls it once with a fresh callback_id, the function persists that ID and notifies the outside world (here: kicks off the metrics emitter and tells the UI), and wait_for_callback blocks until something completes the callback via the Lambda API.

In this project there are two parties racing to complete the same callback:

The operator clicks Promote / Roll back / Extend in the UI. The Intervene Lambda receives the WebSocket frame, looks up the callback ID, and calls lambda:SendDurableExecutionCallbackSuccess with the chosen decision.
The metrics emitter runs the observation window to completion (e.g. 60 seconds), evaluates the canary metrics against the configured thresholds, and calls the same API with an auto decision ("promote" if metrics stayed within thresholds, "rollback" if they didn't).

First call wins. The orchestrator's wait_for_callback returns whatever payload was passed, and execution continues from there.

Heartbeating to detect superseded callbacks

When the operator clicks Extend observation, the orchestrator gets decision = "extend", registers a new callback, and invokes a new metrics emitter for the extension window. But the original metrics emitter is still running - Lambda invocations can't be cancelled mid-flight. Without intervention, both emitters publish ticks for the same deployment ID for the next ~60 seconds, and the dashboard shows two timers and conflicting metrics.

The fix is a heartbeat at the top of every tick:

while time.time() < deadline:
    try:
        _lambda_client.send_durable_execution_callback_heartbeat(
            CallbackId=callback_id
        )
    except _lambda_client.exceptions.ClientError as exc:
        # CallbackTimeoutException covers both "no longer pending" cases:
        # the callback was completed by another party (operator clicked
        # extend/rollback/promote, or a sibling emitter), or its heartbeat
        # timeout expired. Either way, we should stop emitting.
        logger.info("callback no longer pending; stopping emitter early")
        return
    publish_tick(...)
    time.sleep(2)

The exception name is a bit misleading - CallbackTimeoutException is documented as the timeout-expiry signal, but in practice it's also what surfaces when the callback was already completed by another caller (I caught this on a SendDurableExecutionCallbackSuccess race in testing - same exception, message "The callback is either timed out or already completed"). Catching it covers both supersede paths. The result: the moment the operator clicks Extend, the next heartbeat from the old emitter raises and the old emitter exits cleanly. Only the new extension emitter continues. This is the durable equivalent of "send a cancellation token down" and it costs essentially nothing.

An image of the pipeline in action is below.

Step Functions vs Durable Functions

Worth a quick comparison since teams will reach for both. Step Functions is mature, has a visual editor, supports JSONata for transforms (added at re:Invent 2024; AWS now recommends it for transformation-heavy workflows, though JSONPath remains supported and widely used), and is the right answer for state machines that span Lambda + ECS + Bedrock + SQS + SNS with no application code. I built a version of that pattern in Serverless Data Processor with Step Functions, Lambda, and Fargate (Rust) - that workflow has multiple compute backends and a clean state-machine shape, which Step Functions handles well.

Durable Functions is the right answer when:

The workflow logic is code you'd want to write anyway (loops, conditionals, branching on metrics, optional extension windows that recurse into more callbacks)
You want callbacks measured in minutes-to-days without paying State Transitions costs - the canary observation window in this project is exactly that pattern
The team already lives in Python/Node.js/Java and doesn't want to maintain ASL

A canary deploy with operator intervention is a borderline case. Step Functions can absolutely express it - the wait-for-callback / Task-with-task-token pattern has been around for years. But in practice the deploy logic ends up being half ASL and half code, with the branching rules ("if errors > threshold then rollback else if extend then loop") split awkwardly between the two. Writing it as one Python function with if statements and wait_for_callback reads like the workflow you'd describe to a colleague.

What SAM CLI brings to the table

The CLI is the half of SAM that turns a CloudFormation template into a tight inner loop. Worth calling out the commands I actually use day-to-day on this project:

sam validate --lint: Schema plus cfn-lint pass over the template; runs in seconds.
sam build: Bundles each function's CodeUri with its requirements.txt into .aws-sam/build/.
sam deploy --guided: First-time interactive deploy that writes samconfig.toml.
sam deploy: Subsequent deploys; uses the parameters in samconfig.toml.
sam sync --watch: Dev inner loop; code-only changes deploy in seconds via service APIs, bypassing CloudFormation.
sam logs --tail --name MyFunc: Live tail of one function's logs.
sam local invoke MyFunc --event events/foo.json: Run a function locally in Docker or Finch.
sam local invoke MyFunc --durable-execution-name local-1: Run a function locally as a durable execution.
sam local execution history <name>: Inspect the durable runtime's view of a workflow's steps.
sam local callback succeed <id>: Manually complete a paused callback during local testing.
sam local generate-event s3 put: Emit sample event payloads; also supports services such as SQS, SNS, EventBridge, Kinesis, and Cognito.
sam pipeline init: Generate CI/CD pipeline configs for CodePipeline, GitHub Actions, GitLab, Jenkins, and Bitbucket.

For reference, samconfig.toml for this project is six lines of parameter overrides plus the standard scaffolding. The first sam deploy --guided writes this for you; subsequent sam deploy runs use it without prompting:

version = 0.1
[default.global.parameters]
stack_name = "canary-deploy"
region = "us-east-1"

[default.deploy.parameters]
profile = "blog_admin"
capabilities = "CAPABILITY_IAM"
confirm_changeset = false
fail_on_empty_changeset = false
resolve_s3 = true
parameter_overrides = "Environment=\"dev\" AllowedOrigin=\"http://localhost:5173\" ObservationSeconds=\"60\" DurableExecutionTimeoutSeconds=\"86400\""

The big one in real-world use is sam sync --watch. A standard sam deploy runs a CloudFormation changeset, which takes 30-90 seconds even when the only change is one line of Python. sam sync differentiates code vs. infrastructure changes: code changes are deployed directly via lambda:UpdateFunctionCode, taking 2-3 seconds. Infrastructure changes (a new resource, a changed property) still go through CloudFormation. The trade-off is that sam sync introduces drift between CloudFormation's view of the stack and reality - never use it on a production stack, but for local development it's transformative.

A few smaller things landed in SAM CLI v1.156.0 (March 2026) that this project uses:

.env file format support for --env-vars. You can now write KEY=value lines instead of the old JSON envelope.
Route-specific CORS on AWS::Serverless::HttpApi - useful for the demo's localhost dev origin without opening up production routes.
BuildKit support for container image builds - not used here (zip is simpler) but a meaningful win for teams packaging Lambdas as containers.
Rust cargo-lambda graduated from experimental to stable - separate story, covered in Daniel Abib's multi-threaded Rust on Lambda post.

Policy templates over Connectors

A note on IAM. SAM offers two abstractions for granting permissions: policy templates (named, scoped policies like DynamoDBCrudPolicy) and Connectors (declarative Read/Write semantics between resources). Jeremy Daly's critique of Connectors is still the most coherent position on this: the Read/Write semantics are confusing (e.g., Write on DynamoDB enables deletions, Read on SQS only receives but you need Write to delete after processing), and they don't compose well across nested or multi-stack architectures. For this project, policy templates are easier to reason about because each function's permissions are visible right where the function is defined. Note that template names like DynamoDBCrudPolicy include deletes too - "least privilege" with the CRUD template still grants more than a strict read-only function needs, so I fall back to inline policies for the few cases where the templates are too broad (the execute-api:ManageConnections grant scoped to a specific API and stage, the durable-callback grants, etc.).

This project uses policy templates where they fit, plus targeted inline policies for the permissions the templates don't cover:

Policies:
  - DynamoDBWritePolicy:
      TableName: !Ref DeploymentTable
  - LambdaInvokePolicy:
      FunctionName: !Sub '${AWS::StackName}-ProgressPublisher'
  - Version: '2012-10-17'      # inline for the cases templates don't cover
    Statement:
      - Effect: Allow
        Action: execute-api:ManageConnections
        Resource: !Sub 'arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${DeployWebSocketApi}/dev/POST/@connections/*'

Every Lambda has its own role. The Start Deploy function can write to DynamoDB and invoke the orchestrator - that's it. The Intervene function can read DynamoDB and complete durable callbacks - that's it. The Progress Publisher can read DynamoDB, prune subscriptions, and POST to WebSocket connections - that's it. The blast radius of any one function being compromised stays small.

Things to watch for

A real list of things I tripped over while building this. Most aren't obvious until you hit them.

Use a recent SAM CLI. AWS::Serverless::WebSocketApi support landed in 1.159.1 (released April 28, 2026, ahead of the public AWS announcement on May 5). I verified this template against 1.159.1. As of mid-May 2026, 1.160.0 is the current release - use that or newer unless you have a reason to pin. Older versions (the Durable Functions launch 1.150.x, the post-BuildKit 1.156.0) don't know the new resource type and sam deploy fails at changeset creation with the unhelpful Transform AWS::Serverless-2016-10-31 failed with: Internal transform failure. Upgrade with pip3 install --user --upgrade aws-sam-cli (or grab the packaged installer from the latest AWS SAM CLI GitHub release) and confirm with sam --version. Expected: SAM CLI, version 1.160.0 or newer.
StageName on WebSocketApi must be a literal string in SAM 1.159.1. Any intrinsic - !Ref Environment, !Sub "${Environment}" - trips a TypeError deep inside the SAM transform's per-route Lambda permission generator: Error transforming template: can only concatenate str (not "dict_node") to str. The traceback points at samtranslator/model/api/websocket_api_generator.py line 299 (_construct_permission); the constructor builds the permission's SourceArn by string concatenation with StageName and chokes on the dict node. The workaround in this template is hard-coding StageName: dev and matching it in the Lambda env var (WEBSOCKET_STAGE: dev) and the Outputs. Per-environment parameterization for the rest of the stack still works through the Environment parameter; only the WebSocket stage name has to be a constant. Reported in SAM 1.159.1; if you're on a later release, try the intrinsic form first and drop back to the literal only if you hit the same traceback.
AutoPublishAlias only republishes on code changes, not env-var-only changes. This bites hard with durable functions because each version freezes its environment variables. If you change a config value but don't touch any Python file, sam deploy updates $LATEST but doesn't publish a new version - the alias keeps pointing at the old version with the stale env var. Verify with aws lambda get-alias --function-name X --name live --query FunctionVersion and force-publish with aws lambda publish-version + aws lambda update-alias if the version is older than your env-var change. A practical workaround if you control the function code: change a Description field or bump a comment in the Python file at the same time as the env var change; SAM treats that as a code change and republishes.
wait_for_callback returns the Result field as a raw string, not a parsed object. Both the Intervene Lambda and the Metrics Emitter pass Result=json.dumps({...}) to SendDurableExecutionCallbackSuccess. The orchestrator's wait_for_callback returns that string verbatim. Forgetting this gives you TypeError: string indices must be integers, not 'str' the moment you try result["decision"]. The quick fix is a one-liner: if isinstance(decision_payload, str): decision_payload = json.loads(decision_payload). The more idiomatic fix is to configure a JSON SerDes on WaitForCallbackConfig (and on the matching CallbackConfig of the sender) so the SDK handles round-tripping for you; CallbackConfig and WaitForCallbackConfig both expose a serdes field. The demo uses the manual parse to keep the dependency surface minimal and the failure mode visible, but for production code the SerDes path is cleaner.
Watch for circular dependencies. Anything in Globals.Function.Environment that uses !Ref DeployWebSocketApi creates a dependency from every function (including the four route-handler functions the API references back) to the API itself. SAM rejects this. The same trap exists for any IAM policy whose Resource is !Sub '...${DeployWebSocketApi}/...' if that policy is on a function in the API's Routes map. The fix in this template is to keep WEBSOCKET_API_ID only on ProgressPublisherFunction, which isn't in the routes map. The publisher and metrics emitter also get explicit FunctionName values based on ${AWS::StackName}, so the orchestrator can reference them as strings (PUBLISHER_NAME: !Sub '${AWS::StackName}-ProgressPublisher') instead of with !Ref ProgressPublisherFunction. The ${AWS::StackName} pseudo-parameter has no resource dependency, so the loop never forms.
Depending on your cfn-lint version, cfn-lint may still complain about the new resource types. Until its schema catches up in your environment, expect E3006 on AWS::Serverless::WebSocketApi and E3002 on DurableConfig. Suppress them at the resource level with the Metadata.cfn-lint.config.ignore_checks pattern shown in template.yaml. SAM CLI 1.159.1 accepts both correctly; newer releases should as well.
WaitConfig doesn't exist in the durable SDK. The naming I'd seen referenced in early write-ups didn't match the published package. The real types are WaitForCallbackConfig (extends CallbackConfig) and Duration, both in aws_durable_execution_sdk_python.config (not re-exported from the top-level). Submitter callable is (callback_id: str, ctx) -> None, not (callback_id) -> None. Importing the wrong name throws Runtime.ImportModuleError at Lambda cold start and the orchestrator never runs. That can be silent from the user's perspective because lambda_client.invoke(InvocationType="Event") returns 202 once Lambda accepts the event for asynchronous invocation, even though the handler may fail later during cold start. Always tail the orchestrator's CloudWatch log group when wiring up a durable function for the first time.
The Powertools v3 Lambda layer for python314-arm64 isn't published yet. As of 2026-05, AWS publishes AWSLambdaPowertoolsPythonV3-python313-arm64 (latest version 30, Powertools 3.26.0) but not the python3.14 variant. This is the kind of thing that changes quickly; check again before publishing or before copying this template into a long-lived production repo. Confirm yourself with aws lambda list-layer-versions --layer-name AWSLambdaPowertoolsPythonV3-python314-arm64 --region us-east-1. Expected result today: "LayerVersions": []. If you point your template at a layer ARN that doesn't exist, sam deploy rolls back at function-creation time with lambda:GetLayerVersion AccessDenied (the layer's resource-based policy doesn't grant access because the layer doesn't exist - a misleading error). The fix here is to install Powertools via src/requirements.txt so each function bundles its own copy. Adds ~5 MB to each function package; cheap insurance until the layer ships.
Async fan-out tasks need to know when their callback has been superseded. The metrics emitter spawned at the start of the observation window keeps running until its window expires - even if the operator clicks Extend, completes the original callback, and a fresh emitter has already been launched for the extension window. Without intervention you get two emitters streaming overlapping metrics_tick events to the same UI. The cleanest fix is the heartbeat pattern shown earlier: lambda:SendDurableExecutionCallbackHeartbeat at the top of every tick raises CallbackTimeoutException the moment someone else completes the callback, and the emitter exits early.
Async invokes can fail silently before durable replay can save you. lambda_client.invoke(InvocationType="Event", ...) returns 202 once Lambda accepts the event for asynchronous invocation. That doesn't mean the target handler actually ran successfully. If the async invoke later fails because of throttling, handler errors, bad payload shape, or runtime/import problems, the orchestrator may never make visible progress and the user can see a frozen UI. The fix is an async-invoke destination (OnFailure -> SQS or EventBridge) or a Lambda DLQ on the orchestrator. In SAM, configure that with EventInvokeConfig on the function: set MaximumRetryAttempts, then add a DestinationConfig.OnFailure target such as an SQS queue ARN. Skipped in the demo for brevity; not skippable in production.
API Gateway WebSocket doesn't enforce a fixed concurrent-connection quota. The practical ceiling is shaped by the new-connections-per-second quota (500 per account per Region by default), the 2-hour maximum connection duration, and the 10-minute idle timeout. AWS's own example: 500 new connections per second sustained over the 2-hour window can support up to 3.6M concurrent connections. For most canary dashboards this isn't the first limit you hit, but reconnect storms (browser refreshes during a multi-environment deploy) and large internal audiences can still justify a quota-increase request on new-connection rate. Long-lived dashboards also need application-level ping/pong or periodic traffic to avoid the 10-minute idle disconnect.
CloudWatch alarms aren't optional for a workflow whose auto-decision is a safety net. The orchestrator's "auto-promote at end-of-window if metrics look fine" branch is only as good as your confidence that the orchestrator and metrics emitter are themselves healthy. At minimum, alarm on: Lambda Errors and Throttles on each function, lambda:CheckpointDurableExecution failures (visible as Lambda errors but worth a dedicated metric filter), API Gateway WebSocket 5xx rate, and DynamoDB throttling on the table. None of these alarms are in the template - omitted for demo simplicity, mandatory before you trust the auto-decision in anger.
DynamoDB encryption is AWS-owned by default; deploy metadata may want a CMK. The table uses default AWS-managed encryption. For a deploy-pipeline table holding build artifact IDs and operator decisions, customer-managed KMS keys give you per-environment isolation, key rotation control, and an audit trail of who/what decrypted. Trade-off: every read and write needs kms:Decrypt / kms:GenerateDataKey on the CMK, which adds a small per-request cost and a configuration surface (key policies, grants).
DynamoDB on-demand is the right default for hobby use; provisioned will be cheaper at scale. The template uses PAY_PER_REQUEST because the demo's traffic is bursty and small. A real CI pipeline running this dozens of times an hour will hit on-demand's per-write cost, roughly $1.25 per million write request units in us-east-1 as of this writing, which can be several times higher than provisioned capacity once traffic is steady. Switch to provisioned with auto-scaling once your request rate stabilizes.

Production hardening checklist

Because intervene can promote, roll back, or extend a deployment, this WebSocket is a control-plane interface, not just a UI convenience. Treat it like you'd treat a deployment API.

Demo vs production is more than a one-line caveat. Before you wire this up to anything that matters, walk this list:

[ ] Require $connect authorization (Lambda authorizer with a short-lived WebSocket ticket, not a long-lived bearer token).
[ ] Validate Origin in the $connect authorizer.
[ ] Persist authenticated principal/claims alongside each connection record.
[ ] Authorize every subscribe and intervene message against the deployment owner/team/environment, not just the connection identity.
[ ] Configure AccessLogSettings on the WebSocket stage with a format that omits raw tokens.
[ ] Configure DefaultRouteSettings.ThrottlingBurstLimit / ThrottlingRateLimit per route, and know your account-level throttling limits.
[ ] Add DynamoDB TTL on CONN#... rows to clean up stale subscriptions if $disconnect ever misfires.
[ ] Encrypt the deploy state table with a customer-managed KMS key if it holds anything you wouldn't paste in chat.
[ ] Add an async-invoke destination (SQS / EventBridge) or DLQ on the orchestrator so lambda_client.invoke(InvocationType="Event") failures aren't silent.
[ ] Add CloudWatch alarms on Lambda errors/throttles, durable-execution failures, API Gateway 5xx rate, and DynamoDB throttling.
[ ] Plan for new-connection-rate quota increase requests if you scale internal audiences.
[ ] Keep the orchestration body deterministic; push side effects into @durable_step functions.
[ ] Use AutoPublishAlias plus a DeploymentPreference on the orchestrator itself if you want canary semantics on your canary orchestrator (yes, really).

Most of these are one-liners or one-resource additions in the template. None are skippable for production.

Cost and cleanup

The stack is cheap but not free at idle. With zero traffic:

DynamoDB on-demand with PITR enabled: pennies per day for an empty-ish table
Published Lambda versions: free at rest, billed per invoke
API Gateway HTTP API and WebSocket API: free at rest, billed per request and per connection-minute respectively
CloudWatch Logs: pennies per day for the application/system log groups

Per-deployment costs:

Lambda durable executions: standard Lambda compute pricing for the active steps. The orchestrator only runs during checkpoints; the wait_for_callback pause is free.
Metrics emitter Lambda: one invocation per observation window, running for about 60 seconds and publishing roughly 30 tick messages, plus the callback lifecycle calls. Most of that time is time.sleep between ticks, so it's billed Lambda duration, but still small at 512 MB for a demo-scale observation window.
DynamoDB: a handful of writes and reads per workflow, fractions of a cent.
WebSocket API: $0.25 per million connection-minutes plus $1 per million messages. The demo uses ~30 messages per workflow and a connection of a few minutes; the connection-minute share is essentially noise.

For this demo, expect well under a cent to around a cent per completed workflow on the AWS side, dominated by Lambda execution time during the observation window. The exact number depends on Lambda memory size, duration, and region. Negligible at hobby scale, worth understanding if you wire this up to your real CI pipeline and start firing it dozens of times a day.

When you're done:

make destroy

Verify in the console that the stack is gone and no orphaned API Gateway APIs remain. The DynamoDB table is part of the stack and will be deleted with it - if you've put data you care about there, export it first.

Wrapping up

The new AWS::Serverless::WebSocketApi resource type is exactly the kind of incremental-but-meaningful improvement SAM has been shipping since 2018. It doesn't enable anything CloudFormation couldn't do before; it just removes 60+ lines of boilerplate per WebSocket API and eliminates a category of "forgot the Lambda::Permission" silent failures.

Where it gets genuinely interesting is when you pair it with Lambda Durable Functions. Real-time UIs and long-running workflows have always wanted to talk to each other; the standard pattern was either polling (clunky), Step Functions + WebSocket fan-out (works, but lots of glue), or maintaining a custom orchestrator on Fargate (overkill for most workloads). Now you can write the workflow as plain Python in a single Lambda handler, deploy it with one SAM template, and the human-in-the-loop story is two API calls and a callback.

The key decisions in this project:

WebSocket over SSE when you need a low-latency return channel, not just one-way streaming. Watching a canary's metrics live is a one-way story, but a "roll back NOW" button is bidirectional and time-sensitive.
Durable Functions over Step Functions when the workflow is naturally code, not a state machine. A canary deploy with operator intervention has branching that reads more naturally as Python if statements than as ASL.
Policy templates plus targeted inline policies instead of Connectors, so the effective permissions stay visible beside each function.
Vite locally over CloudFront/S3 when the frontend doesn't need to be reachable from outside your laptop.
Heartbeat callbacks from any background Lambda that might be racing another finisher, so the loser exits cleanly instead of double-publishing.
Decoupled metrics emitter so the orchestrator's durable replay surface stays minimal: one wait_for_callback, the emitter does the rest.

The full source - SAM template, Python handlers, React+Vite frontend, Makefile, samconfig, architecture diagram - is on GitHub: live-canary-deploys-with-sam-the-new-websocket-api-and-durable-functions. Clone it, swap in your AWS profile, run make deploy-guided && make frontend-env && make frontend-dev, and you should have the demo running locally in a few minutes once your AWS credentials and local toolchain are set up. Toggle the Inject canary error spike checkbox in the form to see the rollback path with the metrics dashboard turning red.

Resources

AWS::Serverless::WebSocketApi reference
Generated CloudFormation resources for WebSocketApi
Lambda Durable Functions launch post
Building fault-tolerant applications with Lambda Durable Functions
Best practices for Durable Functions (fraud detection example)
Test and debug durable functions with SAM
SAM CLI v1.156.0 release notes - .env support, route-specific CORS, BuildKit, Rust GA
API Gateway WebSocket connection management API
SAM policy template list
Jeremy Daly: Getting abstractions wrong with SAM Serverless Connectors
Powertools for AWS Lambda (Python) - logger, tracer, metrics, idempotency
Serverless ICYMI Q1 2026 - quarterly recap, AI-assisted serverless tooling

My related posts

AWS Lambda Durable Functions: Build a Loan Approval Workflow - my deep dive on the durable mechanics (checkpoint, replay, callbacks) with a different worked example
Powertools for AWS Lambda Best Practices - the logger/tracer/metrics pattern used throughout this project
Serverless Data Processor with Lambda, Step Functions, and Fargate (Rust) - companion piece on Step Functions orchestration, when ASL is the right tool over Durable
Lambda Managed Instances with Terraform - the rest of the Lambda compute continuum (sustained throughput, 32 GB memory, EC2 pricing)
Elastic Container Service - my default for containers on AWS - when you graduate beyond Lambda

Connect with me on X, Bluesky, LinkedIn, GitHub, Medium, Dev.to, or the AWS Community. Check out more of my projects at darryl-ruggles.cloud and join the Believe In Serverless community.

It's All About That Memory - Using Long and Short Term Memory with Agents

Darryl Ruggles — Sun, 03 May 2026 17:57:19 +0000

Building a multi-session detective game with AgentCore Memory's 4 built-in strategies

Every AI memory demo starts the same way. "Hi, my name is Bob." Close the session, open a new one. "What's my name?" "Your name is Bob!" Confetti. Blog post done.

That's not interesting.

What if memory wasn't a feature bolted onto an agent - what if it was the entire product? What if the agent couldn't function without it? I wanted to build something where forgetting wasn't a minor inconvenience but a catastrophic failure. A detective who forgets the alibi they just disproved. A narrator who can't recall which suspects have been interviewed. A case file that resets to blank every time you close your browser.

That's the project: a noir detective mystery game called "The Blackwell Murder," built on Amazon Bedrock AgentCore, where all 4 long-term memory strategies plus short-term memory work together to make the investigation feel continuous across sessions. The detective arrives at a crime scene, interviews suspects, examines evidence, and builds a case - and when they come back the next day, the narrator picks up exactly where they left off.

The source code is on GitHub: agentcore-memory-murder-mystery

Architecture Overview

The key mental model: STM is working memory within a conversation, LTM is the case file that persists between sessions. The agent needs both.

The architecture is deliberately simple. It uses a local FastAPI proxy that sits between the React frontend and AgentCore. The example doesn't include CloudFront, Lambda, or API Gateway. The point of this project is memory rather than AWS networking. If you have AWS credentials and Terraform installed, you can clone the repo and be playing in 15 minutes.

This demo uses PUBLIC network mode with no authentication on the proxy for simplicity. Production deployments should use VPC mode with private subnets, authentication on the proxy layer, and VPC endpoints for Bedrock and AgentCore services.

Why a local proxy? The browser can't call AgentCore directly - it requires IAM SigV4 signing. The FastAPI server handles that, plus it gives us a clean place to filter out model artifacts like \ tags before they reach the UI. In production, this proxy would need authentication (Cognito, API keys, or similar) - the demo version accepts any request from localhost.

Why not the agentcore invoke CLI? The Python SDK (bedrock-agentcore) supports streaming and integrates cleanly with FastAPI's StreamingResponse. No subprocess overhead, no output parsing.

AgentCore Memory - The 4 Strategies

AgentCore Memory has two layers: short-term memory (STM) that handles turn-by-turn conversation within a session, and long-term memory (LTM) with four built-in strategies that extract, organize, and recall information across sessions.

What makes this interesting for a detective game is that every strategy maps naturally to how real investigators work. Detectives maintain fact files, write case summaries, track interrogation patterns, and adapt their approach based on what is working. The four LTM strategies do exactly this.

Short-Term Memory (STM)

STM captures the raw conversation - detective actions, narrator descriptions, tool calls and results - within a single session. The agent reads back the last few turns automatically so it knows what just happened.

When the detective says "examine the broken window" and then follows up with "check for fingerprints on the frame," STM is why the agent knows which window you're talking about without you having to repeat the context. STM events in this project expire after 30 days (configurable from 7 to 365 via the event_expiry_duration parameter).

Semantic Strategy - "CaseFiles"

Extracts and indexes factual information from conversations for retrieval by meaning, not keywords.

Namespace: /cases/{actorId}/facts/

This is the detective's fact file. Every time the agent learns something concrete - a suspect's alibi, a piece of evidence, a relationship between characters - the semantic strategy extracts it and stores it as a retrievable fact.

When the detective returns 3 sessions later and asks "what do we know about Helena's alibi?", the agent retrieves everything related to Helena: she claims she was at the Grand Hotel bar until midnight, the bartender says she left at 11:30 PM, there's a 17-minute gap, and the hotel security cameras had a convenient "glitch" during that window. No contradictions slip through. No established facts get lost.

Summary Strategy - "CaseNotes"

Creates condensed summaries of each session - the detective's case notes.

Namespace: /cases/{actorId}/{sessionId}/

At the end of each session, the summary strategy distills the conversation into a concise case update: what evidence was discovered, which suspects were interviewed, what leads are open, and where the investigation stands.

When the player starts a new session, the agent retrieves the last summary and opens with a case file briefing: "Case #1247 - The Blackwell Murder. Day 3. Last session you discovered the staged break-in and the 17-minute gap in Helena's alibi. Two leads remain open..." This is how real detectives work. They write case notes so they can pick up where they left off.

User Preferences Strategy - "DetectiveStyle"

Automatically identifies and tracks the player's investigation approach.

Namespace: /detectives/{actorId}/preferences/

This strategy watches how the player investigates and adapts the experience. If the player consistently chooses indirect questioning over confrontation, the narrator starts offering more subtle conversation options. If they prefer forensic evidence over witness interviews, crime scenes get richer physical detail.

It picks up on investigation style (methodical vs. intuitive), interrogation preference (confrontational, sympathetic, indirect), detail level (forensic deep-dives vs. big-picture summaries), and pacing preference (slow reveals vs. rapid progress).

The preference strategy is subtle. You don't notice it working until the third or fourth session, when the narrator's suggestions start feeling tailored to exactly how you like to play.

Episodic Strategy - "Interrogations"

Captures key interactions as structured episodes, then generates cross-session reflections.

Namespace: /episodes/{actorId}/{sessionId}/
Reflection namespace: /episodes/{actorId}/

This is the most compelling strategy, and the one that makes the detective game feel genuinely intelligent. Episodic memory doesn't just store what happened - it reflects on patterns across interactions.

An episode captures structured fields - the AWS docs define these as situation, intent, assessment, justification, and episode-level reflection. In practice, the output for this project looks like:

Situation: Interrogation of Helena Voss regarding her whereabouts
Intent: Catch Helena in a lie about the hotel bar timeline
Assessment: Presented the bartender's statement showing she left at 11:30, not midnight
Justification: Helena became defensive, changed story to "went for a walk," refused further questions

Reflections synthesize across episodes:

"This detective excels at catching timeline inconsistencies - present evidence contradictions early"
"Direct confrontation causes suspects to shut down - this player gets better results with patience"
"Helena's changing story pattern matches classic alibi fabrication - flag for cross-reference"

The practical result is detective intuition. The narrator can say things like "You have noticed Helena's story shifts every time you press on the timeline. Your instinct says the 17 minutes matter." The player didn't ask for that observation - the episodic reflection surfaced it automatically.

How the Strategies Map to the Investigation

Strategy	Detective Equivalent	What Gets Stored	When It Matters
STM	Working memory	Current conversation	Within a session - "which window?"
Semantic	Fact file	Suspects, alibis, evidence, relationships	Re-interviewing a suspect 3 sessions later
Summary	Case notes	Per-session investigation summary	Opening a new session - "where were we?"
User Preference	Detective instinct	Play style, interrogation approach	Narrator adapts tone and suggestions
Episodic	Interrogation log + intuition	Key interactions + cross-session reflections	"Helena's story keeps changing..."

A Note on Namespace Design

The AWS docs recommend a default namespace pattern like /strategy/{memoryStrategyId}/actor/{actorId}/session/{sessionId}/. This project uses custom descriptive namespaces instead - /cases/{actorId}/facts/, /detectives/{actorId}/preferences/, etc. - because they map directly to how a detective organizes information. When you're debugging and see a record in /cases/sloane/facts/, you immediately know what it is.

The tradeoff is that without {memoryStrategyId} in the path, multiple strategies could theoretically write to overlapping namespaces if you configure them carelessly. In practice, each strategy in this project has a distinct namespace root (/cases/, /detectives/, /episodes/), so there's no overlap. If you're building a system with many strategies, the AWS-recommended pattern with strategy IDs in the path is safer.

Where Does the Extraction Logic Live?

This is the thing that took me the longest to internalize: you don't write extraction logic. There's no code in this project that says "pull out facts for semantic memory" or "summarize this session." The platform does all of it.

When you define a strategy, you provide a type, a name, a description, and namespaces. That's it. The extraction pipeline reads your STM events - the raw conversation messages - and applies each strategy's built-in logic to decide what to extract. You never see the extraction prompt. For the built-in strategies used in this project, customization is limited to the strategy description field. AWS also offers built-in overrides (custom prompts, custom model selection) and self-managed strategies (full pipeline control) for deeper customization - see the AgentCore Memory documentation for details.

For built-in strategies, your actual levers for influencing LTM quality are indirect:

Strategy descriptions - the only direct hint you give the extraction model. "Extracts and indexes case facts for semantic retrieval" tells it to focus on facts. "Tracks detective communication style and investigation preferences" tells it to watch for behavioral patterns.
Your system prompt - shapes how the agent talks, which shapes what the extraction pipeline has to work with. A system prompt that produces atmospheric noir prose gives the summarization strategy rich material. A prompt that produces terse responses gives it less.
Your tools - return structured data that becomes part of the conversation. When examine_evidence returns forensic details about tool marks on a window frame, that structured output gives the semantic strategy concrete facts to extract. When interrogate_witness returns a suspect's shifting alibi, the episodic strategy captures it as a meaningful interaction.
The conversation itself - longer, richer conversations produce more extraction material. A single-turn "look at the window" produces less than a multi-turn investigation where the detective examines evidence, cross-references alibis, and confronts a suspect with contradictions.

The practical implication is that designing good prompts and tools is indirectly designing your memory. I didn't set out to optimize for LTM quality, but the debug watch tool showed me that conversations where the detective digs deeper - following up on inconsistencies, asking witnesses about specific details, comparing evidence across locations - produce significantly richer LTM records than surface-level interactions. The extraction pipeline rewards conversational depth.

Building the Agent

The agent runs on AgentCore via the Strands SDK. Three things matter: the system prompt, the tools, and the memory integration.

System Prompt - Noir Narrator Persona

The agent isn't the detective. It's the narrator - the voice in the dark that describes what the detective sees, hears, and feels. The system prompt establishes this firmly:

You are the narrator for "The Blackwell Murder," a noir detective mystery.
You speak in the style of classic noir fiction - rain-slicked streets, long
shadows, moral ambiguity, and the kind of truth that cuts deeper than any blade.
You are not the detective. You are the voice in the dark that describes what
the detective sees, hears, and feels.

The prompt also includes the full case briefing (locations, suspects, the solution), narrator rules, and memory integration instructions. Two rules that matter the most:

Never break character. The model must never mention tools, functions, errors, or its own reasoning. If a tool fails, the narrator says "the trail goes cold" - not "there was an error in the category specified."
Memory integration on session start. On the first message in a new session with no prior history, set the scene. On returning sessions where memory context is available, open with a case file briefing.

Custom Tools

Four tools drive the investigation. Each is a @tool-decorated function that returns narrative text and silently tracks state in a case file:

examine_evidence(item, method) - Three examination methods (visual, forensic, compare) reveal different details about the same evidence. The broken window looks suspicious on visual inspection, reveals tool marks under forensic analysis, and confirms the staged break-in on comparison.

interrogate_witness(witness, approach, topic) - Four interview approaches (neutral, sympathetic, confrontational, indirect) produce different responses from the same witness. Confrontation shuts Marcus down. Sympathy gets Clara to reveal the shadow she saw. Indirect questioning catches Marcus mentioning the service passage he claims was sealed.

search_location(location, area) - Five locations with multiple searchable areas. The study alone has the desk, window, bookcase, safe, and floor - each hiding different clues.

check_case_file(query, category) - The detective's notebook. Reviews all discovered evidence, suspect information, alibis, timeline events, and connections between suspects. Supports free-text search across all categories.

Every tool call that discovers something new pushes a notification to the frontend, which updates the Case Board and Persons of Interest panels in real time.

Memory Integration with Strands

The Strands SDK's AgentCoreMemorySessionManager handles the memory lifecycle:

config = AgentCoreMemoryConfig(
    memory_id=MEMORY_ID,
    session_id=session_id,
    actor_id=detective_id,
    retrieval_config={
        f"/cases/{detective_id}/facts/": RetrievalConfig(top_k=5),
        f"/detectives/{detective_id}/preferences/": RetrievalConfig(top_k=3),
        f"/episodes/{detective_id}/": RetrievalConfig(top_k=3),
    },
)

with AgentCoreMemorySessionManager(config, region_name=REGION) as session_manager:
    agent = Agent(
        model=model,
        system_prompt=SYSTEM_PROMPT,
        tools=[examine_evidence, interrogate_witness, check_case_file, search_location],
        session_manager=session_manager,
    )

The retrieval_config tells the session manager which LTM namespaces to query when loading context for a new request. Without it, the agent only gets STM conversation history - it wouldn't recall facts, preferences, or episode patterns from prior sessions.

The session manager does two things: on entry, it loads relevant memories (STM conversation history, LTM strategy results) into the agent's context. On exit, it persists the current conversation as new memory events. The actor_id is the detective's name, which namespaces all memory operations so multiple detectives could theoretically investigate the same case without cross-contamination.

Model Configuration

Nova Pro is the default because it has good narrative quality and is the most cost-effective option for iterative development. But the model is switchable at deploy time via the ACTIVE_LLM environment variable:

Model	Model ID	Use Case
Nova Pro	`us.amazon.nova-pro-v1:0`	Default - good balance of quality and cost
Nova 2 Lite	`us.amazon.nova-2-lite-v1:0`	1M context, optional extended thinking
Nova Lite	`us.amazon.nova-lite-v1:0`	Fastest, lowest cost
Claude Sonnet 4.6	`us.anthropic.claude-sonnet-4-6`	Best narrative quality
Claude Haiku 4.5	`us.anthropic.claude-haiku-4-5-20251001-v1:0`	Fast and affordable

The difference in narrative quality between Nova Pro and Claude Sonnet is noticeable. Claude produces more atmospheric prose and stays in character more consistently. Nova Pro occasionally breaks the fourth wall by mentioning tool names or its own reasoning process - something I had to filter out in the proxy server. For a polished demo, Claude Sonnet is the better choice. For development and iteration, Nova Pro keeps costs low. A typical 15-20 minute play session (10-15 turns, 4 tool calls per session) costs roughly $0.02-0.05 in model inference alone with Nova Pro. Claude Sonnet runs about 5-10x that. Memory operations and KMS add negligible cost on top.

Infrastructure as Code

All durable infrastructure is managed by Terraform using the AWS provider (~> 6.35). The agent itself is deployed via the agentcore CLI, which handles the zip packaging and runtime provisioning. This is a clean separation: Terraform manages what persists (Memory, IAM, KMS, S3), the CLI manages what deploys (agent code, runtime configuration).

Memory and KMS

AgentCore Memory requires a KMS key for encryption. The memory resource itself is straightforward:

resource "aws_bedrockagentcore_memory" "detective" {
  name                      = "${var.memory_name}_${var.name_suffix}"
  description               = "Persistent memory for the AI detective agent"
  event_expiry_duration     = var.event_expiry_duration_days
  encryption_key_arn        = aws_kms_key.memory.arn
  memory_execution_role_arn = var.memory_execution_role_arn
}

Note the variable name event_expiry_duration_days - the Terraform attribute is event_expiry_duration (which takes a value in days), and the variable adds the _days suffix for clarity so readers don't have to guess the unit.

The KMS key policy grants three principals access: the root account for administration (full kms:*), the AgentCore service for memory encryption operations (kms:Encrypt, kms:Decrypt, kms:GenerateDataKey, kms:DescribeKey), and the memory execution role for runtime access (same encryption actions). All policies use aws_iam_policy_document data sources - never inline JSON strings. This gives you compile-time validation and readable diffs. Note: resources = ["*"] in a KMS key policy means "this key" - it's not a wildcard across all keys.

A random_id suffix is appended to all AWS resources (S3 buckets, KMS aliases, memory names) to ensure global uniqueness. The suffix is generated once and shared across all modules.

Three Strategies via Terraform, One via CLI

Here's the real-world gotcha. The aws_bedrockagentcore_memory_strategy resource supports three of the four strategy types:

resource "aws_bedrockagentcore_memory_strategy" "case_files" {
  name        = "CaseFiles"
  memory_id   = aws_bedrockagentcore_memory.detective.id
  type        = "SEMANTIC"
  description = "Extracts and indexes case facts for semantic retrieval"
  namespaces  = ["/cases/{actorId}/facts/"]
}

resource "aws_bedrockagentcore_memory_strategy" "case_notes" {
  name        = "CaseNotes"
  memory_id   = aws_bedrockagentcore_memory.detective.id
  type        = "SUMMARIZATION"
  description = "Summarizes investigation sessions into concise case notes"
  namespaces  = ["/cases/{actorId}/{sessionId}/"]
}

resource "aws_bedrockagentcore_memory_strategy" "detective_style" {
  name        = "DetectiveStyle"
  memory_id   = aws_bedrockagentcore_memory.detective.id
  type        = "USER_PREFERENCE"
  description = "Tracks detective communication style and investigation preferences"
  namespaces  = ["/detectives/{actorId}/preferences/"]
}

The EPISODIC type isn't yet supported in the Terraform provider as of March 2026. This is tracked in terraform-provider-aws #45599. The workaround is a make target that calls the AWS CLI:

aws bedrock-agentcore-control update-memory \
  --memory-id $(MEMORY_ID) \
  --memory-strategies '{
    "addMemoryStrategies": [{
      "episodicMemoryStrategy": {
        "name": "Interrogations",
        "description": "Key interrogation episodes with cross-case reflections",
        "namespaces": ["/episodes/{actorId}/{sessionId}/"],
        "reflectionConfiguration": {
          "namespaces": ["/episodes/{actorId}/"]
        }
      }
    }]
  }'

Three things to note about the episodic strategy. First, it requires a reflectionConfiguration with its own namespace - this is where cross-session reflections are stored. Second, the reflection namespace must be at or above the episode namespace's depth - meaning reflections are less nested than episodes. In practice, this means the reflection namespace must be a prefix of the episode namespace (e.g., /episodes/{actorId}/ works as a reflection namespace for episodes stored in /episodes/{actorId}/{sessionId}/). Get this wrong and the API returns a validation error that doesn't clearly explain the constraint.

Third, because the episodic strategy lives outside Terraform, terraform destroy won't clean it up. If you destroy and recreate the infrastructure, you'll get a naming collision or an orphaned strategy. The project includes a corresponding make remove-episodic-strategy target for teardown. On the Terraform side, the memory resource's attributes don't reflect CLI-managed strategy state, so terraform plan won't show unexpected diffs after you add the episodic strategy via the CLI - no ignore_changes block is needed.

A Note on IAM Permissions

When you deploy an agent with the agentcore CLI, it auto-creates an IAM role (AmazonBedrockAgentCoreSDKRuntime-*) with a baseline policy. This policy covers what the agent needs to run - model invocation, memory read/write, and the basics. The agent works fine out of the box.

Where you will need extra IAM permissions is if you build debug tools that call the boto3 memory APIs directly - like the watch script in this project. Those tools run under your own IAM identity, not the agent's runtime role, and need explicit permissions for ListMemoryRecords, RetrieveMemoryRecords, ListEvents, and KMS decrypt on the memory encryption key. In production, create a separate narrowly-scoped IAM role for debug tools rather than granting these permissions to developer identities. Budget 15 minutes to set this up if you plan to inspect memory outside the agent.

A note on the auto-created runtime role. The agentcore CLI generates a role with broad permissions - for example, bedrock:InvokeModel with Resource: * rather than scoped to specific model ARNs. This is fine for a demo, but for production deployments, create a custom IAM role with explicitly scoped permissions. At minimum, scope bedrock:InvokeModel to the specific model ARNs your agent uses and ensure memory access policies reference only the memory resources that agent needs.

Naming Constraints

AgentCore resource names must match [a-zA-Z][a-zA-Z0-9_]{0,47} - letters, numbers, and underscores only, starting with a letter. No hyphens. This tripped me up repeatedly: case-files fails, CaseFiles works. detective-memory-abc123 fails, detective_memory_abc123 works. KMS aliases are fine with hyphens, but everything else in AgentCore isn't.

The Frontend

A noir-themed React 19 SPA with four components: the narrative log (the main detective story), the detective input, the Case Board (discovered evidence), and the Persons of Interest panel (suspect information with alibi status).

The narrative log displays the agent's noir prose as it streams in via SSE. Tool use events show as gold italic indicators - "Examining evidence...", "Interrogating witness..." - so the player knows the agent is working.

The Case Board and Persons of Interest panels update in real time as the investigation progresses. When the agent examines evidence or interviews a suspect, the tools push structured notifications through the SSE stream. New evidence items appear with an amber highlight that fades after a few seconds. Suspects show their interview count and alibi verification status (verified, contradicted, or unverified).

SSE streaming deserves a note. AgentCore returns the response as a StreamingBody - but when accessed through the invoke_agent_runtime API, the entire response arrives as a single read. The SSE events are concatenated inside it, sometimes without newline separators:

data: {"chunk": "The rain"}data: {"chunk": " hasn't stopped"}data: {"chunk": " for three days."}

The proxy server splits on data: boundaries using a regex, reassembles all chunk text, strips tags and `</tool>` XML artifacts that occasionally leak through, and re-emits clean SSE events to the browser. This filtering is model-agnostic - Nova Pro produces these artifacts most often, but any model with reasoning capabilities (including Nova 2 Lite with extended thinking enabled) can emit blocks. This two-pass approach - collect everything, clean it, then emit - is more reliable than trying to filter partial tags from individual chunks.

Demo Walkthrough

Session 1: The Crime Scene

The player opens the app. The narrator sets the scene:

The rain has not stopped for three days. You step under the police tape at 42 Ashford Lane, the collar of your coat turned up against the damp. The study door is open now - the locksmith saw to that this morning - but inside, everything is exactly as Edward Blackwell left it. Or as someone left it for him.

The player types "search the study." The agent calls search_location and describes the room - the scattered desk, the broken window, the bookcase. The Case Board lights up: "Broken Window - glass fragments outside, break pattern radiates outward."

They examine the broken window forensically. The narrator reveals the wiped frame, the tool marks. The Case Board updates: "Broken Window (forensic) - frame wiped clean, tool marks from inside."

They interview Clara Webb sympathetically. She breaks down and admits she heard arguing at 11:15 PM. The Persons of Interest panel shows Clara with her first interview note.

All of this - the evidence, the suspect information, the timeline - flows into AgentCore Memory. STM keeps the turn-by-turn conversation. The semantic strategy extracts the facts. The episodic strategy logs Clara's interrogation.

Session 2: The Case Continues

The player closes the browser, has lunch, and comes back. They start a new session with the same detective ID. The narrator opens differently now:

Case #1247 - The Blackwell Murder. Your notebook is open on the desk, the pages curling at the edges from the rain. Last time, you found the staged break-in - glass broken outward, frame wiped clean, tool marks from inside. Clara Webb heard arguing at 11:15 PM. Two voices. One was Blackwell. The other was a man she could not identify, but she said the shadow was tall, broad-shouldered. Like Marcus.

The summary strategy provided the session recap. The semantic strategy filled in the specific details. The player picks up where they left off and starts pressing on Marcus's alibi. Three sessions in, when the player consistently uses indirect questioning instead of confrontation, the narrator starts offering subtler options - the preference strategy at work.

And when the player catches Helena in another timeline inconsistency, the narrator adds: "Her story shifts every time you push on the timeline. Your instinct says the 17 minutes matter more than she is letting on." That is the episodic reflection - pattern recognition across sessions that makes the detective feel like they are building real intuition.

Observing Memory in Real Time

Understanding LTM is abstract until you watch it happen. The project includes a debug watch command that polls AgentCore Memory every 5 seconds and prints new STM events and LTM records as they appear:

make debug-memory-watch

This runs python server/debug_memory.py --watch 5, which seeds with the current state (so you only see new additions) and then streams changes. A typical session looks like this:

  Seeding current state... done (652 STM events, 156 LTM records)
  Watching for new additions...

[20:11:01] [STM] [fe400085] [user] Use a firm and aggressive approach with Clara
[20:11:11] [STM] [fe400085] [assistant] A confrontational approach with Clara Webb proves
  ineffective. She flinches at the sharp tone and retreats into monosyllables...

[20:11:59] [LTM] [USER_PREFERENCE (DetectiveStyle)]
{"context":"The user initially requested a softer approach when interrogating Clara Webb
but later explicitly requested to use a firm and aggressive approach, indicating a shift
toward more confrontational interrogation tactics with witnesses.",
"preference":"Prefers firm and aggressive interrogation approach with witnesses"}

[20:12:38] [LTM] [SUMMARIZATION (CaseNotes)]
<topic name="Witness Interview - Clara Webb (Confrontational Approach - Failed)">
Detective Sloane attempted a firm and aggressive approach with Clara Webb. The
confrontational strategy proved completely ineffective. Clara flinched at the sharp
tone and retreated into monosyllables. This failed interrogation confirms Clara's
fear is a significant barrier and indicates a gentler approach is necessary.
</topic>

STM events appear immediately as the conversation flows. LTM records follow 30-60 seconds later as the platform's extraction pipeline processes the events. You can see exactly what each strategy produces:

SEMANTIC records are plain factual statements - "Helena Blackwell was found dead in the study at 10:42 PM"
SUMMARIZATION records are topic-tagged XML with detailed session notes
USER_PREFERENCE records are structured JSON with context, preference, and categories
EPISODIC records come in two flavors: situation recaps ("situation": "A detective begins investigating...") and cross-session strategy patterns ("title": "Escalating Interrogation Pressure with Evidence Leverage")

Seeing these raw values is what made the strategies click for me. Reading the documentation, I understood that "semantic extracts facts" and "episodic captures patterns." But watching the actual output - seeing the platform independently decide that a failed interrogation was worth logging as an episode, or that a shift from soft to aggressive questioning counted as a preference change - made the system feel real. The extraction isn't just summarizing what happened. It's interpreting the conversation through each strategy's lens and producing genuinely different representations of the same events.

The watch also exposed a debugging gotcha. As of bedrock-agentcore SDK version 1.4.4, the AgentCore list_memory_records and retrieve_memory_records APIs return results under the key memoryRecordSummaries, not memoryRecords. The SDK's retrieve_memories() method handles this correctly, so the agent works fine - but if you write your own debug scripts using boto3 directly, you'll get empty results and spend hours investigating an extraction pipeline that was working all along. The watch script in this repo has the correct key. Check the latest SDK docs if you're reading this in the future - response key names can change between versions.

Other debug modes are available:

# Dump everything - strategies, STM events, and LTM records
uv run python server/debug_memory.py

# Only LTM records (skip raw conversation events)
uv run python server/debug_memory.py --ltm-only

# Only STM events
uv run python server/debug_memory.py --stm-only

# Show all sessions (default: most recent only)
uv run python server/debug_memory.py --all-sessions

What I Learned

STM vs LTM isn't either/or - they serve completely different functions. STM is working memory within a conversation. LTM is the case file that persists between sessions. You need both, and trying to use one for the other's job leads to problems. STM without LTM means the detective forgets everything between sessions. LTM without STM means the agent can't follow a multi-turn investigation within a single session.

Episodic reflections are the most compelling strategy. The semantic strategy is the workhorse - it stores facts and retrieves them reliably. But the episodic strategy's cross-session reflections are what make the agent feel genuinely intelligent. When the narrator surfaces a pattern the player didn't explicitly ask about, it creates a moment that feels like the detective is actually thinking. This is the strategy I would lead with in any demo.

Model choice matters more than I expected for character consistency. Nova Pro occasionally breaks character - mentioning tool names, exposing its reasoning process, or dropping the noir tone mid-paragraph. Claude Sonnet stays in character almost perfectly. For a narrative application where immersion matters, the model's ability to maintain a persona is as important as its raw capability. I ended up adding server-side filtering to strip \ tags and </tool> XML artifacts that leaked through from Nova Pro.

Prompt engineering is still the job - the prompt is the product. The system prompt went through more revisions than any other file in this project. The first version let the model call six tools in a single turn, drowning the player in information before they had asked a single question. Another version produced beautiful prose but kept breaking character to mention tool names. Getting the narrator to call exactly one tool per turn, stay in character when tools error, and set the scene without immediately investigating required specific, firm language - "do not chain multiple tool calls" works where "one action per turn" didn't. If you're building an agent-based application, expect to spend as much time tuning the system prompt as you do writing the code around it.

The Terraform provider gap is a real-world pattern. Three of four strategies are supported in Terraform. The fourth requires a CLI workaround. This is a common pattern with new AWS services - Terraform support lags behind the API by weeks or months. The pragmatic approach is to manage what you can in Terraform and script the rest in your Makefile, documenting the gap clearly so your future self (or your team) knows what to update when provider support arrives.

Build a memory watch tool early. The single most useful debugging aid was a script that polls memory and prints new STM events and LTM records in real time. Without it, memory's a black box - events go in, and you hope the right things come out. With it, you can see exactly what the platform extracts, how long extraction takes (30-60 seconds typically), and whether your namespace configuration is producing records where you expect them. I would build this before writing any agent code on my next project.

Going to production would add several layers. This demo runs in PUBLIC network mode with an unauthenticated local proxy. A production deployment would need: VPC mode with private subnets, VPC endpoints for Bedrock and AgentCore services (avoiding public internet for API calls), CloudFront distribution with WAF, Cognito or API key authentication on the proxy, a custom IAM role with least-privilege permissions (scoped bedrock:InvokeModel to specific model ARNs, scoped memory access to specific resources), an S3 backend for Terraform state, and Bedrock Guardrails for input validation. The architecture section of this post shows the demo setup. The production architecture is a different article.

The full source code, Terraform configurations, and Makefile workflow are available on GitHub agentcore-memory-murder-mystery. Clone the repo, run make init && make apply && make deploy-agent && make serve, and start investigating. The rain is still falling on Ashford Lane.

Connect with me on X, Bluesky, LinkedIn, GitHub, Medium, Dev.to, or the AWS Community. Check out more of my projects at darryl-ruggles.cloud and join the Believe In Serverless community.

S3 Files: The End of Download-Process-Upload (with Terraform)

Darryl Ruggles — Fri, 01 May 2026 15:28:59 +0000

On April 7, 2026, AWS launched S3 Files - a managed NFS v4.1/4.2 layer built on Amazon EFS that provides file-system semantics on top of S3, including read-after-write consistency, advisory file locking, and POSIX permissions. (AWS Storage Gateway's File Gateway has offered NFS-over-S3 for years, but as a caching gateway appliance, not a native file system with these guarantees.) You can mount S3 Files from EC2, Lambda, EKS, and ECS (Fargate and ECS Managed Instances launch types; EC2 launch type is not yet supported). Your code reads and writes files with open(), os.rename(), and os.listdir(). No boto3 for the data path. No /tmp juggling. No copy-then-delete to simulate a rename.

In this post, I'll build two identical document-processing Lambda functions - one using the traditional S3 API approach and one using S3 Files - deploy them with Terraform, and benchmark the difference.

The Long Road to a Real S3 File System

For nearly two decades, developers have been trying to use S3 as a file system. Here's how the tools evolved:

	s3fs-fuse (2010)	Mountpoint for S3 (2023)	S3 Files (2026)
Protocol	FUSE	FUSE	NFS 4.1/4.2 (managed)
Write support	Full (but slow)	Sequential/append only	Full read/write
Rename	Copy + delete (slow)	Not supported	Instant from the NFS client's perspective (async S3 sync)
File locking	No	No	Advisory locks
Consistency	Eventual	Eventual	Read-after-write
Dir listing (1000 files)	Slow	163ms	39ms
Small file reads (1000 files)	Very slow	87.1s	4.3s
Sequential write (100MB)	~100 MB/s	I/O errors	273 MB/s
AWS managed	No (community)	Client only	Yes
Max throughput	~100 MB/s	GB/s	TB/s aggregate

Performance figures from published launch-day benchmarks; see S3 Files vs Mountpoint vs s3fs-fuse comparison and DevelopersIO GA walkthrough.

Each generation solved the previous one's biggest limitation. s3fs-fuse gave you a file system but was slow and unreliable. Mountpoint gave you speed but restricted writes to append-only - ruling out most real applications. S3 Files closes the remaining gaps: file-system semantics including advisory file locking and POSIX permissions, managed infrastructure, and strong performance for both small and large file workloads.

The "Before" Pattern: Download-Process-Upload

If you've written a Lambda function that processes files in S3, you've written this pattern:

import boto3
import os
import json

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # 1. List files in the inbox
    response = s3.list_objects_v2(Bucket=BUCKET, Prefix="inbox/")

    for obj in response.get("Contents", []):
        key = obj["Key"]
        filename = key.removeprefix("inbox/")

        # 2. Download to /tmp (the only writable space Lambda gives you)
        s3.download_file(BUCKET, key, f"/tmp/{filename}")

        # 3. Process the file
        with open(f"/tmp/{filename}", "r") as f:
            content = f.read()
        result = {"word_count": len(content.split()), "lines": content.count("\n")}

        # 4. Upload processed file (S3 has no rename - copy then delete)
        s3.copy_object(
            Bucket=BUCKET,
            CopySource={"Bucket": BUCKET, "Key": key},
            Key=f"processed/{filename}",
        )

        # 5. Upload metadata
        s3.put_object(
            Bucket=BUCKET,
            Key=f"processed/{filename}.meta.json",
            Body=json.dumps(result),
        )

        # 6. Delete the original
        s3.delete_object(Bucket=BUCKET, Key=key)

        # 7. Clean up /tmp (Lambda reuses containers)
        os.remove(f"/tmp/{filename}")

Every step is an S3 API call. Every file passes through /tmp. "Renaming" a file requires a full copy followed by a delete - two API calls for something that should be instant. If your function processes 100 files, that's hundreds of API calls, each adding latency.

And /tmp itself is limited. Lambda gives you 512MB by default (up to 10GB at extra cost). If you're processing large files or many files concurrently, you'll hit that ceiling.

The "After" Pattern: Just Use the File System

With S3 Files mounted at /mnt/docs, the same logic becomes:

import os
import json
from aws_lambda_powertools import Logger, Tracer, Metrics
from aws_lambda_powertools.metrics import MetricUnit

logger = Logger()
tracer = Tracer()
metrics = Metrics()

MOUNT_PATH = os.environ["MOUNT_PATH"]

@logger.inject_lambda_context
@tracer.capture_lambda_handler
@metrics.log_metrics(capture_cold_start_metric=True)
def lambda_handler(event, context):
    inbox = os.path.join(MOUNT_PATH, "inbox")
    processed = os.path.join(MOUNT_PATH, "processed")
    os.makedirs(processed, exist_ok=True)

    count = 0
    for filename in os.listdir(inbox):
        src = os.path.join(inbox, filename)

        with open(src, "r") as f:
            content = f.read()
        result = {"word_count": len(content.split()), "lines": content.count("\n")}

        with open(os.path.join(processed, f"{filename}.meta.json"), "w") as f:
            json.dump(result, f)

        os.rename(src, os.path.join(processed, filename))
        count += 1

    metrics.add_metric(name="FilesProcessed", unit=MetricUnit.Count, value=count)
    logger.info("batch complete")

There is no boto3 import, no /tmp management, and no copy-then-delete dance. Powertools makes it easy to add structured logging, tracing, and EMF metrics - the decorators above wire all three into this handler. The rename returns instantly from the NFS client's perspective. The code is materially shorter and maps more directly to the workload.

One caveat: "instant" means instant from your code's perspective. Under the hood, S3 Files still has to copy + delete the S3 object to implement the rename - general-purpose S3 buckets have no native rename operation. (S3 Express One Zone directory buckets do have a RenameObject API, but S3 Files works with general-purpose buckets.) For single files, this happens fast enough to be invisible. For directory renames across thousands of objects, the S3-side sync can take a long time - AWS documentation warns about performance impact for large recursive rename operations on prefixes with many objects. Your NFS client sees the rename as complete immediately, but S3 API consumers see the old key until the background sync finishes.

This is not just cleaner code - it's a fundamentally different model. S3 remains the authoritative data store; the file system is a synchronized view. Your Lambda function sees files and directories. S3 sees objects and prefixes. Both are looking at the same data.

Building the Infrastructure with Terraform

Good news: the Terraform AWS provider shipped native S3 Files resources in v6.40.0 on April 8, 2026 - just one day after S3 Files went GA. The new resources are aws_s3files_file_system, aws_s3files_mount_target, and aws_s3files_access_point, plus corresponding data sources and aws_s3files_file_system_policy for resource-based policies.

The S3 Bucket (Versioning is Mandatory)

S3 Files requires bucket versioning to be enabled. This is how it tracks the relationship between file-system state and S3 object versions. The full bucket setup also includes SSE-S3 encryption (explicit, even though it's the default for new buckets), a public access block, and a bucket policy enforcing TLS-only access:

resource "aws_s3_bucket" "docs" {
  bucket = "${var.project_name}-${var.environment}-docs"
}

resource "aws_s3_bucket_versioning" "docs" {
  bucket = aws_s3_bucket.docs.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "docs" {
  bucket = aws_s3_bucket.docs.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "docs" {
  bucket                  = aws_s3_bucket.docs.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Disable ACLs - bucket-owner-enforced is the default for new buckets, but
# being explicit prevents readers from relying on defaults they may not understand.
resource "aws_s3_bucket_ownership_controls" "docs" {
  bucket = aws_s3_bucket.docs.id
  rule {
    object_ownership = "BucketOwnerEnforced"
  }
}

# Versioning is mandatory for S3 Files, so without lifecycle cleanup old
# versions accumulate silently during repeated benchmark runs.
resource "aws_s3_bucket_lifecycle_configuration" "docs" {
  bucket = aws_s3_bucket.docs.id
  rule {
    id     = "expire-noncurrent-versions"
    status = "Enabled"
    noncurrent_version_expiration {
      noncurrent_days = 7
    }
  }
}

resource "aws_s3_bucket_policy" "docs" {
  bucket = aws_s3_bucket.docs.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "DenyNonTLS"
      Effect    = "Deny"
      Principal = "*"
      Action    = "s3:*"
      Resource  = [aws_s3_bucket.docs.arn, "${aws_s3_bucket.docs.arn}/*"]
      Condition = { Bool = { "aws:SecureTransport" = "false" } }
    }]
  })
}

Production hardening notes: For workloads that should stay private inside the VPC, you can go beyond TLS-only and restrict bucket access to your S3 VPC endpoint using aws:sourceVpce or aws:sourceVpc conditions in the bucket policy. This can prevent bucket access except through your approved VPC or VPC endpoint, even when credentials are otherwise valid. For SSE-KMS encrypted buckets, the S3 Files service role would also need kms:GenerateDataKey, kms:Encrypt, kms:Decrypt, kms:ReEncryptFrom, and kms:ReEncryptTo scoped with kms:ViaService = s3.<region>.amazonaws.com. This demo uses SSE-S3 (AES256), so KMS permissions are not needed here.

The S3 Files Service Role

S3 Files needs an IAM role it can assume to read and write your bucket. This is separate from your Lambda's execution role. First surprise: the service principal is elasticfilesystem.amazonaws.com, not s3files.amazonaws.com. S3 Files is built on EFS, and the trust policy has to name the underlying service. If you guess the obvious name, CreateRole fails with MalformedPolicyDocument: Invalid principal.

resource "aws_iam_role" "s3files_service" {
  name = "${var.project_name}-${var.environment}-s3files-service"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "AllowS3FilesAssumeRole"
      Effect    = "Allow"
      Principal = { Service = "elasticfilesystem.amazonaws.com" }
      Action    = "sts:AssumeRole"
      Condition = {
        StringEquals = {
          "aws:SourceAccount" = data.aws_caller_identity.current.account_id
        }
        ArnLike = {
          "aws:SourceArn" = "arn:aws:s3files:${data.aws_region.current.region}:${data.aws_caller_identity.current.account_id}:file-system/*"
        }
      }
    }]
  })
}

resource "aws_iam_role_policy" "s3files_bucket_access" {
  name = "s3-bucket-access"
  role = aws_iam_role.s3files_service.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "s3:ListBucket", "s3:ListBucketVersions",
        "s3:GetBucketLocation", "s3:GetBucketVersioning",
        "s3:AbortMultipartUpload", "s3:ListMultipartUploadParts",
        "s3:GetObject", "s3:GetObjectVersion", "s3:GetObjectTagging", "s3:GetObjectVersionTagging",
        "s3:PutObject", "s3:PutObjectTagging",
        "s3:DeleteObject", "s3:DeleteObjectVersion"
      ]
      Resource = [aws_s3_bucket.docs.arn, "${aws_s3_bucket.docs.arn}/*"]
      Condition = {
        StringEquals = {
          "aws:ResourceAccount" = data.aws_caller_identity.current.account_id
        }
      }
    }]
  })
}

The role also needs EventBridge permissions - this is the mechanism behind S3-to-NFS synchronization. S3 Files creates EventBridge rules (prefixed DO-NOT-DELETE-S3-Files*) to detect out-of-band bucket changes. Without these, S3-side writes never propagate to the NFS mount:

resource "aws_iam_role_policy" "s3files_eventbridge" {
  name = "eventbridge-sync"
  role = aws_iam_role.s3files_service.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "EventBridgeManage"
        Effect = "Allow"
        Action = [
          "events:PutRule", "events:PutTargets",
          "events:DeleteRule", "events:DisableRule",
          "events:EnableRule", "events:RemoveTargets"
        ]
        Resource = "arn:aws:events:*:*:rule/DO-NOT-DELETE-S3-Files*"
        Condition = {
          StringEquals = {
            "events:ManagedBy" = "elasticfilesystem.amazonaws.com"
          }
        }
      },
      {
        Sid    = "EventBridgeRead"
        Effect = "Allow"
        Action = [
          "events:DescribeRule", "events:ListRules",
          "events:ListRuleNamesByTarget", "events:ListTargetsByRule"
        ]
        Resource = "arn:aws:events:*:*:rule/*"
      }
    ]
  })
}

If you use SSE-KMS encryption on the bucket, you'd also need kms:GenerateDataKey, kms:Encrypt, kms:Decrypt, kms:ReEncryptFrom, and kms:ReEncryptTo scoped with kms:ViaService = s3.<region>.amazonaws.com. This demo uses SSE-S3 (AES256), so KMS permissions aren't needed.

The aws:SourceArn condition and the full set of object/multipart/EventBridge actions are documented in the S3 Files prerequisites. The biggest risk from an incomplete policy isn't a permission error - it's silent failure. Missing EventBridge permissions mean the sync rules never get created, and S3-side changes simply don't appear on the mount. Missing multipart permissions cause large-file uploads to leak incomplete parts.

Creating the File System, Mount Targets, and Access Point

The aws_s3files_file_system resource takes just a bucket ARN and the service role ARN:

resource "aws_s3files_file_system" "docs" {
  bucket   = aws_s3_bucket.docs.arn
  role_arn = aws_iam_role.s3files_service.arn
}

Mount targets go in each subnet where your Lambda runs. One per AZ:

resource "aws_s3files_mount_target" "az" {
  count = length(var.private_subnet_ids)

  file_system_id  = aws_s3files_file_system.docs.id
  subnet_id       = var.private_subnet_ids[count.index]
  security_groups = [aws_security_group.mount_target.id]
}

Mount targets take about 5 minutes to create. Terraform's create timeout handles that wait - but there's a trap: the provider returns once the API call completes, which happens before the target reaches the available lifecycle state. If you create a Lambda that references the access point immediately after, CreateFunction fails with not all are in the available life cycle state yet. The fix is an explicit wait between mount targets and downstream consumers:

resource "time_sleep" "wait_for_mount_targets" {
  depends_on      = [aws_s3files_mount_target.az]
  create_duration = "90s"
}

resource "aws_s3files_access_point" "lambda" {
  file_system_id = aws_s3files_file_system.docs.id
  depends_on     = [time_sleep.wait_for_mount_targets]

  # DEMO SHORTCUT: uid 0:0 avoids ownership collisions during the side-by-side
  # comparison. In production, prefer a scoped access point path with a non-root
  # UID/GID (e.g., uid=1000), or grant s3files:ClientRootAccess on the Lambda
  # role instead. AWS's Lambda console defaults to UID/GID 1000:1000 with
  # root_directory.path = "/lambda" for good reason.
  posix_user {
    uid = 0
    gid = 0
  }

  root_directory {
    path = "/"
  }
}

The access point controls the POSIX UID/GID that all NFS operations execute as. The choice of 0:0 here is a demo compromise, not a recommendation - I'll explain the tradeoffs and better alternatives in the "Things to Look Out For" section.

Finally, add an aws_s3files_file_system_policy - the resource-based policy on the file system itself (equivalent to a bucket policy). Without this, any principal with s3files:ClientMount in their IAM policy can mount your file system:

resource "aws_s3files_file_system_policy" "docs" {
  file_system_id = aws_s3files_file_system.docs.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid       = "AllowMountFromKnownRoles"
        Effect    = "Allow"
        Principal = { AWS = [var.lambda_role_arn, var.ec2_role_arn] }
        Action    = ["s3files:ClientMount", "s3files:ClientWrite"]
        Resource  = aws_s3files_file_system.docs.arn
      },
      {
        Sid       = "EnforceTLS"
        Effect    = "Deny"
        Principal = "*"
        Action    = "s3files:*"
        Resource  = aws_s3files_file_system.docs.arn
        Condition = { Bool = { "aws:SecureTransport" = "false" } }
      }
    ]
  })
}

The VPC (No NAT Gateway Needed)

S3 Files requires your Lambda to be in a VPC - the NFS mount targets live inside your VPC subnets. But you don't need a NAT Gateway (which costs about $35/month). Instead, use a free S3 Gateway VPC endpoint for S3 API traffic:

resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_support   = true   # Required for VPC endpoints
  enable_dns_hostnames = true
}

resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.private_subnet_cidrs[count.index]
  availability_zone = var.availability_zones[count.index]
}

# Free S3 Gateway endpoint - no NAT gateway needed
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${data.aws_region.current.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]
}

Security groups allow NFS traffic (TCP 2049) between the Lambda and mount targets:

# Lambda can reach mount targets on NFS port
resource "aws_vpc_security_group_egress_rule" "lambda_to_nfs" {
  security_group_id            = aws_security_group.lambda_after.id
  referenced_security_group_id = aws_security_group.mount_target.id
  from_port                    = 2049
  to_port                      = 2049
  ip_protocol                  = "tcp"
}

# Mount targets accept NFS from Lambda
resource "aws_vpc_security_group_ingress_rule" "nfs_from_lambda" {
  security_group_id            = aws_security_group.mount_target.id
  referenced_security_group_id = aws_security_group.lambda_after.id
  from_port                    = 2049
  to_port                      = 2049
  ip_protocol                  = "tcp"
}

The Lambda Function with S3 Files Mount

The Lambda configuration uses the same file_system_config block as EFS. The key additions are the VPC config and the S3 Files-specific IAM permissions:

resource "aws_lambda_function" "processor_after" {
  function_name = "${var.project_name}-${var.environment}-after"
  runtime       = "python3.14"
  architectures = ["arm64"]
  memory_size   = 512    # >= 512MB enables direct S3 read optimization
  timeout       = 300
  handler       = "handler.lambda_handler"

  vpc_config {
    subnet_ids         = var.private_subnet_ids
    security_group_ids = [var.lambda_sg_id]
  }

  file_system_config {
    arn              = var.access_point_arn    # S3 Files access point
    local_mount_path = "/mnt/docs"            # Must start with /mnt/
  }

  environment {
    variables = {
      MOUNT_PATH = "/mnt/docs"
    }
  }
}

The Lambda execution role needs S3 Files mount permissions and S3 read permissions for the direct-read optimization:

resource "aws_iam_role_policy" "s3files_mount" {
  name = "s3files-mount"
  role = aws_iam_role.execution.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["s3files:ClientMount", "s3files:ClientWrite"]
      Resource = var.access_point_arn
    }]
  })
}

# Required for the >=1 MiB direct-read bypass (streams from S3 at up to 3 GB/s).
# Without this, reads silently fall back to the cached path - the mount works
# but you lose the throughput optimization and pay S3 Files access charges.
resource "aws_iam_role_policy" "s3_direct_read" {
  name = "s3-direct-read"
  role = aws_iam_role.execution.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["s3:GetObject", "s3:GetObjectVersion"]
      Resource = "${var.bucket_arn}/*"
    }]
  })
}

Note: s3files:ClientMount is required for all access. s3files:ClientWrite is only needed for read-write mounts. s3files:ClientRootAccess lets a non-root access point UID operate on root-owned entries (see the Access Point Ownership section below - it's the cleanest fix for mixed S3-API/NFS workflows). The s3:GetObject/s3:GetObjectVersion permissions are technically optional, but without them the direct-read bypass doesn't activate and your >=512MB memory setting buys you nothing.

Performance Comparison

I deployed both Lambda functions and ran them against 20 medium-sized text files (500-2000 words each) across 3 runs. The benchmark script seeds each approach into its own S3 prefix, invokes the relevant Lambda, and collects timing breakdowns from both handlers.

make benchmark

Actual results from a 20-file, 3-run benchmark (arm64 Lambda, 512MB, us-east-1):

Metric	Before (S3 API)	After (S3 Files)	Speedup
List files	80ms (min 74, max 92)	152ms (min 8, max 440)	0.5x
Read/Download (20 files)	991ms (min 920, max 1096)	139ms (min 124, max 146)	7.1x
Process	5ms	3ms	1.7x
Write metadata (20 files)	1788ms (min 1574, max 2054)	256ms (min 249, max 266)	7.0x
Move/rename (20 files)	530ms (min 465, max 647)	114ms (min 112, max 116)	4.6x
Lambda total	3394ms (min 3037, max 3878)	664ms (min 518, max 948)	5.1x
Wall clock (including invoke)	3505ms	1132ms	3.1x

The Lambda-internal win is about 5x. Wall clock narrows the gap because the after-Lambda pays a VPC cold start penalty on its first invocation in each run (2.2s wall on run 1, ~600ms on runs 2-3 once the ENI is warm). For batch workloads you'd amortize that across many files; for sporadic triggers you'd feel it every time.

The single non-win - list time - is counterintuitive but worth calling out. os.listdir over NFS had a cold-run outlier of 440ms (vs ~80ms for a warm ListObjectsV2 call). I didn't chase this down, but it looks like metadata that hasn't been touched recently isn't in the S3 Files cache yet and needs to be hydrated from S3 on first access. After warmup, listdir settles at 8ms - 10x faster than the S3 API.

The biggest wins are in small file reads (no per-object HTTP round trip), writes (no multipart setup for small files), and rename (a single inode operation vs CopyObject + DeleteObject).

The Lambda Managed Instances Connection

In my previous post on Lambda Managed Instances, I explored how high-memory Lambda functions unlock new workload patterns. S3 Files adds another dimension to this.

When your Lambda function has 512MB or more memory, S3 Files enables direct S3 read routing: reads of 1 MiB or larger bypass the file system's high-performance storage entirely and stream directly from S3 at up to 3 GB/s per client (that's a throughput ceiling, not a typical number - actual throughput depends on file size, network, and concurrency). These direct reads don't incur S3 Files access charges - you only pay standard S3 GET pricing. (Your Lambda execution role needs s3:GetObject and s3:GetObjectVersion on the bucket for this to work - without them, reads silently fall back to the cached path.)

There's a separate threshold at play too: files smaller than 128 KiB are asynchronously imported into the high-performance storage on first access (a prefetch optimization, not a bypass). Files between 128 KiB and 1 MiB get metadata imported but data is fetched on demand. This creates a three-tier read architecture:

Tiny files (under 128 KiB) (configs, metadata, indexes): prefetched into S3 Files cache, sub-millisecond on subsequent reads
Mid-size files (128 KiB to 1 MiB): fetched on demand from the cache or S3, depending on access pattern
Large files (1 MiB and above) (datasets, models, media): streamed directly from S3 at 3 GB/s, skipping the cache entirely

The 128 KiB import threshold is tunable per file system via aws_s3files_synchronization_configuration in Terraform (not shown in the demo). The 1 MiB direct-read bypass is not tunable.

For data-intensive Lambda workloads, combining Managed Instances (multi-concurrency, high memory) with S3 Files (mounted file system, direct S3 read bypass) is a compelling alternative to containerized processing.

The Three-Way EC2 Comparison: S3 API vs S3 Files vs Mountpoint

The Lambda benchmark above covers the serverless use case, but it doesn't include Mountpoint for Amazon S3 - AWS's FUSE-based file-system client. Mountpoint is widely used for analytics and ML workloads, so it's a natural comparison. There's just one problem: Mountpoint can't run on Lambda. It's FUSE-based, and Lambda's Firecracker microVM doesn't expose /dev/fuse or grant CAP_SYS_ADMIN - both required for userspace file-system mounts. S3 Files sidesteps this entirely by using NFS, which Lambda natively supports through its existing EFS mount infrastructure.

So for the three-way comparison, I added a Graviton EC2 instance (c7g.large, arm64, in the same VPC) with both S3 Files and Mountpoint mounted, plus direct S3 API access via boto3. Same bucket, same data, three different interfaces.

Large-Directory Walk (10,000 Small Files)

Seed 10,000 small text files under a single prefix, then enumerate every entry and stat each one:

Approach	Mean	Min	Max
S3 Files (NFS `os.listdir` + `os.stat`)	905ms	891ms	924ms
S3 API (`ListObjectsV2`, paginated)	1,666ms	1,637ms	1,698ms
Mountpoint (FUSE `os.listdir` + `os.stat`)	175,847ms	171,168ms	179,002ms

S3 Files is 1.8x faster than the S3 API. Mountpoint is 194x slower - nearly three minutes for 10,000 entries. This is Mountpoint's known weakness: it makes a ListObjectsV2 call per directory and then individual HeadObject calls for stat(), with no prefetching or metadata caching. If your workload involves browsing or enumerating directories, Mountpoint is the wrong tool.

Large-File Throughput (5 x 1 GiB Random Binary)

Seed five 1 GiB random binary files, stream-read each one into a SHA-256 hash, write the digest back:

Approach	Read time (5 GiB)	Read throughput	Write time
Mountpoint (FUSE)	11,158ms	459 MiB/s	1,469ms
S3 Files (NFS)	32,356ms	161 MiB/s	71ms
S3 API (`GetObject` stream)	129,228ms	43 MiB/s	151ms

For large sequential reads, Mountpoint dominates at 459 MiB/s - nearly 3x S3 Files and 10x the S3 API. This isn't an accident: Mountpoint splits each large read into parallel HTTP Range GET requests across multiple TCP connections, with aggressive read-ahead prefetching. A 1 GiB file read becomes many concurrent range fetches that saturate the network link. It's a purpose-built parallel download accelerator for large, sequential, read-heavy workloads (ML training data, analytics datasets, media processing).

S3 Files (161 MiB/s) goes through NFS 4.1/4.2 to a managed server that reads from its cache or S3 - the protocol framing and cache coherency tracking add overhead. The S3 API (43 MiB/s) is a single GetObject stream over one HTTP connection with no parallelism.

The same design that makes Mountpoint fast for large reads makes it very slow for directories: it has no metadata cache, so every stat() call becomes an individual HeadObject API call to S3. That's why 10,000 files takes 176 seconds.

Write time tells a similar story: Mountpoint takes 1,469ms to write five small digest files. S3 Files does it in 71ms. S3 API in 151ms. Mountpoint's FUSE-to-S3 translation adds high per-file overhead for small writes.

When to Use Which

The benchmark reveals that no single approach wins everywhere:

Use case	Best tool	Why
Interactive file operations (rename, create, list)	S3 Files	File-system semantics, metadata caching, rename instant from the NFS client's perspective (S3-side sync is async)
Large sequential reads (datasets, models, media)	Mountpoint	Highest throughput, zero software cost, no VPC needed
Serverless (Lambda)	S3 Files	Mountpoint can't run on Lambda at all
Simplest deployment (no VPC, no mounts)	S3 API	Slowest but zero infrastructure - works anywhere with IAM credentials
Directory-heavy workloads	S3 Files	In this benchmark, Mountpoint's per-entry overhead made large directory walks much slower

Things to Look Out For

S3 Files is impressive, but it's not magic. Here are the real-world constraints you need to know:

60-Second Commit Delay

S3 Files uses a "stage and commit" model. File-system writes are batched for approximately 60 seconds before committing to S3. Files you write are immediately visible through the NFS mount, but they won't appear in aws s3 ls or s3.list_objects_v2() for about a minute.

For the document processing use case, this is fine - the Lambda reads and writes through the mount, so consistency is maintained within the NFS view. But if you have a downstream process polling S3 directly for new objects, it will see a delay.

VPC Cold Starts

Putting Lambda in a VPC adds cold start latency. AWS has improved this significantly with Hyperplane ENI caching, but in this benchmark I observed roughly 1-2 seconds of additional cold start time compared to the non-VPC Lambda. For infrequently-invoked functions, this matters. For functions that process batches (like our document processor), the cold start is amortized across many files.

50 Million Object Limit

Each mounted file system supports up to 50 million objects. For most workloads this is generous, but if you're mounting a bucket with hundreds of millions of small objects, you'll need to scope the mount to a prefix. In Terraform, this is a creation-time argument on aws_s3files_file_system (not shown in the demo, which mounts the entire bucket). Via the CLI, use the --prefix flag on create-file-system.

Key Name Restrictions

S3 allows object keys that don't map cleanly to POSIX filenames. According to AWS documentation, keys with trailing slashes, path traversal patterns (../), or components longer than 255 characters will not appear in the file-system view. The objects remain accessible via the S3 API, but the file system won't show them. AWS recommends monitoring the CloudWatch ImportFailures metric to detect these cases, as there are no client-side errors.

Delete and Update Propagation

S3-side changes only propagate to the NFS mount for files whose data is currently in the high-performance storage (the "hot" cache). In testing, hot-file deletes via the S3 API remained readable on the mount for roughly 6-18 seconds before disappearing. Modifications followed the same pattern: the mount saw the stale version until the EventBridge notification arrived.

For files whose data has been expired out of the cache (cold files), S3-side changes don't propagate at all until the next NFS read, at which point S3 Files fetches the latest version from S3. So the 6-18 second range observed above is a hot-path number; cold-path updates are lazy and unbounded. If you're designing a pipeline that writes via the S3 API and reads via the mount, test both cases.

Access Point Ownership

This is the biggest surprise I hit, and it drove a design change in the demo.

Objects written through the NFS mount do carry POSIX ownership metadata - S3 Files stores it as user-defined S3 object metadata (file-permissions, file-owner, file-group, file-mtime) on every object it writes. But objects written via the S3 API - s3.put_object(), aws s3 cp, the before Lambda's boto3 calls - don't have that metadata. When S3 Files imports those API-written objects into the NFS view, they get default permissions: root:root (UID 0, GID 0) with mode 0644 for files and 0755 for directories. That asymmetry is the mechanism behind this issue: directories are traversable and readable by everyone (which is why the inbox reads worked), but only writable by root (which is why creating entries in processed/ failed). Those directories, incidentally, are just S3 prefixes materialized as zero-byte objects - which is why S3-API writes can create them as a side effect of PutObject and why they end up root-owned when imported.

The first time my after Lambda ran with posix_user { uid = 1000, gid = 1000 }, it failed with PermissionError: [Errno 13] Permission denied: '/mnt/docs/processed/.... The Lambda could read the inbox just fine, but it couldn't create anything under /mnt/docs/processed/ because S3 Files had reflected a previous before-Lambda PutObject into NFS as a root-owned directory.

Four ways out, ordered from best (least privilege) to most expedient:

Use a scoped access point path (recommended for production). Set root_directory.path = "/lambda-workspace" with creation_permissions { owner_uid = 1000, owner_gid = 1000, permissions = "755" } and posix_user { uid = 1000, gid = 1000 }. S3 Files creates that path owned by your UID, and the Lambda only sees its owned subtree. The tradeoff: every S3 object the Lambda needs to see must be keyed under lambda-workspace/..., and a raw aws s3 cp into any other prefix is invisible to the mount. This enforces least privilege at the access-point level.
Grant s3files:ClientRootAccess on the Lambda's IAM role. This lets a non-root UID (still posix_user { uid = 1000 }) perform operations against root-owned entries - including creating files inside root-owned directories imported from S3 - without running the entire Lambda as UID 0. It's the middle ground: keep least-privilege POSIX identity, elevate only for cross-boundary operations with S3-origin content. This permission is included in the AmazonS3FilesClientFullAccess managed policy, which is probably why I missed it - the demo's inline policy has only ClientMount + ClientWrite.
Avoid path collisions: have each S3-API-side producer write to a prefix the NFS client never writes into. The demo does this - the before Lambda writes to processed-before/ and the after Lambda writes to processed-after/ - so their outputs never fight over directory ownership.
Run as root (access point posix_user { uid = 0, gid = 0 }). The Lambda runs as "root" for NFS purposes and can write alongside S3-born files. This is what the demo uses because the side-by-side comparison needs both approaches to see the same bucket root. This is the opposite of least privilege - any NFS client can read, write, and delete anything on the mount. Last resort only.

If you're using S3 Files to replace an existing boto3 pipeline, plan this up front. Any prefix your NFS clients will write into should be created from the mount side first, or left entirely unpopulated from the S3 side. Anything written via PutObject will arrive in NFS as root-owned and block writes from non-root access points (unless you've granted ClientRootAccess).

Note: the demo pairs option 4 with an aws_s3files_file_system_policy that restricts which IAM principals can mount at all (deny-by-default, allow only the Lambda and EC2 benchmark roles, enforce TLS). If you use uid=0, this resource-based policy is your primary access control.

Related: don't pre-create "directory" marker objects (zero-byte inbox/, processed/, etc.) from Terraform. I had three aws_s3_object resources doing this and they turned out to be the exact cause of the ownership collision. The Lambda's os.makedirs(exist_ok=True) creates the directories over NFS with the correct access-point ownership - let it do its job.

S3-to-NFS Propagation Delay

Also worth knowing: writes go in both directions, but they don't propagate symmetrically. NFS writes commit to S3 on the 60-second schedule described above. S3 writes appear in the NFS view asynchronously via EventBridge notifications, which typically takes a few seconds but can take longer under load. If your benchmark seeds files via s3.put_object() and immediately invokes an NFS-mounted Lambda, the mount will see an empty inbox. The benchmark script in this project waits 60 seconds after S3-seeding to sidestep this.

Conflict resolution: if the same file is modified through both the NFS mount and the S3 API before synchronization completes, the S3 bucket version wins. The file-system copy is not silently overwritten - it gets moved to a .s3files-lost+found-<file-system-id> directory on the mount. Files in lost+found are not copied back to the S3 bucket and persist indefinitely on the file system, counting toward storage costs until explicitly deleted. This is important to understand for mixed API + file-system workflows: the S3 side is always authoritative, and your NFS edits may end up in lost+found if there's a race.

When NOT to Use S3 Files

S3 Files isn't always the right choice:

Read-only analytics at lowest cost: Mountpoint for S3 adds zero software cost and is optimized for sequential reads of large files. If you're running Spark, Presto, or ML training jobs that only read data, Mountpoint is cheaper and simpler (no VPC required).
Non-AWS or S3-compatible storage: s3fs-fuse works with MinIO, Ceph, and other S3-compatible object stores. S3 Files is AWS-only.
Existing EFS mounts: Lambda supports one file-system mount - EFS or S3 Files, not both. For any new build where the backing data lives in S3, prefer S3 Files over EFS (you skip the EFS-to-S3 sync problem entirely). Only stick with EFS if the function needs a shared writable file system that multiple Lambda invocations coordinate through simultaneously.
Latency-critical writes that must appear in S3 immediately: The 60-second commit delay means writes aren't visible to S3 API consumers right away. If you need sub-second S3 visibility, stick with direct S3 API calls.

Wrapping Up

S3 Files eliminates an entire category of boilerplate from AWS applications. The download-process-upload pattern that we've all written hundreds of times is no longer necessary. Your code just reads and writes files. The underlying storage happens to be S3.

The Terraform story is solid from day one - native provider resources shipped in v6.40.0, just one day after S3 Files went GA. Three resources (aws_s3files_file_system, aws_s3files_mount_target, aws_s3files_access_point) cover the full setup, and the file_system_config block on aws_lambda_function works identically to the existing EFS mount pattern.

All the code for this post - Terraform modules, Lambda handlers (with Powertools), the EC2 runner, benchmark scripts, and the Makefile - is available in the companion repository.

Cost Summary

For a demo deployment: approximately $75/month if you leave everything running. The EC2 instance (about $53/month) and SSM VPC interface endpoints (about $21/month, needed because the EC2 is in a private subnet with no NAT) are the bulk. Lambda and S3 costs are negligible. Stop the EC2 and run make destroy when done.

Cost note for production: S3 Files meters data reads and writes with a minimum of 32 KiB per operation, regardless of actual size. This benchmark's medium text files (500-2000 words) are above that threshold, so it didn't show up. But at scale with many tiny files - say 10,000 sub-1 KiB JSON configs read once each - you'd pay for 10,000 x 32 KiB = 320 MiB of reads, not 10 MiB. For small-file-heavy workloads, factor this into your cost model.

Key Takeaways

S3 Files provides file-system semantics on S3 via NFS v4.1/4.2 - read, write, rename, advisory file locking
Lambda functions with >=512MB memory get direct S3 read bypass for reads >=1 MiB (up to 3 GB/s ceiling) - but only if the execution role has s3:GetObject
No NAT Gateway needed - use S3 Gateway VPC endpoints (but add SSM interface endpoints for EC2)
Mountpoint can't run on Lambda (no FUSE) - S3 Files is the only file-system option for serverless
For large sequential reads, Mountpoint still wins (459 MiB/s vs 161 MiB/s) - it was purpose-built for throughput
For directory operations, Mountpoint is prohibitively slow (176s vs 0.9s for 10K entries) - use S3 Files
The 60-second commit delay (NFS to S3) and the async EventBridge propagation (S3 to NFS) are the two consistency boundaries you have to design around
Access point ownership interacts with S3-origin objects in ways that will surprise you - plan prefix ownership up front
The trust policy service principal is elasticfilesystem.amazonaws.com, not s3files.amazonaws.com - S3 Files is built on EFS
Native Terraform support shipped day one in AWS provider v6.40.0, but use a time_sleep between mount targets and Lambda to avoid lifecycle state races

Resources

Launching S3 Files, Making S3 Buckets Accessible as File Systems - AWS News Blog announcement
Amazon S3 Files Documentation - Official user guide
Configuring S3 Files Access for Lambda - Lambda-specific setup guide
S3 Files Getting Started Tutorial - Step-by-step walkthrough
Terraform aws_s3files_file_system Resource - Terraform provider docs
Terraform aws_s3files_mount_target Resource - Mount target configuration
Terraform aws_s3files_access_point Resource - Access point with POSIX user mapping
AWS S3 Files Stress Test - The Register's independent stress test with edge case findings
S3 Files vs Mountpoint vs s3fs-fuse Comparison - Detailed feature and performance comparison
Architecture Layers That S3 Files Eliminates - Architectural patterns analysis
Mountpoint for Amazon S3 - The read-heavy alternative for analytics workloads
Lambda Managed Instances with Terraform - High-memory Lambda patterns that complement S3 Files
Powertools for AWS Lambda - Best Practices By Default - Observability patterns for Lambda functions

Connect with me on X, Bluesky, LinkedIn, GitHub, Medium, Dev.to, or the AWS Community. Check out more of my projects at darryl-ruggles.cloud and join the Believe In Serverless community.

Amazon Aurora DSQL: A Practical Guide to AWS's Distributed SQL Database

Darryl Ruggles — Sun, 19 Apr 2026 16:23:59 +0000

Architecture, features, Terraform setup, and real application code - April 2026

When AWS announced Aurora DSQL at re:Invent 2024, I was very interested. We had heard promises about distributed SQL databases before and I really wanted to try it out. I experimented with it locally for a while and then built the Kabob Store example on it. Fifteen months later, DSQL has gone from preview to general availability, expanded to 14 regions, and shipped a steady stream of features. It fills the gap between DynamoDB's serverless economics and Aurora PostgreSQL's SQL power - and it does it well.

This is my comprehensive look at where DSQL stands in April 2026: what it does, what it doesn't do yet, how to set it up with Terraform, and practical application code you can use today.

Why Aurora DSQL?

For years, the database decision on AWS looked like this:

Need serverless economics? DynamoDB. But learn single-table design and give up SQL.
Need SQL? RDS or Aurora PostgreSQL. But accept always-on costs, instance sizing, and 10-15 minute provisioning.
Need multi-Region? DynamoDB Global Tables. SQL wasn't an option without manual replication.

Aurora DSQL eliminates the tradeoff. Four things make it different:

Serverless to zero - No instances, no capacity planning. Zero DPU charges when idle. Provisions in under 60 seconds.
PostgreSQL compatible - Based on PostgreSQL 16. Use psql, psycopg2, pgx, JDBC - the drivers you already know.
Strongly consistent - Not eventually consistent. Snapshot isolation with linearizability. Readers always see committed data.
Active-active multi-Region - Two full regions with concurrent reads and writes. No leader, no failover, no replication lag on commit.

What is Aurora DSQL?

Aurora DSQL is a serverless, distributed SQL database that disaggregates every component of a traditional database engine. Unlike Aurora PostgreSQL (which separates storage from compute but keeps them coupled), DSQL breaks the database into six independently scaling components:

Query Processors (QPs) - Run customized PostgreSQL engines inside Firecracker MicroVMs. Handle SQL parsing, planning, and execution. Scale independently based on query load.
Adjudicators - Validate transactions at COMMIT time using Optimistic Concurrency Control (OCC). Stateless and reconstructible.
Journal - A Paxos-based distributed transaction log (same technology as MemoryDB). Provides cross-AZ and cross-Region durability.
Crossbar - Merges journal streams and publishes committed writes to storage replicas. Sits between the Journal and Storage layers, ensuring all storage replicas receive the same ordered stream of committed transactions.
Storage - MVCC storage replicas distributed across 3 AZs. Consume committed entries from the Crossbar. Scale independently.
Control Plane - Coordinates all components, handles cluster lifecycle and scaling.

Note: The official AWS User Guide describes these layers as "Relay and connectivity, Compute and databases, Transaction log/concurrency control/isolation, Storage, and Control plane." The component names used here (Query Processors, Adjudicators, Journal, Crossbar) come from Marc Brooker's architecture deep-dive series and the AWS Database Blog, which provide more implementation detail.

The key design achievement, as Marc Brooker (VP/Distinguished Engineer at AWS) explained in his DSQL blog series, is that cross-region latency is incurred only at COMMIT time, not per-statement. During a transaction, reads and writes execute locally on the Query Processor. Only when you commit does the system coordinate with the Adjudicator and Journal for conflict detection and durability. Read-only transactions need no validation, no persistence, and no cross-region coordination at all.

Core Concepts

Optimistic Concurrency Control (OCC) - DSQL doesn't use locks. Transactions proceed without blocking each other. At COMMIT, the Adjudicator checks for write-write conflicts. If two transactions modified the same rows, one succeeds and the other gets a serialization failure (SQLSTATE 40001). Your application retries the failed transaction. No deadlocks, ever.

Snapshot Isolation - Each transaction sees a consistent snapshot of the database as of its start time (tau_start). All reads within a transaction see the same data, regardless of concurrent commits by other transactions. Equivalent to PostgreSQL's REPEATABLE READ.

IAM Authentication - No database passwords. Period. Applications generate tokens using generate_db_connect_auth_token (for runtime DML) or generate_db_connect_admin_auth_token (for schema migrations only). Integrates with IAM roles, so your ECS tasks and Lambda functions authenticate using their execution role. Tokens default to 15 minutes but can be configured up to one week using the token-duration-secs parameter in the connectors and CLI.

Asynchronous Indexes - DSQL requires CREATE INDEX ASYNC (synchronous CREATE INDEX is not supported). The index builds asynchronously while transactions continue. You can monitor build progress through system catalog queries.

Single DDL Per Transaction - Each CREATE TABLE, ALTER TABLE, or CREATE INDEX statement needs its own transaction with an explicit commit before the next DDL statement.

Feature Timeline: From Preview to Production

DSQL has shipped features at a steady pace since launch. Here's what has been added:

Date	Feature
February 2026	DSQL Playground (browser-based, no AWS account needed), sequences and identity columns, Go/Ruby/Python (asyncpg)/Node.js (WebSocket) connectors, numeric index support, AI steering (Kiro Powers, Claude/Gemini/Codex Skills), DBeaver plugin, SQLTools VS Code driver, Tortoise ORM adapter, Flyway dialect, Prisma CLI tools, expanded to 14 regions (added Canada, Sydney, Melbourne)
December 2025	Cluster lifecycle management, enhanced PrivateLink (Direct Connect + VPC peering), PostgreSQL migration guide
November 2025	Query Editor in console, JupyterLab integration, Python and Node.js connectors, storage quota increased to 256 TiB
October 2025	Resource-based policies for fine-grained cluster access control
September 2025	JDBC connector for Java applications
August 2025	AWS Fault Injection Service (FIS) integration for chaos testing
May 2025	General Availability - CloudWatch monitoring, AWS Backup, KMS CMK encryption, CloudFormation support, PrivateLink, Views
December 2024	Preview launch at re:Invent (3 US regions)

Region Availability (April 2026)

DSQL is now available in 14 regions across 4 continents:

Continent	Regions
North America	us-east-1 (Virginia), us-east-2 (Ohio), us-west-2 (Oregon), ca-central-1 (Montreal), ca-west-1 (Calgary)
Europe	eu-central-1 (Frankfurt), eu-west-1 (Ireland), eu-west-2 (London), eu-west-3 (Paris)
Asia Pacific	ap-northeast-1 (Tokyo), ap-northeast-2 (Seoul), ap-northeast-3 (Osaka), ap-southeast-2 (Sydney), ap-southeast-4 (Melbourne)

Multi-Region Cluster Sets

Multi-Region clusters must stay within one geographic set:

US: us-east-1, us-east-2, us-west-2
Europe: eu-central-1, eu-west-1, eu-west-2, eu-west-3
Asia Pacific: ap-northeast-1, ap-northeast-2, ap-northeast-3

Canada (ca-central-1, ca-west-1), Sydney (ap-southeast-2), and Melbourne (ap-southeast-4) are available as single-region clusters only and are not part of any multi-Region set. This is a common gotcha for customers in those regions.

Cross-continent multi-Region clusters are not supported. For global data sync across continents, DynamoDB Global Tables remain the option.

DSQL vs the Alternatives

Feature	Aurora DSQL	Aurora PostgreSQL Serverless v2	DynamoDB
Query language	PostgreSQL SQL	PostgreSQL SQL	PartiQL / NoSQL API
Provisioning time	Under 60 seconds	10-15 minutes	Instant
Scales to zero	Yes (no DPU charges)	Yes (0 ACU with auto-pause, ~15s cold start)	Yes (on-demand mode)
Multi-Region	Active-active, strong consistency	Read replicas, eventual	Global Tables, eventual
Availability SLA	99.99% / 99.999% multi-Region	99.99%	99.99% / 99.999% global
Authentication	IAM only (no passwords)	IAM or passwords	IAM or passwords
Foreign keys	Not yet	Yes	No (NoSQL)
Stored procedures	Not yet	Yes	No
Max storage	256 TiB	128 TiB	Unlimited
Transaction limits	3,000 rows, 10 MiB, 5 min	Practical limits (memory, storage, lock timeouts)	100 items, 4 MB
Pricing model	Per DPU ($8/million)	Per ACU-hour ($0.12+)	Per RRU/WRU or provisioned

When to Use What

Choose DSQL when you need SQL with serverless economics, multi-Region strong consistency, or you're building new applications that benefit from zero infrastructure management.

Choose Aurora PostgreSQL when you need foreign keys, stored procedures, triggers, pgvector for AI embeddings, or you're running an existing PostgreSQL application that uses unsupported features. Aurora Serverless v2 now scales to 0 ACUs with auto-pause (since November 2024), so it also offers scale-to-zero economics - with the tradeoff of a ~15-second cold start on resume.

Choose DynamoDB when your data model fits key-value or document patterns naturally, you need sub-millisecond latency, cross-continent global replication, or unlimited throughput scaling.

Setting Up DSQL with Terraform

All the Terraform code below uses Terraform >= 1.11 and the AWS provider ~> 6.0 . The terraform-aws-modules/rds-aurora DSQL submodule requires Terraform >= 1.11 and provider >= 6.18. The complete examples are in the GitHub repo.

Single-Region Cluster

This is the simplest setup. One resource, 60 seconds to provision, automatically distributed across 3 AZs:

terraform {
  required_version = ">= 1.11"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 6.0"
    }
  }
}

data "aws_region" "current" {}

resource "aws_dsql_cluster" "main" {
  deletion_protection_enabled = false

  # For production, enable deletion protection and add a KMS CMK:
  # deletion_protection_enabled = true
  # kms_encryption_key          = aws_kms_key.dsql.arn

  tags = {
    Name        = "my-app-dsql"
    Environment = "dev"
  }
}

# DSQL has no "endpoint" attribute - construct it from the identifier
output "dsql_endpoint" {
  value = "${aws_dsql_cluster.main.identifier}.dsql.${data.aws_region.current.id}.on.aws"
}

output "dsql_arn" {
  value = aws_dsql_cluster.main.arn
}

That's it. No instance class, no storage allocation, no replica configuration. One resource gives you a PostgreSQL-compatible database with 99.99% availability.

Multi-Region Cluster with Terraform Module

For production workloads requiring 99.999% availability, use multi-Region clusters. The official terraform-aws-modules/rds-aurora module includes a DSQL submodule:

provider "aws" {
  region = "us-east-1"
}

provider "aws" {
  alias  = "secondary"
  region = "us-east-2"
}

module "dsql_primary" {
  source  = "terraform-aws-modules/rds-aurora/aws//modules/dsql"
  version = "~> 10.0"

  deletion_protection_enabled = false
  witness_region              = "us-west-2"
  create_cluster_peering      = true
  clusters                    = [module.dsql_secondary.arn]

  tags = {
    Name = "my-app-dsql-primary"
  }
}

module "dsql_secondary" {
  source  = "terraform-aws-modules/rds-aurora/aws//modules/dsql"
  version = "~> 10.0"

  providers = {
    aws = aws.secondary
  }

  deletion_protection_enabled = false
  witness_region              = "us-west-2"
  create_cluster_peering      = true
  clusters                    = [module.dsql_primary.arn]

  tags = {
    Name = "my-app-dsql-secondary"
  }
}

The module handles cluster peering automatically. One terraform apply creates:

Primary cluster in us-east-1 with full read/write endpoint
Secondary cluster in us-east-2 with full read/write endpoint
Witness region in us-west-2 for Journal-only quorum (no endpoint, no user access)
Bidirectional peering with synchronous replication

Both endpoints present a single logical database. Your application can read and write to either endpoint. Strong consistency across both regions with zero replication lag on commit.

If you prefer using the native aws_dsql_cluster resource directly instead of the module, the multi-Region interface uses multi_region_properties with witness_region - see the commented-out Option B in the dsql-multi-region.tf example. Also note that AWS provider 6.x introduced per-resource region attributes, which can eliminate the need for provider aliases in some configurations.

IAM Authentication Policy

DSQL uses two IAM permission levels. Use the right one for each role:

dsql:DbConnect - Generates tokens for connecting with custom database roles. Use this for application runtime.
dsql:DbConnectAdmin - Generates tokens for connecting as the admin database user (full DDL + DML). Use this only for schema migrations and admin tasks.

Note that the DDL/DML restriction is enforced at the database role level, not the IAM layer. DbConnect generates a token that can only authenticate as a custom role (not admin), and custom roles only have the permissions you grant them. DbConnectAdmin generates a token that authenticates as admin, which has full privileges. AWS's security best practices are clear: don't use the admin role for everyday operations. Create separate IAM roles and custom database roles for application access.

# Application runtime policy - DML only (least privilege)
data "aws_iam_policy_document" "dsql_app_runtime" {
  statement {
    effect    = "Allow"
    actions   = ["dsql:DbConnect"]
    resources = [aws_dsql_cluster.main.arn]
  }
}

# Admin/migration policy - DDL + DML (for CI/CD pipelines, not app runtime)
data "aws_iam_policy_document" "dsql_admin" {
  statement {
    effect    = "Allow"
    actions   = ["dsql:DbConnectAdmin"]
    resources = [aws_dsql_cluster.main.arn]
  }
}

# ECS task role for application runtime - uses DbConnect, NOT DbConnectAdmin
resource "aws_iam_role" "app_task" {
  name = "my-app-task-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "ecs-tasks.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy" "app_dsql" {
  name   = "dsql-runtime-access"
  role   = aws_iam_role.app_task.id
  policy = data.aws_iam_policy_document.dsql_app_runtime.json
}

# Separate role for schema migrations (CI/CD pipeline, not the running app)
resource "aws_iam_role" "migration_role" {
  name = "my-app-migration-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "ecs-tasks.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy" "migration_dsql" {
  name   = "dsql-admin-access"
  role   = aws_iam_role.migration_role.id
  policy = data.aws_iam_policy_document.dsql_admin.json
}

Always scope DSQL permissions to the specific cluster ARN. No wildcard resources. Your running application should never have DbConnectAdmin - reserve that for migration tasks.

Custom Database Role (Least Privilege at the Database Layer)

IAM controls which token type you can generate, but you should also avoid connecting as admin for everyday operations. Create a custom database role and map it to an IAM identity:

-- Connect as admin (one-time setup via DbConnectAdmin)
CREATE ROLE app_role WITH LOGIN;
GRANT USAGE ON SCHEMA public TO app_role;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_role;

-- Map the IAM role ARN to the custom database role (DSQL-specific syntax)
AWS IAM GRANT app_role TO 'arn:aws:iam::123456789012:role/my-app-task-role';

-- To revoke later:
-- AWS IAM REVOKE app_role FROM 'arn:aws:iam::123456789012:role/my-app-task-role';

Then connect as the custom role in your application code:

conn = psycopg2.connect(
    host=cluster_endpoint,
    port=5432,
    database="postgres",
    user="app_role",  # Custom role, not admin
    password=token,   # Token from generate_db_connect_auth_token
    sslmode="require",
)

This completes the least-privilege story at both layers: IAM controls token generation (DbConnect vs DbConnectAdmin), and the database role controls what SQL the connection can execute.

PrivateLink (Production)

For production workloads, keep database traffic off the public internet using VPC endpoints:

resource "aws_vpc_endpoint" "dsql" {
  vpc_id              = aws_vpc.main.id
  service_name        = aws_dsql_cluster.main.vpc_endpoint_service_name
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.dsql_endpoint.id]
  private_dns_enabled = true
}

resource "aws_security_group" "dsql_endpoint" {
  name_prefix = "${var.project_name}-dsql-endpoint-"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
    description = "Allow all outbound (required for VPC endpoint communication)"
  }
}

With private_dns_enabled = true, your application connects using the same cluster endpoint - no code changes needed. For connections from on-premises via Direct Connect without private DNS, use the amzn-cluster-id connection parameter.

Application Code: Python

The examples below use Python 3.13+ with psycopg2 2.9.11 and boto3. The full example is in dsql_connection.py.

Connection with IAM Auth

import boto3
import psycopg2
from psycopg2.extras import RealDictCursor

client = boto3.client("dsql", region_name="us-east-1")
cluster_endpoint = f"{cluster_id}.dsql.us-east-1.on.aws"

# Token method must match the user:
# - user="admin" -> generate_db_connect_admin_auth_token (DDL + DML)
# - custom role  -> generate_db_connect_auth_token (DML only)
token = client.generate_db_connect_admin_auth_token(cluster_endpoint, "us-east-1")

conn = psycopg2.connect(
    host=cluster_endpoint,
    port=5432,
    database="postgres",
    user="admin",  # For production, use a custom database role - see "Custom Database Role" section
    password=token,
    sslmode="require",
    cursor_factory=RealDictCursor,
)

Or use the official connector (pip install aurora-dsql-python-connector, v0.2.6+) which handles token refresh automatically:

from aurora_dsql_python_connector import connect

conn = connect(
    cluster_endpoint="abc123.dsql.us-east-1.on.aws",
    region="us-east-1",
)

The OCC Retry Pattern

This is the most important pattern for DSQL applications. Since DSQL uses Optimistic Concurrency Control instead of locks, write transactions can fail at COMMIT when concurrent modifications conflict:

import time
import psycopg2.errors

def with_occ_retry(func, max_retries=3, base_delay=0.1):
    """Retry wrapper for OCC conflicts (SQLSTATE 40001)."""
    for attempt in range(max_retries):
        try:
            return func()
        except psycopg2.errors.SerializationFailure:
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt)
            time.sleep(delay)

def create_order(customer_email, items, total):
    def _do_insert():
        conn = db.get_connection()
        cur = conn.cursor()
        cur.execute("""
            INSERT INTO orders (customer_email, items, total_amount)
            VALUES (%s, %s, %s)
            RETURNING *
        """, (customer_email, json.dumps(items), total))
        result = cur.fetchone()
        conn.commit()
        return dict(result)

    return with_occ_retry(_do_insert)

Key points about OCC:

Read-only transactions never conflict - they don't need retry logic
OCC conflicts are SQLSTATE 40001 (serialization_failure)
Use exponential backoff to avoid retry storms
Design transactions to be small and fast to minimize conflict windows
Avoid hot-spot writes (e.g., incrementing a single counter row from many threads)

Schema Setup with DDL Limits

def create_tables():
    conn = db.get_connection()
    cur = conn.cursor()

    # One DDL per transaction - commit before next DDL
    cur.execute("""
        CREATE TABLE IF NOT EXISTS products (
            id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
            name VARCHAR(200) NOT NULL,
            price NUMERIC(10, 2) NOT NULL,
            category VARCHAR(50),
            created_at TIMESTAMPTZ DEFAULT now()
        )
    """)
    conn.commit()  # Must commit before next DDL

    cur.execute("""
        CREATE TABLE IF NOT EXISTS orders (
            id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
            customer_email VARCHAR(255) NOT NULL,
            items TEXT NOT NULL,
            total_amount NUMERIC(10, 2) NOT NULL,
            status VARCHAR(20) DEFAULT 'pending',
            created_at TIMESTAMPTZ DEFAULT now()
        )
    """)
    conn.commit()  # Separate transaction for each DDL

Sequences and Identity Columns (New - February 2026)

# Using identity columns for auto-incrementing IDs
cur.execute("""
    CREATE TABLE IF NOT EXISTS audit_log (
        id BIGINT GENERATED ALWAYS AS IDENTITY (CACHE 65536) PRIMARY KEY,
        event_type VARCHAR(50) NOT NULL,
        payload TEXT,
        created_at TIMESTAMPTZ DEFAULT now()
    )
""")
conn.commit()

# Or use sequences directly
cur.execute("CREATE SEQUENCE IF NOT EXISTS invoice_seq START 1000 CACHE 65536")
conn.commit()

cur.execute("SELECT nextval('invoice_seq')")
next_invoice = cur.fetchone()["nextval"]

Application Code: Node.js

The examples below use Node.js 24.x LTS with @aws-sdk/dsql-signer and pg 8.20+. The full example is in dsql-connection.mjs. You can also use the official connector @aws/aurora-dsql-node-postgres-connector (v0.1.8+) which wraps pg with automatic IAM auth.

Connection with AWS SDK Signer

import { DsqlSigner } from "@aws-sdk/dsql-signer";
import pg from "pg";

const signer = new DsqlSigner({
  hostname: "abc123.dsql.us-east-1.on.aws",
  region: "us-east-1",
});

// Token method matches the user:
// - "admin" -> getDbConnectAdminAuthToken (DDL + DML)
// - custom role -> getDbConnectAuthToken (DML only)
const token = await signer.getDbConnectAdminAuthToken();

const pool = new pg.Pool({
  host: "abc123.dsql.us-east-1.on.aws",
  port: 5432,
  database: "postgres",
  user: "admin",
  password: token,
  ssl: true,
});

OCC Retry in Node.js

async function withOccRetry(pool, txnFn, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const client = await pool.connect();
    try {
      await client.query("BEGIN");
      const result = await txnFn(client);
      await client.query("COMMIT");
      return result;
    } catch (err) {
      await client.query("ROLLBACK").catch(() => {});
      if (err.code === "40001" && attempt < maxRetries - 1) {
        await new Promise((r) => setTimeout(r, 100 * 2 ** attempt));
        continue;
      }
      throw err;
    } finally {
      client.release();
    }
  }
}

// Usage
const order = await withOccRetry(pool, async (client) => {
  const result = await client.query(
    `INSERT INTO orders (customer_email, items, total_amount)
     VALUES ($1, $2, $3) RETURNING *`,
    [email, JSON.stringify(items), total]
  );
  return result.rows[0];
});

Multi-Region Application Architecture

For applications that need 99.999% availability and low-latency reads from multiple regions, deploy your application stack in each DSQL region with Route53 latency-based routing:

The architecture uses:

Route53 latency-based routing to direct users to the nearest region
CloudFront for static asset caching and edge termination
ECS Fargate running application containers in each region
Aurora DSQL with active-active clusters in both regions and a witness for quorum

Both DSQL endpoints present a single logical database. East Coast users connect to us-east-1, West Coast users connect to us-east-2 (or us-west-2 if available as a full endpoint) - both reading and writing the same strongly consistent data. The witness region in us-west-2 stores only encrypted Journal entries for quorum, with no user endpoint.

This is conceptually similar to DynamoDB Global Tables, but with full PostgreSQL SQL support and strong consistency instead of eventual consistency.

Database Limits to Know

DSQL has intentional limits that prevent tail latency and keep the system predictable. These aren't bugs - they're design choices:

Limit	Value	Why It Matters
Rows per transaction	3,000	Keeps OCC conflict windows small. Batch large inserts.
Transaction size	10 MiB	Prevents oversized commits from impacting the Journal.
Transaction duration	5 minutes	Forces short, focused transactions. No long-running locks (because there are no locks).
Connection duration	60 minutes	Aligns with IAM token lifecycle. Reconnect periodically.
Max connections	10,000 per cluster	Configurable via Service Quotas.
Connection rate	100/second (1,000 burst)	Not configurable. Critical for Lambda cold-start scenarios.
Tables per database	1,000	One database per cluster.
Schemas per database	10	Not configurable.
Indexes per table	24	Including primary key.
Max row size	2 MiB	Individual column max is 1 MiB.
Max storage	256 TiB (with quota increase)	Default is 10 TiB.
Sequences	5,000 per database	Added February 2026.
Views	5,000 per database	Added at GA, May 2025.

Pricing

DSQL uses a DPU (Distributed Processing Unit) billing model that covers all database activity - compute, I/O, and transaction processing - in a single metric.

DPU rate: $8 per million DPUs (us-east-2)
Storage: $0.33 per GB-month (pay for one logical copy per region)
Multi-Region writes: Additional DPU charges equal to originating write DPUs
Free tier: 100,000 DPUs + 1 GB storage per month (roughly 700K TPC-C equivalent transactions)
Scales to zero: No DPU charges when idle

Cost Comparison for a Modest Workload

For an application processing 1,000 transactions per hour, 10 GB storage:

Service	Monthly Cost
Aurora DSQL (single region)	~$50-80/month
Aurora DSQL (idle dev environment)	~$3/month (storage only)
Aurora PostgreSQL Serverless v2	~$90-120/month active, or storage-only when paused at 0 ACU (~15s cold start on resume)
RDS PostgreSQL (db.t3.medium)	~$60-80/month (runs 24/7)
DynamoDB (on-demand, equivalent)	~$30-50/month

Both DSQL and Aurora Serverless v2 can now scale to zero. The difference: DSQL resumes instantly with no cold start, while Aurora Serverless v2 takes approximately 15 seconds to resume from a paused state. For development environments with intermittent traffic, both cost pennies when idle. For production workloads that need instant response times, DSQL's zero cold start matters. DSQL is also eligible for Database Savings Plans for predictable workloads.

You can monitor DPU breakdown in CloudWatch under the AWS/AuroraDSQL namespace: ComputeDPU, ReadDPU, WriteDPU, and MultiRegionWriteDPU.

Developer Experience and Tooling

DSQL's tooling ecosystem has grown quickly:

Connectors (official, handle IAM auth automatically):

Python: aurora-dsql-python-connector v0.2.6 - wraps psycopg, psycopg2, asyncpg
Node.js: @aws/aurora-dsql-node-postgres-connector v0.1.8 (pg) and @aws/aurora-dsql-postgresjs-connector v0.2.1 (Postgres.js)
Java: JDBC connector (PgJDBC wrapper)
Go: pgx v5.8.0 wrapper (February 2026)
Ruby: aurora-dsql-ruby-pg-connector (February 2026)

ORM and Migration Tooling:

Tortoise ORM adapter (Python async ORM)
Prisma CLI tools (Node.js ORM integration)
Flyway dialect (database migration tooling)

IDE Integrations:

DBeaver plugin (Community and Pro editions)
VS Code SQLTools driver
JupyterLab and SageMaker AI integration
AWS Console Query Editor

AI Steering:

Aurora DSQL MCP Server for AI-assisted development
Kiro Powers for Kiro IDE
Skills for Claude Code, Cursor, Gemini, Codex
Steering ensures AI assistants generate DSQL-compatible code (handling OCC retries, DDL limits, IAM auth)

Infrastructure:

Terraform 1.11+ with AWS provider 6.18+ - native aws_dsql_cluster resource
terraform-aws-modules/rds-aurora DSQL submodule for multi-Region
CloudFormation support
AWS Backup integration for automated backups

Best Practices

Implement OCC retry logic on every write path. Use exponential backoff with 3-5 retries. Read-only transactions don't need retries.
Keep transactions small and fast. The 3,000 row and 5-minute limits exist for good reason. Batch large operations into chunks of 500 rows.
Use UUID primary keys. Random UUIDs distribute writes evenly across storage shards. Sequential IDs create hot spots that increase OCC conflicts.
Refresh IAM tokens proactively. Tokens default to 15 minutes (configurable up to one week via token-duration-secs). With the default, refresh at 10 minutes to avoid connection failures. The official connectors handle this automatically.
Use the official connectors for production SSL. Raw psycopg2 with sslmode="require" encrypts the connection but doesn't verify the server's identity. The official aurora-dsql-python-connector and @aws/aurora-dsql-node-postgres-connector handle full certificate verification automatically. For production, use the connectors rather than managing SSL configuration yourself.
One DDL per transaction. Always commit after each CREATE TABLE, ALTER TABLE, or CREATE INDEX. This catches many migration scripts that batch DDL.
Scope IAM policies to cluster ARNs. Never use wildcard resources for DSQL permissions. Scope dsql:DbConnect and dsql:DbConnectAdmin to specific cluster ARNs.
Use EXPLAIN ANALYZE VERBOSE for query optimization. Covering indexes can significantly reduce DPU costs by enabling index-only scans instead of full table scans.
Implement referential integrity in application code. Without foreign keys, enforce relationships through application-level validation and carefully designed transaction boundaries.
Test with AWS FIS. Use Fault Injection Service to simulate region failures and validate your application's multi-Region behavior before you need it.
Monitor DPU breakdown in CloudWatch. Watch ComputeDPU, ReadDPU, WriteDPU separately. High WriteDPU relative to reads may indicate OCC conflict storms.

What's Not There Yet - And Why

This is the most contentious part of DSQL. If you're coming from standard RDS PostgreSQL or Aurora PostgreSQL, the list of missing features is significant. But these aren't oversights - the DSQL team made deliberate engineering tradeoffs to deliver strong consistency and predictable performance across a distributed, multi-Region architecture. Some of these features are fundamentally difficult in a disaggregated, OCC-based system. Others have been deliberately deprioritized based on customer usage patterns.

The Full Gap List vs Standard PostgreSQL

PostgreSQL Feature	DSQL Status	Why
Foreign key constraints	Not yet - deprioritized based on customer usage patterns	Cascading operations (e.g., deleting an order with 1,000 line items) create large implicit transactions that conflict with DSQL's 3,000-row transaction limit and OCC model. Many high-scale customers avoid foreign keys even in standard PostgreSQL for this reason. Marc Brooker has noted the team "haven't built foreign key constraints yet" because many customers take the same approach.
Stored procedures (PL/pgSQL)	Not supported	Procedural code running inside the database conflicts with the serverless, stateless Query Processor model. The DSQL team sees this as an architectural direction, not a gap - business logic belongs in CI/CD-deployed application code, not inside the database.
Triggers	Not supported	Same reasoning as stored procedures. Database-side event processing creates hidden coupling and unpredictable transaction sizes. Use EventBridge, Lambda, or application-level event patterns instead.
TRUNCATE	Not supported	Use `DELETE FROM table_name` or `DROP TABLE` + `CREATE TABLE`. TRUNCATE's behavior is difficult to implement consistently across distributed storage replicas.
Temporary tables	Not supported	The stateless, multi-tenant Query Processor model means there's no persistent session state. Use CTEs (`WITH` clauses), subqueries, or regular tables with cleanup logic.
VACUUM / ANALYZE	Not needed	DSQL's MVCC garbage collection is automatic. The 5-minute transaction time limit enables simple, efficient cleanup without the complexity of PostgreSQL's vacuum process. No maintenance windows required.
pgvector / vector support	Not yet	Vector similarity search is planned. In the meantime, AWS offers S3 Vectors and Aurora PostgreSQL with pgvector for embedding workloads.
JSONB columns	Not as a column type	Store JSON in `TEXT` columns and cast to `jsonb` at query time (e.g., `my_column::jsonb->>'key'`). JSON functions and operators work at runtime, but you lose JSONB indexing (GIN indexes).
Full-text search	Not supported	No `tsvector`/`tsquery`. Use OpenSearch Serverless or Amazon Kendra for full-text search workloads.
Multiple databases per cluster	1 database (`postgres`)	Use schemas for logical separation within a cluster, or create separate clusters. This simplifies distributed metadata management.
Tablespaces	Not supported	Storage is fully managed and auto-scaled. No manual storage allocation or placement decisions needed.
Advisory locks	Not supported	OCC replaces all locking mechanisms. Advisory locks are a pessimistic concurrency pattern that doesn't fit the OCC model.
LISTEN / NOTIFY	Not supported	The stateless Query Processor model has no persistent connections for push notifications. Use SQS, SNS, or EventBridge for pub/sub patterns.
Extensions (PostGIS, etc.)	Not supported	The managed, multi-tenant architecture doesn't support arbitrary extensions. Use purpose-built AWS services (Location Service for geo, OpenSearch for search).
Custom collations	`C` collation only	Consistent collation across distributed storage simplifies sort ordering and index behavior across regions. UTF-8 encoding is supported.
Configurable isolation levels	`REPEATABLE READ` only	A single isolation level eliminates an entire class of consistency bugs. Strong snapshot isolation is the sweet spot between anomaly prevention and distributed performance.
Password authentication	IAM only	No database passwords, ever. This is a security decision - IAM tokens integrate with CloudTrail, roles, and temporary credentials.
CREATE INDEX (synchronous)	`CREATE INDEX ASYNC` only	Asynchronous index creation prevents DDL from blocking running transactions. You monitor build progress through system catalog queries. This is actually an improvement for production workloads.
Multiple DDL per transaction	1 DDL per transaction	Distributed schema changes are coordinated across all Query Processors and storage replicas. Limiting to one DDL per transaction keeps this coordination simple and predictable.

The Engineering Reasoning

Marc Brooker addressed the feature gaps directly in his Simplifying Architectures post. The key insight: DSQL's limits aren't arbitrary restrictions - they're what make the system's guarantees possible.

Transaction limits (3,000 rows, 10 MiB, 5 minutes) prevent head-of-line blocking. In a traditional database, one long-running transaction holding locks can stall every other transaction behind it. DSQL's OCC model doesn't have locks, but oversized commits would still create contention at the Adjudicator and Journal layers. The limits keep individual transactions fast and predictable, which keeps the entire system fast and predictable.

No stored procedures or triggers is the most opinionated choice. The DSQL team observed that customers are increasingly moving business logic out of the database and into application code deployed through CI/CD pipelines. Code in the database is hard to version, hard to test, and hard to debug. DSQL leans into this direction rather than supporting both models.

No foreign keys yet is the gap most customers notice first. The team has acknowledged the gap and may add support where it makes sense for the distributed architecture, but has deprioritized it based on customer feedback. The challenge is that cascading operations (CASCADE DELETE, CASCADE UPDATE) can create implicit transactions that exceed the row limits and generate unpredictable OCC conflict windows. Many high-scale PostgreSQL users already avoid foreign keys for exactly these reasons - but having the option matters.

What to Use Instead

For applications that depend heavily on the missing features today, here's the practical guidance:

Need foreign keys, stored procedures, triggers? Use Aurora PostgreSQL Serverless v2. Full PostgreSQL feature set with serverless scaling (though not to zero).
Need vector search? Aurora PostgreSQL with pgvector, S3 Vectors, or OpenSearch Serverless.
Need full-text search? OpenSearch Serverless or Amazon Kendra.
Need pub/sub notifications? EventBridge + Lambda instead of LISTEN/NOTIFY.
Need geospatial queries? Amazon Location Service instead of PostGIS.

DSQL is best for new applications that can work within these constraints, or existing applications that were already avoiding the missing features. The team is actively expanding compatibility - views, sequences, identity columns, and the Go connector all shipped based on direct customer feedback. Foreign key constraints remain a known gap, and customer demand will likely influence when they're addressed.

Things to Know

Connection caching - DSQL manages prepared statements cluster-wide. You may see more prepared statements per connection than expected. This is by design.

IPv4 connections - Some PostgreSQL clients attempt IPv6 first in dualstack mode. If you're on IPv4-only hosts, configure your client for IPv4 explicitly to avoid NetworkUnreachable errors.

Schema propagation - GRANT and REVOKE changes propagate to existing connections within the connection lifetime (up to one hour). For immediate effect, reconnect after permission changes.

Catalog cache - After creating schemas or tables, refresh your connection (disconnect/reconnect or SET search_path again) to update the catalog cache. This catches "Schema Already Exists" errors.

Deletion protection - Enable deletion_protection_enabled = true in production Terraform configs. If you need to destroy a DSQL cluster, disable protection first then run terraform apply before terraform destroy.

Row counts - For large tables, use the system catalog instead of COUNT(*) for row counts. DSQL stores approximate counts in pg_class.reltuples.

TRUNCATE - Not supported. Use DELETE FROM table_name to clear all rows, or DROP TABLE followed by CREATE TABLE for a full reset. This is a common migration stumbling block for scripts that use TRUNCATE for test data cleanup.

Connection pooling - With 60-minute connection limits and IAM token refresh, pool refresh behavior matters. Configure your connection pool to close and recreate connections before the 60-minute limit. The official connectors handle token refresh, but pool-level eviction still needs configuration. Set idleTimeoutMillis (Node.js) or equivalent to well under 60 minutes.

PostgreSQL client version - AWS recommends PostgreSQL client version 17 or later for best compatibility with DSQL.

Recent Features Worth Highlighting

DSQL Playground (February 2026) - A browser-based sandbox where you can create schemas, load sample data, and run SQL queries against a real DSQL database - no AWS account required. This is the fastest way to try DSQL. Visit the Aurora DSQL Playground and start writing queries in seconds.

Sequences and Identity Columns (February 2026) - The most requested feature after foreign keys. You can now use GENERATED ALWAYS AS IDENTITY columns and explicit CREATE SEQUENCE / nextval() calls. Up to 5,000 sequences per database.

AI Steering (February 2026) - The DSQL MCP server and IDE skills ensure AI coding assistants generate code that handles DSQL's specific patterns - OCC retries, DDL limits, IAM auth. If you use Claude Code, Cursor, or similar tools, install the DSQL steering skill. It saves real debugging time.

PrivateLink with Direct Connect (December 2025) - Connect to DSQL from on-premises networks without traversing the public internet. Uses the amzn-cluster-id connection option for clusters behind PrivateLink without private DNS.

Resource-Based Policies (October 2025) - Attach policies directly to DSQL clusters for cross-account access patterns. Useful for shared database architectures.

AWS FIS Integration (August 2025) - Inject connection errors into specific regions to test your application's failover behavior. For multi-Region deployments, run experiments in one region while the other continues normal operations.

My Project: The Kabob Store on DSQL

I built the Kabob Store as a real-world test of DSQL. It's a full e-commerce platform with menu browsing, cart management, and order processing, running on ECS Fargate with a FastAPI backend.

Key architectural decisions from that project:

Direct psycopg2 instead of an ORM - Better control over transaction boundaries and DSQL-specific patterns
Container-based architecture - The same Docker image deploys to Fargate, Lambda, EC2, or EKS without code changes
Multi-Region DSQL with single-Region compute - Data replication for disaster recovery, with plans to add multi-Region compute with Route53 routing
Defense-in-depth security - Six layers from client validation through parameterized queries
IAM token refresh manager - Thread-safe connection management with 55-minute token refresh

The architecture principles from that project apply to any DSQL application. I covered ECS as my default container runtime and EventBridge for event-driven patterns in previous posts - DSQL fits naturally into both patterns.

Cleanup

If you deployed a DSQL cluster to follow along, destroy your resources to avoid ongoing charges:

cd terraform

# If you enabled deletion protection, disable it first:
# Edit dsql-single-region.tf: set deletion_protection_enabled = false
# terraform apply

terraform destroy

For CI/CD pipelines and automated testing, set deletion_protection_enabled = false from the start, or use the force_destroy option in the Terraform module to skip the protection check during teardown.

DSQL charges only for DPUs consumed and storage used - there are no idle compute charges. But storage charges ($0.33/GB-month) continue as long as data exists in the cluster. For multi-Region clusters, destroy both the primary and secondary clusters. The witness region has no standalone resources to clean up.

Wrapping Up

Aurora DSQL is 15 months old and has matured quickly. It went from a 3-region preview to a 14-region GA service with CloudWatch monitoring, AWS Backup, PrivateLink, FIS chaos testing, resource-based policies, sequences, and a growing ecosystem of connectors and IDE integrations.

The gaps are real - no foreign keys, no stored procedures, no vector support. These matter for some workloads. But for new applications that need SQL with serverless economics, multi-Region strong consistency without managing replicas, or a database that actually scales to zero, DSQL delivers.

My decision tree for new projects now has a clear path:

Need key-value at scale? DynamoDB.
Need full PostgreSQL? Aurora PostgreSQL Serverless v2.
Need SQL + serverless + multi-Region? Aurora DSQL.

The code examples in this post are in the GitHub repo - Terraform for infrastructure, Python and Node.js for application patterns. If you want to try DSQL without even creating an AWS account, the DSQL Playground lets you run queries in your browser in seconds. When you're ready for your own cluster, it's sixty seconds from terraform apply to a running PostgreSQL-compatible database with no instances to manage.

If you've been waiting for a serverless SQL database on AWS that isn't a compromise, this is it.

Resources

Aurora DSQL Playground - Try DSQL in your browser, no AWS account needed
Aurora DSQL User Guide
Aurora DSQL Pricing
Aurora DSQL Document History - Track every feature addition
Marc Brooker's DSQL Blog Series - Essential reading. Marc is the VP/Distinguished Engineer behind DSQL. His five-part series covers the architecture internals (reads, writes, transactions, multi-Region, simplifying architectures) in detail you won't find anywhere else.
Aurora DSQL Discord - Community Discord for questions, feedback, and discussion with the DSQL team
terraform-aws-modules/rds-aurora DSQL Module
Aurora DSQL MCP Server - AI steering for DSQL-aware code generation
Aurora DSQL Connectors - Official Python, Node.js, Java, Go, Ruby connectors
My Kabob Store Project - My previous DSQL blog - building a multi-Region e-commerce platform
ECS: My Default Choice for Containers
EventBridge: The Event-Driven Backbone of AWS

Connect with me on X, Bluesky, LinkedIn, GitHub, Medium, Dev.to, or the AWS Community. Check out more of my projects at darryl-ruggles.cloud and join the Believe In Serverless community.

Powertools for AWS Lambda - Best Practices By Default

Darryl Ruggles — Thu, 09 Apr 2026 17:52:31 +0000

AWS Lambda Powertools for Python is the library I add to every Lambda function before writing a single line of business logic. While Powertools is also available for TypeScript, Java, and .NET, this post focuses on the Python edition. Three decorators give you structured logging, distributed tracing, and CloudWatch metrics. The rest of the toolkit - event routing, idempotency, batch processing, parameter management - eliminates the boilerplate that otherwise dominates serverless code. As of early 2026, Powertools v3.25.0 supports Python 3.10 through 3.14, and with the X-Ray SDK entering maintenance mode on February 25, 2026, Powertools Tracer is one of the recommended migration paths for Lambda tracing - alongside AWS Distro for OpenTelemetry (ADOT) and native OpenTelemetry instrumentation.

I wanted to consolidate everything I've learned from using Powertools across real projects into one place. This blog post is a deep dive into what Powertools offers, how I use it in production, and how you can adopt it today.

Why Observability Matters for Serverless

Serverless architectures introduce unique observability challenges. There aren't any servers to SSH into, no log files to tail, and no process monitors to check. When something goes wrong at 2 AM, your only window into the problem is what your functions chose to emit.

Four problems hit every Lambda team eventually:

Unstructured logs - print() statements scattered through code produce text that's nearly impossible to search, filter, or alert on at scale.
Missing correlation - When a single user request touches five Lambda functions, connecting the dots across CloudWatch log groups requires manual effort or custom correlation IDs.
Manual metrics - Calling put_metric_data directly means writing boto3 code, handling batching, and dealing with CloudWatch API limits.
Tracing gaps - Setting up X-Ray manually with the SDK requires segment management, subsegment creation, and careful exception handling in every function.

Multiply these across dozens of functions and you get a codebase where observability code outweighs business logic.

The Problem Without Powertools

This is what observability code typically looks like without Powertools - a pattern I see in many Lambda projects:

import logging
import json
import time
import boto3
from aws_xray_sdk.core import patch_all, xray_recorder

patch_all()
logger = logging.getLogger()
logger.setLevel(logging.INFO)
cloudwatch = boto3.client('cloudwatch')

def lambda_handler(event, context):
    # Manual structured logging
    logger.info(json.dumps({
        "message": "Processing request",
        "request_id": context.aws_request_id,
        "function_name": context.function_name,
        "event": event
    }))

    # Manual X-Ray subsegment
    subsegment = xray_recorder.begin_subsegment('process_data')
    try:
        result = process_data(event)
        subsegment.put_annotation('status', 'success')
    except Exception as e:
        subsegment.put_annotation('status', 'error')
        raise
    finally:
        xray_recorder.end_subsegment()

    # Manual CloudWatch metric
    cloudwatch.put_metric_data(
        Namespace='MyApp',
        MetricData=[{
            'MetricName': 'ProcessedItems',
            'Value': 1,
            'Unit': 'Count'
        }]
    )

    return result

That's 40+ lines of infrastructure code before you write any business logic. It's error-prone - forget to end a subsegment and your traces break. It's inconsistent - each developer formats logs differently. And it makes an API call for every metric, adding latency and cost.

What is AWS Lambda Powertools?

AWS Lambda Powertools is an open-source toolkit maintained by AWS that provides a suite of utilities for Lambda functions. It's available for Python, TypeScript, Java, and .NET - the core patterns (Logger, Tracer, Metrics) are consistent across all languages. It's not a framework - it's a collection of independent utilities you adopt incrementally.

The toolkit includes a growing suite of utilities organized into categories:

Core Observability (The Big Three)

Logger - Structured JSON logging with automatic Lambda context injection
Tracer - AWS X-Ray tracing via decorators with automatic annotation
Metrics - CloudWatch Embedded Metric Format (EMF) for zero-latency custom metrics

Event Handling

Event Handler - REST, HTTP, and GraphQL routing for API Gateway, ALB, and Lambda Function URLs
Batch Processing - SQS, Kinesis, and DynamoDB Streams with partial failure reporting
Kafka Consumer - MSK and self-managed Kafka with automatic JSON, Avro, and Protocol Buffer deserialization
Event Source Data Classes - Typed access to event payloads from 20+ AWS services

Data & Validation

Parser - Pydantic v2 models for event validation and transformation
Idempotency - At-most-once execution with DynamoDB-backed deduplication
Data Masking - PII redaction in logs and traces

Configuration

Parameters - SSM Parameter Store, Secrets Manager, AppConfig, and DynamoDB with caching and transformation
Feature Flags - AppConfig-backed feature toggles with rule evaluation

Utilities

Middleware Factory - Custom middleware with before/after hooks
JMESPath Functions - Built-in JMESPath custom functions for extracting data from complex event structures
Typing - LambdaContext type for IDE autocompletion
Streaming - S3 object streaming for large file processing without loading entire objects into memory

Installation and Setup

Powertools can be added three ways:

Lambda Layer (Recommended for production)

The fastest approach - no bundling required:

arn:aws:lambda:{region}:017000801446:layer:AWSLambdaPowertoolsPythonV3-{python_version}-{arch}:{version}

For example, Python 3.13 on ARM64:

arn:aws:lambda:us-east-1:017000801446:layer:AWSLambdaPowertoolsPythonV3-python313-arm64:29

pip / uv

pip install aws-lambda-powertools
# Or with uv:
uv add aws-lambda-powertools

# With Pydantic for Parser utility:
pip install "aws-lambda-powertools[parser]"
# With all optional dependencies:
pip install "aws-lambda-powertools[all]"

Environment Variables

Three environment variables configure the core utilities across all functions:

POWERTOOLS_SERVICE_NAME=my-service        # Appears in every log entry and metric
POWERTOOLS_METRICS_NAMESPACE=MyApp        # CloudWatch metrics namespace
POWERTOOLS_LOG_LEVEL=INFO                 # Log level (DEBUG, INFO, WARNING, ERROR)

Logger - Structured Logging

The Logger utility replaces Python's standard logging with structured JSON output. Every log entry automatically includes the Lambda request ID, function name, memory allocation, cold start status, and sampling rate.

I use it like this in the Smurf ETL project's process-upload function:

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.typing import LambdaContext

logger = Logger()

@logger.inject_lambda_context(log_event=True)
def lambda_handler(event, context: LambdaContext):
    logger.info("Processing file", extra={
        "file_name": filename,
        "bucket": bucket_name
    })

The @logger.inject_lambda_context decorator automatically adds Lambda context to every log entry. Setting log_event=True logs the entire incoming event on the first line - invaluable for debugging. The extra parameter adds custom fields that become searchable in CloudWatch Logs Insights.

A single log entry produces JSON like:

{
    "level": "INFO",
    "location": "lambda_handler:119",
    "message": "Processing file",
    "timestamp": "2026-01-15T10:30:00.000Z",
    "service": "sales-data-platform",
    "cold_start": true,
    "function_name": "process_upload",
    "function_request_id": "c6af9ac6-7b61-11e6-9a41-93e8deadbeef",
    "file_name": "store_0001_2026-01-15.json",
    "bucket": "sales-data-uploads"
}

Every field is searchable. In CloudWatch Logs Insights:

fields @timestamp, message, file_name
| filter service = "sales-data-platform"
| filter level = "ERROR"
| sort @timestamp desc

Key Logger Features

Correlation IDs - Automatically propagates correlation IDs across Lambda invocations via API Gateway headers, SQS messages, or custom sources
Child loggers - logger.child(extra={"module": "parser"}) creates loggers that inherit the parent context
Sampling - Logger(sampling_rate=0.1) logs DEBUG for 10% of invocations without changing code
Exception logging - logger.exception("Error") automatically includes the full stack trace as structured data

Tracer - Distributed Tracing

The Tracer utility wraps AWS X-Ray with a decorator-based approach that eliminates manual segment management.

From the Account Watcher project:

from aws_lambda_powertools import Tracer

tracer = Tracer()

@tracer.capture_method
def parse_event(event):
    """Parse details of event"""
    # ... business logic ...
    return eventName, result

@tracer.capture_method
def send_slack_message(payload, webhook):
    """Send Slack message"""
    return requests.post(webhook, data=json.dumps(payload))

@tracer.capture_lambda_handler
def lambda_handler(event, context):
    event_name, event_detail = parse_event(event)
    send_slack_message({"text": f"{event_name}: {event_detail}"}, webhook)

@tracer.capture_lambda_handler creates the root X-Ray segment. @tracer.capture_method creates subsegments for each decorated function. Exceptions are automatically captured as fault annotations.

Key Tracer Features

Automatic patching - Patches boto3, requests, and other libraries for downstream tracing
Annotations and metadata - tracer.put_annotation("order_id", "123") for searchable fields, tracer.put_metadata("response", data) for detailed inspection
Async support - Works with asyncio functions via @tracer.capture_method
Provider pattern - Swap X-Ray for OpenTelemetry when the native integration ships

A note on the future of tracing: With the X-Ray SDK now in maintenance mode (February 2026), the Powertools team has an active RFC for native OpenTelemetry support in the Tracer utility. When this ships, Tracer will support any OTLP-compatible backend (Datadog, Honeycomb, Grafana, Jaeger) without changing your application code. If you adopt @tracer.capture_lambda_handler today, your tracing code will transition automatically when the OTel backend lands.

Metrics - CloudWatch Embedded Metric Format

The Metrics utility uses CloudWatch Embedded Metric Format (EMF) to publish custom metrics with zero API calls. Metrics are embedded in log output and extracted asynchronously by CloudWatch - no put_metric_data calls, no latency impact, no API throttling.

From the Smurf ETL process-upload function:

from aws_lambda_powertools import Metrics
from aws_lambda_powertools.metrics import MetricUnit

metrics = Metrics()

@metrics.log_metrics(capture_cold_start_metric=True)
def lambda_handler(event, context):
    # Business logic...
    metrics.add_metric(name="FilesProcessed", unit=MetricUnit.Count, value=1)
    metrics.add_metric(name="RecordsProcessed", unit=MetricUnit.Count, value=len(df))

capture_cold_start_metric=True automatically tracks cold starts per function - one of the most useful metrics for Lambda performance tuning.

From the detect-anomalies function, adding dimensions for more granular metrics:

metrics.add_metric(name="AnomaliesDetected", unit=MetricUnit.Count, value=len(anomalies))
metrics.add_metric(name="BedrockInvocations", unit=MetricUnit.Count, value=1)
metrics.add_dimension(name="ModelId", value=BEDROCK_MODEL_ID)

Dimensions let you filter and group metrics in CloudWatch. Here the ModelId dimension means you can see invocation counts per Bedrock model.

Key Metrics Features

Zero latency - EMF metrics are extracted from logs, not from API calls
Auto-flush - @metrics.log_metrics ensures metrics are flushed even if the function throws
Default dimensions - Set once, applied to all metrics: metrics.set_default_dimensions(environment="prod")
High resolution - Supports 1-second resolution metrics via resolution=MetricResolution.High
Validation - Raises SchemaValidationError if you exceed CloudWatch limits (30 dimensions, 1000 metrics per EMF blob)
Datadog provider - Powertools v3 also supports Datadog as an alternative metrics backend via DatadogProvider, using the same add_metric API

The Big Three Together

The real power of Powertools emerges when Logger, Tracer, and Metrics work together. The pattern I use across all 17 functions in the Smurf ETL project looks like this:

from aws_lambda_powertools import Logger, Tracer, Metrics
from aws_lambda_powertools.metrics import MetricUnit
from aws_lambda_powertools.utilities.typing import LambdaContext

logger = Logger()
tracer = Tracer()
metrics = Metrics()

@logger.inject_lambda_context(log_event=True)
@tracer.capture_lambda_handler
@metrics.log_metrics(capture_cold_start_metric=True)
def lambda_handler(event, context: LambdaContext):
    # All three utilities active with 3 lines of decorators
    logger.info("Processing started")
    # ... business logic with @tracer.capture_method on helper functions ...
    metrics.add_metric(name="ItemsProcessed", unit=MetricUnit.Count, value=count)

Three decorators. Structured JSON logs with Lambda context. X-Ray traces with automatic subsegments. CloudWatch metrics with cold start tracking. All correlated by the same request ID.

This pattern is used across all 17 Lambda functions in the Smurf ETL project. The consistency means every function produces the same log format, the same trace structure, and the same baseline metrics.

Event Handler - REST Routing

The Event Handler utility provides Flask-like routing for API Gateway, ALB, and Lambda Function URLs. Instead of a monolithic handler with if/elif chains, you define routes with decorators.

from aws_lambda_powertools.event_handler import APIGatewayRestResolver

app = APIGatewayRestResolver()

@app.get("/items")
def list_items():
    return {"items": get_all_items()}

@app.post("/items")
def create_item():
    body = app.current_event.json_body
    item = save_item(body)
    return {"item": item}, 201

@app.get("/items/<item_id>")
def get_item(item_id: str):
    item = find_item(item_id)
    if not item:
        raise NotFoundError(f"Item {item_id} not found")
    return {"item": item}

def lambda_handler(event, context):
    return app.resolve(event, context)

Resolvers are available for REST API (APIGatewayRestResolver), HTTP API (APIGatewayHttpResolver), ALB (ALBResolver), Lambda Function URL (LambdaFunctionUrlResolver), and VPC Lattice (VPCLatticeV2Resolver).

Key Event Handler Features

Path parameters - <item_id> with type hints for automatic validation
CORS - Built-in CORS configuration with CORSConfig
Middleware - Per-route middleware for auth, validation, and logging
Exception handling - Custom exception handlers for consistent error responses
OpenAPI - Automatic OpenAPI schema generation from route definitions
Compression - Automatic gzip compression for responses

Parameters - SSM, Secrets Manager, and More

The Parameters utility provides a unified interface for retrieving configuration from SSM Parameter Store, Secrets Manager, AppConfig, and DynamoDB - with built-in caching and transformation.

from aws_lambda_powertools.utilities import parameters

# SSM Parameter Store (cached for 5 minutes by default in v3)
api_url = parameters.get_parameter("/myapp/api-url")

# Secrets Manager (auto-deserializes JSON)
db_creds = parameters.get_secret("myapp/db-credentials", transform="json")

# AppConfig with custom cache TTL
feature_config = parameters.get_app_config(
    name="features",
    environment="prod",
    application="myapp",
    max_age=300  # cache for 5 minutes
)

Caching is critical for Lambda performance. You could roll your own caching with a module-level variable, but you'd need to manage TTL expiry, handle invalidation, and replicate that logic across every function. Powertools handles all of this automatically - values are cached in memory across warm invocations with a configurable TTL (5 minutes by default in v3), reducing latency and cost with no boilerplate.

Idempotency

The Idempotency utility ensures Lambda functions process each unique request exactly once. It uses DynamoDB to track processed requests by hashing the payload and storing results for a configurable TTL.

from aws_lambda_powertools.utilities.idempotency import (
    DynamoDBPersistenceLayer,
    IdempotencyConfig,
    idempotent,
)

persistence_layer = DynamoDBPersistenceLayer(table_name="IdempotencyTable")
config = IdempotencyConfig(expires_after_seconds=3600)

@idempotent(config=config, persistence_store=persistence_layer)
def lambda_handler(event, context):
    payment = process_payment(event["payment_id"], event["amount"])
    return {"payment_id": payment.id, "status": "completed"}

The flow:

Request arrives, Powertools hashes the relevant payload fields
Checks DynamoDB for an existing record with that hash
If found and not expired, returns the cached result (no re-execution)
If not found, executes the function and stores the result
Concurrent requests with the same hash are handled with DynamoDB conditional writes

Key Idempotency Features

Selective hashing - event_key_jmespath="body.payment_id" to hash only specific fields
Method-level idempotency - @idempotent_function for individual methods, not just handlers
Custom serialization - Pluggable serializers for complex result types
Expiry - Configurable TTL for cached results
Jitter - Built-in jitter to prevent cache stampedes

This is essential for payment processing, order creation, or any operation where double-execution causes real harm.

The idempotency table is a required infrastructure component. In Terraform:

resource "aws_dynamodb_table" "idempotency" {
  name         = "IdempotencyTable"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "id"

  attribute {
    name = "id"
    type = "S"
  }

  ttl {
    attribute_name = "expiration"
    enabled        = true
  }
}

PAY_PER_REQUEST avoids capacity planning headaches for bursty workloads, and the TTL attribute lets DynamoDB automatically clean up expired idempotency records.

Batch Processing

The Batch Processing utility handles partial failures for SQS, Kinesis, and DynamoDB Streams. When processing a batch of 10 SQS messages and 2 fail, Powertools reports only those 2 as failed - the other 8 are deleted from the queue.

from aws_lambda_powertools.utilities.batch import (
    BatchProcessor,
    EventType,
    process_partial_response,
)

processor = BatchProcessor(event_type=EventType.SQS)

def record_handler(record):
    payload = record.body
    # Process individual message
    result = process_message(json.loads(payload))
    return result

def lambda_handler(event, context):
    return process_partial_response(
        event=event,
        record_handler=record_handler,
        processor=processor,
        context=context,
    )

Without this utility, a single failed message causes the entire batch to retry - including messages that already processed successfully. This leads to duplicate processing, wasted compute, and potential data corruption.

Key Batch Processing Features

Partial failures - Only failed records are retried via batchItemFailures response
Error handling - Configurable error handling strategies (raise, suppress, or custom)
Async support - AsyncBatchProcessor for asyncio-based handlers
Pydantic integration - Combine with Parser for typed record processing

Parser and Validation

The Parser utility uses Pydantic v2 for event validation and transformation. Define models for your event payloads and Powertools validates them automatically:

from pydantic import BaseModel, field_validator
from aws_lambda_powertools.utilities.parser import event_parser

class OrderEvent(BaseModel):
    order_id: str
    customer_id: str
    amount: float
    items: list[str]

    @field_validator("amount")
    @classmethod
    def amount_must_be_positive(cls, v):
        if v <= 0:
            raise ValueError("Amount must be positive")
        return v

@event_parser(model=OrderEvent)
def lambda_handler(event: OrderEvent, context):
    # event is already validated and typed
    logger.info(f"Processing order {event.order_id}")

Built-in envelope support handles the common pattern of extracting business data from AWS event wrappers (API Gateway, SQS, EventBridge, etc.).

Feature Flags and Data Masking

Feature Flags

Feature Flags use AWS AppConfig to evaluate rules at runtime without redeploying:

from aws_lambda_powertools.utilities.feature_flags import FeatureFlags, AppConfigStore

app_config = AppConfigStore(
    environment="prod",
    application="myapp",
    name="features"
)
feature_flags = FeatureFlags(store=app_config)

# Simple boolean flag
premium_enabled = feature_flags.evaluate(name="premium_features", default=False)

# Context-based rules (e.g., percentage rollout, user segments)
new_ui = feature_flags.evaluate(
    name="new_checkout_ui",
    context={"user_id": user_id, "tier": "premium"},
    default=False
)

Data Masking

Data Masking redacts sensitive fields in logs and traces:

from aws_lambda_powertools.utilities.data_masking import DataMasking

masker = DataMasking()
masked = masker.mask(data, fields=["$.credit_card", "$.ssn"])
# credit_card: "*****", ssn: "*****"

Beyond simple field redaction, Data Masking also supports encryption via the AWS Encryption SDK. This lets you encrypt sensitive fields in transit and decrypt them only in authorized downstream services - a step beyond masking for compliance-heavy workloads.

Middleware Factory

The Middleware Factory creates reusable middleware with before/after hooks:

from aws_lambda_powertools.middleware_factory import lambda_handler_decorator

@lambda_handler_decorator
def middleware_with_validation(handler, event, context):
    # Before handler
    if "Authorization" not in event.get("headers", {}):
        return {"statusCode": 401, "body": "Unauthorized"}

    # Execute handler
    response = handler(event, context)

    # After handler
    response["headers"] = {
        **response.get("headers", {}),
        "X-Request-Id": context.aws_request_id
    }
    return response

@middleware_with_validation
def lambda_handler(event, context):
    return {"statusCode": 200, "body": "OK"}

This is useful for cross-cutting concerns like authentication, input validation, response enrichment, and error standardization.

Infrastructure as Code: Terraform

This is how I configure Powertools across 17 Lambda functions in the Smurf ETL project using Terraform:

locals {
  powertools_layer_arn = "arn:aws:lambda:${var.aws_region}:017000801446:layer:AWSLambdaPowertoolsPythonV3-python313-arm64:29"

  # Common Powertools environment variables
  powertools_env_vars = {
    POWERTOOLS_SERVICE_NAME      = "sales-data-platform"
    POWERTOOLS_METRICS_NAMESPACE = "SalesDataPlatform"
    POWERTOOLS_LOG_LEVEL         = var.log_level
  }
}

resource "aws_lambda_function" "process_upload" {
  function_name = "process_upload"
  role          = aws_iam_role.lambda_role.arn
  runtime       = "python3.13"
  architectures = ["arm64"]
  handler       = "process_upload.lambda_handler"
  timeout       = 30
  memory_size   = 256

  reserved_concurrent_executions = 10  # prevent runaway scaling

  layers = [local.powertools_layer_arn]

  tracing_config {
    mode = "Active"
  }

  environment {
    variables = merge(local.powertools_env_vars, {
      S3_BUCKET        = aws_s3_bucket.upload_bucket.id
      PROCESSED_PREFIX = var.processed_prefix
    })
  }
}

# Explicit log group with retention - don't pay for infinite retention
resource "aws_cloudwatch_log_group" "process_upload" {
  name              = "/aws/lambda/process_upload"
  retention_in_days = 30
}

IAM for Powertools Utilities

Each Powertools utility has specific IAM requirements. A least-privilege role scoped to what the function actually needs looks like this:

resource "aws_iam_role" "lambda_role" {
  name               = "process-upload-role"
  assume_role_policy = data.aws_iam_policy_document.lambda_assume.json
}

data "aws_iam_policy_document" "lambda_assume" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["lambda.amazonaws.com"]
    }
  }
}

resource "aws_iam_role_policy" "powertools_permissions" {
  name   = "powertools-permissions"
  role   = aws_iam_role.lambda_role.id
  policy = data.aws_iam_policy_document.powertools.json
}

data "aws_caller_identity" "current" {}

data "aws_iam_policy_document" "powertools" {
  # Note: This policy covers Powertools-specific permissions only.
  # Your function will also need permissions for its business logic
  # (S3, DynamoDB, Bedrock, etc.).
  # CloudWatch Logs - required for Logger
  statement {
    actions   = ["logs:CreateLogStream", "logs:PutLogEvents"]
    resources = ["${aws_cloudwatch_log_group.process_upload.arn}:*"]
  }

  # X-Ray - required for Tracer
  statement {
    actions   = ["xray:PutTraceSegments", "xray:PutTelemetryRecords"]
    resources = ["*"]
  }

  # SSM Parameter Store - required for Parameters utility
  statement {
    actions   = ["ssm:GetParameter", "ssm:GetParametersByPath"]
    resources = ["arn:aws:ssm:${var.aws_region}:${data.aws_caller_identity.current.account_id}:parameter/myapp/*"]
  }

  # DynamoDB - required for Idempotency utility
  statement {
    actions   = ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:UpdateItem", "dynamodb:DeleteItem"]
    resources = [aws_dynamodb_table.idempotency.arn]
  }
}

Key patterns:

Layer ARN in a local - Change the version once, all functions update
merge() for env vars - Common Powertools vars merged with function-specific vars
tracing_config { mode = "Active" } - Enables X-Ray for the Tracer utility
ARM64 architecture - Graviton gives better price-performance, and Powertools provides ARM64-specific layers
Explicit timeout and memory - Don't rely on defaults (3s/128MB); Powertools Tracer adds some overhead, so size accordingly
Reserved concurrency - Prevents runaway scaling from overwhelming downstream services like DynamoDB or external APIs
Explicit log group with retention - Don't pay for infinite retention; 30 days covers most debugging needs
Least-privilege IAM - Each Powertools utility gets only the permissions it needs, scoped to specific resource ARNs where possible

Infrastructure as Code: SAM

From the Account Watcher project, using SAM with Powertools:

Globals:
  Function:
    Timeout: 3
    MemorySize: 128
    Tracing: Active
    LoggingConfig:
      LogFormat: JSON
    Environment:
      Variables:
        POWERTOOLS_SERVICE_NAME: AccountEventHandler

Resources:
  AccountEventHandler:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: account_event_handler/
      Handler: app.lambda_handler
      Runtime: python3.12
      Architectures:
        - arm64
      Layers:
        - !Sub arn:aws:lambda:${AWS::Region}:017000801446:layer:AWSLambdaPowertoolsPythonV2-Arm64:69

SAM's Globals section is powerful here - Tracing: Active and LoggingConfig apply to all functions in the template. The layer ARN uses !Sub for region substitution.

Warning: This project still uses the v2 layer, which reached end of support on March 25, 2025. Don't copy this ARN into new projects. For v3, replace the layer with:

arn:aws:lambda:{region}:017000801446:layer:AWSLambdaPowertoolsPythonV3-python313-arm64:29

The migration is straightforward - update the layer ARN, bump the runtime, and fix any deprecated imports. The core decorator pattern (@logger.inject_lambda_context, @tracer.capture_lambda_handler, @metrics.log_metrics) is identical between v2 and v3.

CDK users: Powertools is equally well-supported with AWS CDK. The aws-lambda-python-alpha construct handles bundling, and you can pass the Powertools layer ARN via LayerVersion.fromLayerVersionArn(). The same environment variables and layer patterns apply - just expressed as CDK constructs instead of HCL or YAML.

Real Architecture: Smurf ETL Pipeline

My Serverless Smurf ETL and Analytics project is a sales data platform with 17 Lambda functions - every one using Powertools.

The pipeline:

Store managers upload daily sales JSON via a presigned URL
EventBridge triggers the process-upload function (JSON validation, Parquet conversion)
Step Functions orchestrates the daily analysis pipeline
Lambda functions calculate store metrics, company metrics, and product metrics
Three Bedrock-powered functions detect anomalies, analyze trends, and generate recommendations
Results are stored in DynamoDB and exported to QuickSight
A React frontend displays analytics via CloudFront

Every Lambda function uses the same three-decorator pattern:

@logger.inject_lambda_context(log_event=True)
@tracer.capture_lambda_handler
@metrics.log_metrics(capture_cold_start_metric=True)
def lambda_handler(event, context: LambdaContext):

The Terraform configuration uses merge(local.powertools_env_vars, {...}) to ensure consistent service names and namespaces across all 17 functions. CloudWatch Logs Insights queries work seamlessly across all functions because the log format is identical.

Real Architecture: Account Watcher

My Serverless Account Watcher monitors AWS account changes and sends notifications via Slack and email.

The architecture:

CloudTrail captures all API activity in the AWS account
EventBridge rules match specific events (S3 bucket deletion, IAM key creation, root login)
A Lambda function processes matched events, formats messages, and sends notifications
Slack webhook for instant alerts, SNS for email distribution

The Lambda function uses all three Powertools decorators plus @tracer.capture_method on helper functions:

@tracer.capture_lambda_handler
@logger.inject_lambda_context(log_event=True)
@metrics.log_metrics(capture_cold_start_metric=True)
def lambda_handler(event, context):
    event_name, event_detail = parse_event(event)
    send_slack_message({"text": f"{event_name}: {event_detail}"}, SLACK_WEBHOOK_URL)
    publish_to_sns(event_name, event_detail, SNS_TOPIC_ARN)

This is a great starter project for anyone learning Powertools - it's a single Lambda function with real-world observability, and you can deploy it with sam deploy in minutes.

Best Practices and Patterns

After using Powertools across multiple production projects, here are the practices I've found most valuable:

Always use all three core decorators. Logger, Tracer, and Metrics together provide comprehensive observability. The overhead is negligible compared to the debugging value.
Set POWERTOOLS_SERVICE_NAME consistently. Use the same service name across related functions. This is the primary filter key in CloudWatch Logs Insights and X-Ray.
Use log_event=True in non-production only. Logging the full event is invaluable for debugging but can expose PII, payment data, or health information in production - potentially violating GDPR, PCI-DSS, or HIPAA. Omit log_event in production (it defaults to False), and use an environment variable to enable it per stage. For events that do contain sensitive fields, pair this with the Data Masking utility to redact or encrypt before logging.
Prefer layers over pip install. Layers reduce deployment package size and ensure all functions use the same Powertools version. Update the layer version in one place (Terraform local or SAM Globals).
Use merge() in Terraform for environment variables. Define common Powertools vars once in a local block, then merge with function-specific vars. This prevents drift.
Add custom dimensions to metrics. Default metrics are useful but dimensions like ModelId, StoreId, or Environment make metrics actionable.
Use @tracer.capture_method on important helper functions. Not every function needs tracing, but external calls (APIs, databases, S3) should always be traced.
Adopt idempotency for payment and write operations. The cost of a DynamoDB lookup is trivial compared to the cost of double-processing a payment.
Use batch processing for SQS consumers. Partial failure reporting prevents the "poison message" problem where one bad record blocks the entire queue.
Keep the layer version pinned. Use a specific version number, not latest. Update deliberately after testing.

Powertools Timeline

Powertools for Python has had an active release cadence since the v3.0 GA launch. Here are the key milestones in reverse chronological order:

March 2026 - v3.25.0 released. An RFC is also in progress for native OpenTelemetry support in the Tracer utility. When shipped, Tracer will support any OTLP-compatible backend (Datadog, Honeycomb, Grafana) without changing application code.
February 2026 - The AWS X-Ray SDK entered maintenance mode on February 25, 2026. AWS recommends migrating to ADOT (AWS Distro for OpenTelemetry) or native OpenTelemetry instrumentation. For Lambda functions, Powertools Tracer remains a strong option that abstracts the underlying SDK - and with the OTel RFC in progress, Powertools itself will eventually move to an OTel backend.
January 2026 - v3.24.0 released. Python 3.9 was dropped. Supported versions: 3.10, 3.11, 3.12, 3.13, 3.14.
November 2025 - v3.23.0 added Python 3.14 Lambda runtime support.
October 2025 - v3.21-3.22 brought Pydantic model support for query string and header parameter validation in Event Handler, plus enhanced OpenAPI response capabilities (headers, links, examples, encoding).
June 2025 - v3.14-3.15 introduced the BedrockAgentFunctionResolver for building Bedrock Agent integrations with Lambda, and the Kafka Consumer utility with automatic JSON, Avro, and Protocol Buffer deserialization.
April 2025 - v3.11 launched the AppSync Events Resolver for handling real-time APIs, plus Data Masking support for Pydantic models, dataclasses, and standard Python classes.
March 2025 - v2 reached end of support on March 25, 2025. Migration to v3 is straightforward - update the layer ARN and fix deprecated imports. The core decorator patterns are identical.

Common Pitfalls and Tips

Forgetting @metrics.log_metrics - Without this decorator, metrics are never flushed. Your add_metric calls silently do nothing. Always decorate the handler.
Oversized log events - log_event=True with large payloads (S3 batch notifications, SQS batch of 10,000 records) can exceed CloudWatch log size limits. Use log_event=True selectively.
Tracer in non-Lambda contexts - The Tracer utility assumes X-Ray is available. In unit tests, set POWERTOOLS_TRACE_DISABLED=true to avoid errors.
Idempotency table capacity - For high-throughput functions, use on-demand capacity for the idempotency DynamoDB table. Provisioned capacity can throttle under burst.
Layer version mismatch - Mixing Powertools layer versions across functions in the same service causes inconsistent log formats and metric namespaces. Pin to one version.
Cold start metrics confusion - The ColdStart metric counts cold starts, not warm starts. A spike means your functions are scaling up or being recycled.
Missing tracing_config { mode = "Active" } - The Tracer decorator works without active tracing, but traces will not appear in X-Ray. Always enable active tracing in your IaC.

Resources

Powertools for AWS Lambda (Python) Documentation
Powertools GitHub Repository
Smurf ETL and Analytics - 17 Lambda functions with Powertools
Serverless Account Watcher - EventBridge + Powertools monitoring
Smurf ETL GitHub Repo - Full Terraform and Python code
Account Watcher GitHub Repo - SAM template and Lambda code
AWS X-Ray SDK Maintenance Mode Announcement
CloudWatch Embedded Metric Format Specification
AWS London Ontario User Group - Community meetups and talks
AWS London Ontario User Group YouTube - Recorded presentations

Connect with me on X, Bluesky, LinkedIn, GitHub, Medium, Dev.to, or the AWS Community. Check out more of my projects at darryl-ruggles.cloud and join the Believe In Serverless community.

EKS and The Cross-AZ Tax: How to Stop Paying AWS $0.02/GB for Traffic That Should Never Leave Your Availability Zone

Darryl Ruggles — Fri, 03 Apr 2026 14:02:49 +0000

I run EKS clusters across three Availability Zones because that's what production resilience typically demands. You probably do too. But here's something that many teams overlook: in a standard 3-AZ cluster, roughly two-thirds of east-west traffic crosses AZ boundaries for no reason. AWS charges $0.01/GB in each direction for that, $0.02/GB round trip, and it adds up fast.

This isn't a theoretical problem. A cluster pushing 10 TB/month of inter-service traffic with 50-67% crossing AZ boundaries racks up $50-67/month in cross-AZ charges ($0.01/GB per direction). Scale that to 50 TB/month and you're looking at $250-335/month. For a cost that's entirely avoidable with a one-line YAML change on your Services.

I built a test environment to measure the actual impact of the major cross-AZ optimization strategies that AWS and Kubernetes provide. This article walks through what I found, what actually works, and what the documentation doesn't tell you. All the code is Terraform and Kubernetes 1.35, using EKS Auto Mode.

If you have not read my earlier article, A Complete Terraform Setup for EKS Auto Mode, that one covers the compute side of EKS cost optimization (Auto Mode, Graviton NodePools, Spot fallback chains, and the Terraform module structure I use). This article picks up where that one left off: you have a well-provisioned cluster, but your networking costs are quietly eating into the savings you've worked hard to achieve on the compute side.

Where the Money Goes

Before optimizing anything, I wanted to understand exactly where cross-AZ traffic happens. There are three places, and the first one is by far the biggest.

East-West Traffic: The Silent Majority

When Pod A calls Pod B through a Service, kube-proxy picks a backend pod using iptables rules. By default, it distributes randomly across every endpoint regardless of zone. The math is straightforward: in a 3-AZ cluster with evenly distributed pods, any request has a (N-1)/N = 67% chance of landing in a different AZ.

That 67% is the number that matters. It means two-thirds of your entire service mesh is paying cross-AZ charges by default, and nothing in the Kubernetes or EKS setup warns you about it.

North-South Traffic: The Extra Hop

The AWS Load Balancer Controller supports "instance" mode and "ip" mode for targeting. In instance mode (the default), the ALB sends traffic to a NodePort on any node, then kube-proxy routes it to the actual pod, potentially crossing AZs twice in the process. In ip mode, the ALB targets pod IPs directly. Switching to IP target mode eliminates the problem entirely.

NAT Gateway Traffic: Death by a Thousand Pulls

Pods in private subnets reach the internet through NAT Gateways. If you have a single NAT in one AZ (which is what most "getting started" guides recommend), pods in other AZs cross boundaries just to reach the NAT at $0.045/GB processing plus the cross-AZ charge. Container image pulls from ECR are the worst offender here - a cluster pulling even modest images several times a day adds up.

Here is what a typical unoptimized cluster looks like. The red lines are cross-AZ traffic you are paying for:

The Test Setup

I built this with Terraform using the same modular approach from my EKS Auto Mode article: community modules for VPC and EKS, the alekc/kubectl provider for Kubernetes resources. I use kubectl rather than the kubernetes or helm providers because it handles server-side apply of CRDs (like Karpenter NodePools) cleanly, which the standard kubernetes provider struggles with. If you followed that article, the structure here will look familiar. The difference is that I have added a VPC endpoints module and the VPC configuration is more opinionated about NAT Gateway placement. Three modules: VPC, EKS, and VPC endpoints.

The VPC: One NAT Per AZ

The first infrastructure decision that matters is NAT Gateway placement. Here is what I used:

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "${var.cluster_name}-vpc"
  cidr = var.vpc_cidr
  azs  = var.azs

  private_subnets = [
    cidrsubnet(var.vpc_cidr, 4, 0),
    cidrsubnet(var.vpc_cidr, 4, 1),
    cidrsubnet(var.vpc_cidr, 4, 2),
  ]

  public_subnets = [
    cidrsubnet(var.vpc_cidr, 4, 3),
    cidrsubnet(var.vpc_cidr, 4, 4),
    cidrsubnet(var.vpc_cidr, 4, 5),
  ]

  # This is the setting that matters
  enable_nat_gateway     = true
  single_nat_gateway     = false
  one_nat_gateway_per_az = true

  enable_dns_hostnames = true
  enable_dns_support   = true

  public_subnet_tags = {
    "kubernetes.io/role/elb" = 1
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = 1
  }

  tags = var.tags
}

Each additional NAT Gateway costs about $32/month. You save $0.01/GB in each direction ($0.02/GB round trip) on traffic that no longer crosses AZ boundaries to reach the NAT. That means the extra NAT pays for itself at just 1.6 TB/month of internet-bound traffic per AZ. The NAT processing charge ($0.045/GB) applies regardless of which AZ the NAT is in, so that cost doesn't change with this optimization. For most production clusters, 1.6 TB/month of internet traffic per AZ is a low bar to clear.

I see many Terraform examples online that set single_nat_gateway = true to save money in dev environments. That's fine for dev, but if you copy that pattern into production you are silently adding cross-AZ charges to every internet-bound packet from two of your three AZs.

VPC Endpoints: The Optimization Most Teams Skip

This is the optimization that surprises most teams. Without VPC endpoints, every ECR image pull and CloudWatch log shipment routes through your NAT Gateway at $0.045/GB. I always add endpoints for the services that generate the most traffic:

# S3 Gateway - FREE. No hourly cost, no data cost. No reason not to have this.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.${data.aws_region.current.name}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = var.route_table_ids
}

# ECR API - for docker login and manifest operations
resource "aws_vpc_endpoint" "ecr_api" {
  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${data.aws_region.current.name}.ecr.api"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true
}

# ECR Docker - for image layer pulls (this is the high-volume one)
resource "aws_vpc_endpoint" "ecr_dkr" {
  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${data.aws_region.current.name}.ecr.dkr"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true
}

# STS - for IRSA and Pod Identity. Low volume but breaks auth if missing.
resource "aws_vpc_endpoint" "sts" {
  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${data.aws_region.current.name}.sts"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true
}

The interface endpoints cost about $7.20/month per AZ ($21.60 total across 3 AZs, per endpoint). The S3 and DynamoDB gateway endpoints are free. At $0.045/GB NAT processing, the interface endpoints pay for themselves once you push more than about 480 GB/month through each service. ECR image pulls alone blow past that in most clusters.

A nuance worth understanding: ECR uses two separate service endpoints. The ecr.api endpoint handles authentication and manifest resolution (low-volume API calls). The ecr.dkr endpoint handles the actual image layer downloads, which is where the bulk of the data transfer happens. If you still have a NAT Gateway (and you likely do for other internet-bound traffic), you don't strictly need both. The ecr.dkr endpoint alone captures the majority of the cost savings because that is where the gigabytes flow. The ecr.api calls are small enough that routing them through NAT is negligible. I include both in the repo for completeness, but if you're watching your interface endpoint spend, ecr.dkr plus the free S3 gateway endpoint are the ones that move the needle. ECR stores image layers in S3, which is why the S3 gateway endpoint is critical for ECR cost savings. The STS endpoint is shown in the code above. I also add a CloudWatch Logs endpoint in the full module for container log shipping.

Another gotcha: the security group on your interface endpoints must allow inbound HTTPS (443) from your VPC CIDR. Skip that and you get timeouts that look like DNS failures. Here is what the security group looks like:

resource "aws_security_group" "vpc_endpoints" {
  name_prefix = "${var.cluster_name}-vpce-"
  description = "Security group for VPC interface endpoints"
  vpc_id      = var.vpc_id

  ingress {
    description = "HTTPS from VPC"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [var.vpc_cidr]
  }
}

This allows the entire VPC CIDR, which is fine for most setups. In a tighter security posture, scope the source to your private subnet CIDRs or reference the cluster's security group ID directly.

EKS Auto Mode on Kubernetes 1.35

I used Kubernetes 1.35, the latest version available on EKS as of March 2026. Traffic Distribution was introduced as alpha in 1.30 with PreferClose, which graduated to GA in 1.33. In 1.34, PreferSameZone was introduced as a clearer replacement and PreferClose became a deprecated alias. Both PreferSameZone and PreferSameNode graduated to stable in 1.35. More on both below.

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 21.0"

  name               = var.cluster_name
  kubernetes_version = "1.35"

  vpc_id     = var.vpc_id
  subnet_ids = var.private_subnet_ids

  # When node_pools are specified, the module automatically creates
  # the required Auto Mode IAM resources.
  compute_config = {
    enabled    = true
    node_pools = ["general-purpose", "system"]
  }

  # Private access for in-VPC communication, public for kubectl access.
  # In production, restrict public access to specific CIDRs or disable it.
  endpoint_private_access = true
  endpoint_public_access  = true

  # Control plane logging for observability
  enabled_log_types = ["api", "audit", "authenticator"]

  # Cross-AZ traffic visibility in the EKS console
  addons = {
    amazon-cloudwatch-observability = {
      most_recent = true
    }
  }

  # Fine for demos. For production, use scoped access entries with narrower policies.
  enable_cluster_creator_admin_permissions = true

  tags = var.tags
}

I also added a Graviton NodePool at weight 10 for better price-performance, using the same weighted NodePool pattern I described in the Auto Mode article. That isn't strictly related to cross-AZ optimization, but when you're already optimizing costs, the extra 20% from Graviton is worth grabbing. The full NodePool manifest is in terraform/modules/eks/main.tf.

Important Auto Mode detail: both the NodePool and NodeClass API groups are eks.amazonaws.com in Auto Mode, not karpenter.k8s.aws as in self-managed Karpenter installations. I got bitten by this when I first tried to reuse manifests from a self-managed cluster. The resource silently fails to apply with the wrong API group.

Measuring Cross-AZ Traffic: Build a Tool, Don't Guess

The AWS docs describe these routing strategies well enough, but they don't show you how to verify that they are actually working. I built two things for this:

A Zone-Aware Echo Server

Instead of using a generic HTTP echo server, I wrote a small FastAPI app that returns its own AZ, node, and pod name in every response:

# app/zone-echo/server.py
@app.get("/")
async def echo():
    return {
        "zone": _get_zone(),
        "node": os.environ.get("NODE_NAME", "unknown"),
        "pod": os.environ.get("POD_NAME", "unknown"),
        "pod_ip": os.environ.get("POD_IP", "unknown"),
    }

The zone is resolved at startup by querying the Kubernetes API for the node's topology.kubernetes.io/zone label (IMDS is blocked on Auto Mode's Bottlerocket nodes). If you adapt this for non-Auto-Mode clusters, ensure IMDSv2 is enforced (http_tokens = "required") and set the hop limit appropriately. Each response tells you exactly where it came from with zero ambiguity.

A Traffic Measurement Tool

The second tool sends hundreds of requests to each service variant and counts how many stayed in-zone versus crossed AZ boundaries:

# app/traffic-test/measure.py (abbreviated)
def run_test(service: str, num_requests: int, my_zone: str, my_node: str) -> RoutingStats:
    stats = RoutingStats(service=service)

    for i in range(num_requests):
        body, latency = send_request(service)
        stats.total += 1

        if body is None:
            stats.errors += 1
            continue

        stats.latencies_ms.append(latency)
        resp_zone = body.get("zone", "unknown")

        if resp_zone == my_zone:
            stats.same_zone += 1
        else:
            stats.cross_zone += 1

    return stats

Run it as a Kubernetes Job and check the logs:

kubectl apply -f k8s/monitoring/traffic-test-job.yaml
kubectl logs -n cross-az-demo job/traffic-test

The output looks like this:

==============================================================================
CROSS-AZ TRAFFIC MEASUREMENT REPORT
==============================================================================
Source zone: us-east-1a
Source node: i-07812b1c2b6ed6405

Service                        Reqs  Same-Zone  Cross-AZ  Errors  Avg(ms)  P99(ms)
--------------------------------------------------------------------------------
backend-baseline               5000     49.9%     50.1%       0     2.4      4.2
backend-prefer-same-zone       5000    100.0%      0.0%       0     1.8      3.1
backend-prefer-same-node       5000    100.0%      0.0%       0     1.6      2.5
backend-topology-aware         5000    100.0%      0.0%       0     1.5      2.5

--- Zone Distribution per Service ---
  backend-baseline: us-east-1a: 2497, us-east-1b: 827, us-east-1c: 1676
  backend-prefer-same-zone: us-east-1a: 5000
  backend-prefer-same-node: us-east-1a: 5000
  backend-topology-aware: us-east-1a: 5000

--- Cost Estimate (at 10 TB/month east-west traffic) ---
  backend-baseline: $100.12/month (5,006 GB cross-AZ)
  backend-prefer-same-zone: $0.00/month (0 GB cross-AZ)
  backend-prefer-same-node: $0.00/month (0 GB cross-AZ)
  backend-topology-aware: $0.00/month (0 GB cross-AZ)

  Best improvement: backend-prefer-same-zone
  Cross-AZ reduction: 100.0%
  Estimated monthly savings: $100.12
==============================================================================

The baseline came in at 50% cross-AZ, not the theoretical 67%. That is because I had 3 of my 6 backend pods in us-east-1a (where the test ran), so random distribution gives 3/6 = 50% same-zone. The exact number depends on how many pods are in the caller's zone relative to the total. In a real cluster with dozens of services and uneven pod counts, the cross-AZ percentage will vary per service.

The important result: all three optimized strategies hit 100% same-zone across 5,000 requests. Zero cross-AZ leakage. With stable pods and no churn during the test, there was no endpoint propagation delay to cause even a single cross-AZ request.

PreferSameNode edged out PreferSameZone on latency (1.6ms average versus 1.8ms) because it routes to pods on the same node, not just the same zone. The baseline's 2.4ms average includes the cross-AZ network hop penalty.

This is why I say "build a tool, don't guess." The AWS documentation suggests 60-80% reduction from topology-aware routing. My actual measurement shows 100% reduction with properly distributed pods. That gap is significant enough to change how you think about the ROI.

What Happens During Pod Churn?

Stable pods are one thing. I wanted to know what happens when pods are being created and destroyed, which is the reality of any production cluster running rolling deployments. I ran the same 5,000-request test while simultaneously scaling the backend from 6 to 12 to 3 to 9 and back, repeatedly.

=== RESULTS DURING POD CHURN ===

Service                        Reqs  Same-Zone  Cross-AZ  Errors  Avg(ms)  P99(ms)
--------------------------------------------------------------------------------
backend-baseline               5000     54.7%     42.3%     151     2.6      6.3
backend-prefer-same-zone       5000    100.0%      0.0%       0     1.7      3.5
backend-prefer-same-node       5000    100.0%      0.0%       0     1.7      5.4
backend-topology-aware         5000    100.0%      0.0%       0     1.8      5.3

All three optimized strategies held at 100% same-zone during churn. Not a single cross-AZ request leaked through while pods were being created and destroyed.

The baseline told a different story: 151 errors (3%) from requests hitting endpoints that were mid-termination. The optimized strategies had zero errors because zone-local endpoints remained available throughout the churn even as remote ones cycled. This is an availability benefit I didn't expect on top of the cost savings.

P99 latency did increase under churn. PreferSameNode went from 2.5ms to 5.4ms as the endpoint list updated. PreferSameZone stayed more stable at 3.5ms. That's expected and acceptable.

One thing I got wrong in my initial prediction: I expected Topology Aware Routing (TAR) to drop hints during the churn and leak cross-AZ traffic. It did not. With on-demand instances and even pod distribution, TAR's proportional allocation held steady. The hint-dropping issue likely requires more extreme capacity changes, like Spot interruptions removing entire nodes from a zone, rather than the gradual scaling I tested here.

The Routing Strategies: What Actually Works

I tested five approaches. Here is my honest ranking.

1. Traffic Distribution: PreferSameZone (Use This One)

This is the winner. One field on your Service spec:

apiVersion: v1
kind: Service
metadata:
  name: backend-prefer-same-zone
  namespace: cross-az-demo
spec:
  type: ClusterIP
  trafficDistribution: PreferSameZone  # stable in K8s 1.35, replaces deprecated PreferClose
  selector:
    app: backend
  ports:
    - port: 8080
      targetPort: 8080

That's it. One line. kube-proxy routes to same-zone endpoints first. If no local endpoints exist, it falls back to any available endpoint. PreferSameZone was introduced in 1.34 and graduated to stable in 1.35. No feature gates, no annotation. It is a first-class spec field.

I recommend this as the default for every ClusterIP Service in your cluster unless you have a specific reason not to use it. The prerequisite is having your pods distributed across zones with topology spread constraints, which you should be doing anyway for availability.

What about AZ failures? During an AZ outage, pods in the failed zone lose their local endpoints. PreferSameZone handles this gracefully by falling back to cross-AZ routing automatically. There is no availability impact, only a temporary return to cross-AZ costs until the zone recovers. This is the correct behavior: you get cost savings during normal operation and resilience during failures without any manual intervention.

2. Topology Aware Routing (It Works, But PreferSameZone is Better)

The older approach, using an annotation:

metadata:
  annotations:
    service.kubernetes.io/topology-mode: Auto

TAR uses a more complex algorithm: the EndpointSlice controller allocates endpoints proportionally across zones and sets "hints" for each one. The idea is to balance both locality and load distribution.

In my testing, it works. The numbers are close to PreferSameZone. But I found two problems that make me recommend PreferSameZone instead:

Problem 1: Spot instances break it. TAR recalculates proportional allocation based on zone capacity. When Spot instances get reclaimed, zone capacity fluctuates, and during the transition the controller may drop hints entirely. When hints are dropped, you revert to random distribution with no warning. PreferSameZone doesn't have this problem because it uses simpler same-zone-first logic without capacity-proportional math.

Problem 2: The annotation is deprecated. The service.kubernetes.io/topology-mode: Auto annotation won't graduate to GA. The underlying mechanism (EndpointSlice hints) is GA, but the recommended path forward is the trafficDistribution field (PreferSameZone), which is stable in 1.35. There's no reason to use the deprecated annotation when the replacement is available and simpler.

3. PreferSameNode: The Best of Both Worlds (New in 1.35)

Kubernetes 1.35 added a new Traffic Distribution option that solves a real gap:

spec:
  trafficDistribution: PreferSameNode

This routes to endpoints on the same node first, then falls back to same-zone, then cluster-wide. It gives you the latency benefits of internalTrafficPolicy: Local but with a graceful fallback instead of dropped traffic.

This is the option I wish had existed from the start. It is strictly better than internalTrafficPolicy: Local for any use case where you want node-local preference but cannot tolerate dropped requests. Stable since 1.35.

I recommend this for latency-sensitive service pairs that are often (but not always) co-located. For pure cost optimization, PreferSameZone is sufficient. PreferSameNode adds value when sub-millisecond latency between specific services matters.

4. Internal Traffic Policy: Local (The Nuclear Option)

spec:
  internalTrafficPolicy: Local

This restricts traffic to endpoints on the same node, not just the same zone. Zero cross-AZ traffic is possible. Zero cross-node traffic either.

I call this the nuclear option because if no endpoint exists on the calling pod's node, the traffic is dropped. Not rerouted or retried - just dropped. You must guarantee co-location with pod affinity rules or use DaemonSet-backed services.

I proved this during testing. I created per-AZ services with internalTrafficPolicy: Local and ran a test from zone us-east-1a against the zone-b service. The test hung indefinitely. Every request timed out because there were no zone-b pods on the zone-a node. There were no errors and no fallbacks - just a silent failure.

When it does work (when a backend pod exists on the same node), the latency is the best of any strategy: 1.6ms average in my testing, versus 1.8ms for PreferSameZone. But the risk of silent traffic drops makes it unsuitable for most services.

I use this for exactly two situations:

DaemonSet services (every node has a local endpoint by definition)
Tightly coupled pod pairs with strict requiredDuringSchedulingIgnoredDuringExecution affinity

For everything else, PreferSameZone is safer and almost as effective. And now with PreferSameNode in 1.35, there is even less reason to use Local. You get the same node-local preference with a graceful fallback.

5. Per-AZ Deployments (When PreferSameZone Is Not Enough)

There is a subtle scaling problem that PreferSameZone doesn't solve. Say traffic to your backend spikes in us-east-1a. The HPA scales the Deployment from 6 to 12 replicas. Kubernetes spreads the new pods across all three zones, but only us-east-1a needed the capacity. You have wasted compute in 1b and 1c, and if Karpenter provisions new nodes to host those pods, you are paying for nodes that serve no traffic.

The fix is per-AZ Deployments with independent HPAs:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend-az-a
spec:
  replicas: 2
  selector:
    matchLabels:
      app: backend-zonal
      zone: az-a
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: topology.kubernetes.io/zone
                    operator: In
                    values: ["us-east-1a"]
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: backend-az-a
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend-az-a
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Multiply by three AZs. You get triple the manifests and triple the HPA objects to monitor. I would only use this pattern for my top 2-3 highest-traffic services where the zone-specific scaling behavior justifies the operational complexity.

The full 3-AZ manifest is in k8s/cross-az-optimization/05-per-az-deployment.yaml.

Load Balancer: Switch to IP Mode

For services fronted by an ALB, switch the target type from "instance" to "ip":

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/scheme: internet-facing

In instance mode, traffic flows: ALB - NodePort (any node) - kube-proxy - Pod (maybe different AZ). That is up to two cross-AZ hops. In IP mode: ALB - Pod IP directly. The ALB registers pod IPs as targets in the target group, bypassing kube-proxy entirely.

Combine this with trafficDistribution: PreferSameZone on the backing Service for the best result.

What the AWS Documentation Doesn't Tell You

I ran into several things during testing that aren't obvious from the official docs:

Topology spread constraints need minDomains with Auto Mode. This one bit me during testing. I set topologySpreadConstraints with maxSkew: 1 and whenUnsatisfiable: DoNotSchedule on my Deployment - the standard recipe from every example out there. But with EKS Auto Mode, you start from zero nodes. Karpenter provisions one node in one AZ, and the scheduler sees only one topology domain. A maxSkew of 1 across a single domain is always satisfied, so all 6 pods land on that one node in one AZ.

The fix is minDomains: 3. This tells the scheduler that at least 3 zones must exist before scheduling is satisfied, which forces Karpenter to provision nodes in other AZs:

topologySpreadConstraints:
  - maxSkew: 1
    minDomains: 3
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: backend

Most topology spread examples assume you already have nodes in multiple AZs. With Auto Mode and Karpenter, you do not. Without minDomains, your pods will quietly consolidate into a single zone while the topology constraint reports no violations.

The EKS console has built-in cross-AZ visibility now. Enable the amazon-cloudwatch-observability addon (included in the Terraform code above) and you get a service map in the EKS console showing cross-AZ flows with data volume. This is much easier than building custom Prometheus queries against VPC flow logs.

VPC endpoint interface charges don't include cross-AZ data transfer. This is buried in a "What's New" announcement from 2022. Traffic between pods in us-east-1a and a VPC endpoint ENI in us-east-1b doesn't incur the $0.01/GB cross-AZ charge. The PrivateLink data processing charge still applies, but it is less than NAT processing.

PreferSameZone occasionally sends cross-AZ during endpoint updates. When pods are added or removed, there's a brief window where EndpointSlice hints haven't propagated. During this window, kube-proxy may route to any endpoint. In my testing this was under 1% of requests, but for latency-critical services it's worth knowing.

CoreDNS is a cross-AZ blind spot. DNS lookups via the kube-dns Service are themselves subject to cross-AZ routing, and as of this writing you can't set trafficDistribution on the CoreDNS addon service directly. There's an open EKS feature request for this. If DNS latency matters to you, this is a gap to watch.

Finding cross-AZ charges in Cost Explorer. Beyond the EKS console's network observability, you can identify cross-AZ charges in AWS Cost Explorer by filtering for the DataTransfer-Regional-Bytes and NatGateway-Bytes usage types. These are the line items that this article's optimizations target.

Production Considerations

This article focuses on FinOps, but if you are taking these patterns into production, a few security points are worth noting:

Cluster access: The demo uses enable_cluster_creator_admin_permissions = true for simplicity. For production, define scoped access entries with narrower policies (e.g., AmazonEKSViewPolicy for read-only users, AmazonEKSClusterAdminPolicy only for platform teams).
Workload identity: EKS Auto Mode with module v21+ uses Pod Identity by default. If your workloads need AWS API access (unlike the demo pods here), configure Pod Identity associations rather than IRSA.
VPC endpoint security groups: The demo allows all VPC CIDR traffic on port 443. In production, scope the source to your private subnet CIDRs or reference the cluster's security group directly.
Network ACLs: The default NACL allows all traffic. Production environments may want to restrict outbound rules on private subnets to known destinations and ports.
IMDSv2: Auto Mode uses Bottlerocket, which enforces IMDSv2. If you adapt this code for self-managed node groups with Amazon Linux, ensure IMDSv2 is enforced (http_tokens = "required" in your launch template) and set http_put_response_hop_limit = 2 for containerized workloads.
Encryption at rest: The demo doesn't configure a custom KMS key for EKS secrets encryption. For production, enable envelope encryption with encryption_config in the EKS module to encrypt Kubernetes secrets at rest with a customer-managed key.
Encryption in transit: Traffic between pods is unencrypted by default. If your services handle sensitive data, consider a service mesh with mTLS (Istio ambient mode or Linkerd) or use application-level TLS. Note that a sidecar-based mesh adds cross-AZ overhead of its own, so evaluate the cost trade-off against the security benefit.

What This Costs and What You Save

Here is my cost analysis for a cluster with 10 TB/month east-west traffic and 2 TB/month AWS service traffic (ECR, S3, CloudWatch):

Category	Before	After	Delta
East-west cross-AZ	$100/month	$0/month	-$100
NAT processing (ECR/S3/logs)	$90/month	$0/month	-$90
NAT cross-AZ (single NAT)	$13/month	$0/month	-$13
VPC interface endpoints (4x3 AZs)	$0	$86/month	+$86
Additional NAT Gateways (2 extra)	$0	$65/month	+$65
Net change	$236/month	$152/month	-$84/month

At 10 TB/month, the savings are modest at about $84/month. But cross-AZ costs scale linearly with traffic. At 50 TB/month east-west, the east-west savings alone hit $650/month. At 100 TB/month, over $1,300/month. The infrastructure costs (VPC endpoints, extra NATs) stay flat.

The real point is that the traffic routing changes (PreferSameZone on Services and IP mode on Ingress) are free. There is zero infrastructure cost and only one line of YAML per Service. Do those first and measure the impact before deciding whether VPC endpoints and per-AZ NATs are worth it for your traffic volume.

My Recommended Order of Operations

Add VPC endpoints for S3 (free), ECR, STS, and CloudWatch Logs. This is the biggest win for most teams. The S3 and DynamoDB gateway endpoints are free, and the ECR interface endpoint eliminates NAT processing charges ($0.045/GB) on every container image pull. If your cluster pulls images regularly, this alone can save more than all the other optimizations combined.
Add trafficDistribution: PreferSameZone to all ClusterIP Services. The highest-impact change for east-west traffic. It's a single field (stable in K8s 1.35), and in my testing eliminates 99% of cross-AZ east-west traffic with properly distributed pods. For latency-sensitive pairs, consider PreferSameNode (stable in 1.35).
Add topology spread constraints to all Deployments. PreferSameZone only works if pods are actually distributed across zones. If they aren't, this is your real first step.
Switch ALB Ingress resources to IP target mode. Another free change. One annotation eliminates the NodePort-to-pod cross-AZ hop.
Deploy one NAT Gateway per AZ. Worth it when your cross-AZ NAT traffic exceeds ~1.6 TB/month per AZ.
Consider per-AZ Deployments for your top 2-3 highest-traffic services. Only if you have zone-specific scaling problems that PreferSameZone doesn't solve.

Getting Started

The complete code is in the companion repository. Everything is configurable (region, cluster name, CIDR range) through Terraform variables with no hardcoded account IDs or resource names.

Prerequisites: AWS CLI configured, Terraform 1.10+, Docker, kubectl, Python 3.13+.

The fastest path is the one-command setup:

git clone https://github.com/darryl-ruggles/eks-cross-az-exploration
cd eks-cross-az-exploration

# Deploy everything and run the measurement (takes ~15 minutes)
make setup

That runs Terraform, builds and pushes container images to ECR, deploys the demo app with all service variants, and executes the traffic measurement job. At the end you get the comparison report.

If you prefer to step through it:

# 1. Deploy infrastructure (VPC, EKS, VPC endpoints, ECR repos)
make infra

# 2. Configure kubectl
make kubeconfig

# 3. Build and push container images
make images

# 4. Deploy the demo app and all routing strategies
make deploy

# 5. Run the cross-AZ measurement
make test

# 6. See pod distribution across AZs
make show-distribution

# 7. Run again with PreferSameNode included
make test-all

# 8. Clean up when done
make teardown

All configuration is overridable: make test NUM_REQUESTS=1000, make infra (uses variables.tf defaults; edit region, cluster_name, vpc_cidr as needed).

IMPORTANT: This deploys real AWS resources that cost money. The EKS cluster alone is $72/month, plus 3 NAT Gateways (~$97/month), EC2 instances for worker nodes, and VPC endpoint charges. Run make teardown when you're done testing. If terraform destroy fails with dangling ENIs (which can happen if VPC endpoints are destroyed before EKS releases its network interfaces), wait a few minutes and retry.

The measurement report gives you actual numbers for your cluster, which matters more than any estimate in a blog post.

Cross-AZ data transfer is one of those costs that seems small on any individual request but compounds into a meaningful line item at scale. The good news is that the highest-impact fix is also the simplest: one field on your Service spec, no infrastructure changes required.

Resources

Connect with me on X, Bluesky, LinkedIn, GitHub, Medium, Dev.to, or the AWS Community. Check out more of my projects at darryl-ruggles.cloud and join the Believe In Serverless community.

AWS Lambda Now Knows Where It Lives: Exploring the New AZ Metadata

Darryl Ruggles — Fri, 27 Mar 2026 01:30:21 +0000

Lambda was the last holdout. EC2 has had its Instance Metadata Service since the beginning. ECS and Fargate have their task metadata endpoints. But Lambda? If your function needed to know which Availability Zone it was running in, you were out of luck - unless you enjoyed building elaborate workarounds with private API Gateway endpoints and DescribeNetworkInterfaces calls.

On March 19, 2026, AWS quietly fixed this with a new metadata endpoint that exposes the AZ ID of the execution environment. No VPC is required, no IAM permissions are needed, and there are no API calls to external services. It's just a localhost HTTP endpoint that returns your AZ in under a millisecond.

I built a small demo app to explore three practical uses for this new metadata, using Powertools for AWS Lambda for observability throughout (I wrote about Powertools best practices previously). The code is on GitHub if you want to deploy it yourself.

What's in the Box

Lambda now sets two new reserved environment variables in every execution environment:

Variable	Description
`AWS_LAMBDA_METADATA_API`	Address of the metadata server (e.g., `169.254.100.1:9001`)
`AWS_LAMBDA_METADATA_TOKEN`	Auth token unique to this execution environment

You make a GET request to the endpoint with the token as a Bearer header, and you get back your AZ ID:

GET http://${AWS_LAMBDA_METADATA_API}/2026-01-15/metadata/execution-environment
Authorization: Bearer ${AWS_LAMBDA_METADATA_TOKEN}

{
  "AvailabilityZoneID": "use1-az1"
}

That's it. The response is cached for 12 hours and marked immutable - it never changes within an execution environment.

Powertools for AWS Lambda makes it even simpler:

from aws_lambda_powertools.utilities.metadata import get_lambda_metadata

metadata = get_lambda_metadata()
az_id = metadata.availability_zone_id  # "use1-az1"

Powertools handles the caching and SnapStart invalidation automatically.

AZ IDs vs AZ Names

One important detail: the metadata returns AZ IDs (use1-az1), not AZ names (us-east-1a). This matters because AZ IDs are consistent across all AWS accounts - use1-az1 is always the same physical data center regardless of which account you're in. AZ names are mapped differently per account to distribute load, so my us-east-1a might be a completely different physical zone than yours.

If you need the AZ name for your account, you can resolve it with a single EC2 API call:

ec2 = boto3.client("ec2")
resp = ec2.describe_availability_zones(
    Filters=[{"Name": "zone-id", "Values": ["use1-az1"]}]
)
az_name = resp["AvailabilityZones"][0]["ZoneName"]  # "us-east-1a"

Cache this - it does not change.

The Demo App

I built a single Lambda function with three API routes using Powertools Event Handler, deployed with SAM. Each route demonstrates a different use case for the new AZ metadata.

Service	Purpose
API Gateway	REST API with X-Ray tracing
Lambda (Python 3.13, ARM64)	Single function, 3 routes via Powertools Event Handler
Powertools Layer	Logger, Tracer, Metrics, Event Handler, Lambda Metadata
CloudWatch Dashboard	Invocations, cold starts, routing, and chaos metrics by AZ

The Foundation: AZ-Enriched Observability

Before getting to the three routes, the most important pattern is in the module initialization and handler. I resolve the AZ ID once during cold start and set it as a default metric dimension:

from aws_lambda_powertools import Logger, Metrics, Tracer
from aws_lambda_powertools.event_handler import APIGatewayRestResolver

logger = Logger()
tracer = Tracer()
metrics = Metrics()
app = APIGatewayRestResolver()

# Resolve AZ ID at module level - metadata endpoint is localhost, sub-millisecond
AZ_ID = _init_az_id()

# Every custom metric gets the AZ dimension automatically
if AZ_ID != "unknown":
    metrics.set_default_dimensions(AvailabilityZoneID=AZ_ID)

Then in the handler, following the Powertools three-decorator pattern, I inject the AZ ID into every log line and X-Ray trace:

@logger.inject_lambda_context(log_event=True)
@tracer.capture_lambda_handler
@metrics.log_metrics(capture_cold_start_metric=True)
def lambda_handler(event: dict, context: LambdaContext) -> dict:
    logger.append_keys(availability_zone_id=AZ_ID)
    tracer.put_annotation("availability_zone_id", AZ_ID)
    return app.resolve(event, context)

Now every log line from this function includes the AZ:

{
    "level": "INFO",
    "message": "Routing decision",
    "availability_zone_id": "use1-az4",
    "endpoint": "cache-az4.internal:6379",
    "same_az": true,
    "service": "lambda-metadata-explorer"
}

During an AZ incident, you can filter to the affected zone instantly in CloudWatch Logs Insights:

fields @timestamp, availability_zone_id, message, level
| filter availability_zone_id = "use1-az4"
| filter level = "ERROR"
| sort @timestamp desc

The tracer.put_annotation call adds the AZ ID as an X-Ray annotation on every trace. You can filter traces by AZ in the X-Ray console using:

annotation.availability_zone_id = "use1-az4"

This lets you isolate all traces from a specific AZ during an incident. Click any trace segment and check the Annotations tab to see the AZ ID.

This alone is worth the feature. Before March 19, you simply could not do this with Lambda.

Use Case 1: Exploring the Metadata (GET /metadata)

The first route is straightforward - it returns the AZ metadata from both the Powertools utility and the raw HTTP endpoint, plus the resolved AZ name. This is mostly educational, showing developers both access methods side by side:

@app.get("/metadata")
@tracer.capture_method
def explore_metadata():
    az_name = _resolve_az_name()  # EC2 API call, cached after first use
    metrics.add_metric(name="MetadataRequests", unit=MetricUnit.Count, value=1)

    return {
        "availability_zone": {
            "az_id": AZ_ID,
            "az_name": az_name,
        },
        "execution_context": {
            "function_name": os.environ.get("AWS_LAMBDA_FUNCTION_NAME", "local"),
            "region": os.environ.get("AWS_REGION", "unknown"),
            "memory_mb": os.environ.get("AWS_LAMBDA_FUNCTION_MEMORY_SIZE", "unknown"),
        },
    }

Seeing the Multi-AZ Distribution

There's a subtlety here. If you call the endpoint 20 times sequentially, you'll likely see the same AZ every time. That's because Lambda reuses warm execution environments, and a warm instance stays in the same AZ for its lifetime.

To actually see Lambda's multi-AZ distribution, you need to force it to spin up multiple execution environments simultaneously. The included test script fires 60 concurrent requests using xargs -P, which overwhelms a single warm instance and forces Lambda to create new environments across different AZs:

# 60 total requests, 20 concurrent at a time
./scripts/test_az_distribution.sh $API_URL 20 60

--- AZ Distribution ---

Count  AZ ID          AZ Name
-----  -------------- -------
22     use1-az4       us-east-1d
20     use1-az1       us-east-1a
18     use1-az6       us-east-1f

Unique AZs observed: 3
Total responses:     60

Lambda spreads execution environments across multiple AZs for resilience. Now you can actually see it happening, and with the AZ-enriched CloudWatch dashboard, you can monitor this distribution in real time.

This is also why the chaos engineering use case (Use Case 3) needs concurrent traffic. If you only have one warm instance, targeting a specific AZ is hit-or-miss. With concurrent load, you'll have instances spread across AZs, and the ones in the target AZ will return 503 while the others continue serving normally - exactly how a real AZ outage would behave.

Use Case 2: Same-AZ Routing (POST /route)

This is the use case with the most immediate dollar impact. Cross-AZ data transfer costs $0.01/GB in each direction. If you have a Lambda function making thousands of calls per second to an ElastiCache cluster or RDS read replica, and it's routing to whatever endpoint it gets, you're paying for cross-AZ traffic on roughly two-thirds of those calls (assuming 3 AZs).

With the AZ metadata, you can route to the same-AZ endpoint:

@app.post("/route")
@tracer.capture_method
def route_request():
    body = app.current_event.json_body or {}

    # In production, load from SSM Parameter Store or service discovery
    endpoint_map = json.loads(os.environ.get("AZ_ENDPOINT_MAP", "{}")) or {
        "use1-az1": "cache-az1.internal:6379",
        "use1-az2": "cache-az2.internal:6379",
        "use1-az4": "cache-az4.internal:6379",
        "use1-az6": "cache-az6.internal:6379",
    }
    fallback = os.environ.get("FALLBACK_ENDPOINT", "cache-primary.internal:6379")

    selected = endpoint_map.get(AZ_ID, fallback)
    same_az = AZ_ID in endpoint_map

    metrics.add_metric(name="RoutingDecisions", unit=MetricUnit.Count, value=1)
    tracer.put_annotation("routing_type", "same-az" if same_az else "cross-az")

    return {
        "az_id": AZ_ID,
        "routing": {"selected_endpoint": selected, "same_az": same_az},
        "cost_insight": {
            "cross_az_cost_per_gb": "$0.01 per direction ($0.02 round-trip)",
            "same_az_cost_per_gb": "$0.00",
        },
    }

The endpoint map is a simple AZ-ID-to-endpoint mapping. In production, you'd load this from SSM Parameter Store using the Powertools Parameters utility with caching:

from aws_lambda_powertools.utilities import parameters

endpoint_map = parameters.get_parameter("/myapp/az-endpoint-map", transform="json")

For an ElastiCache cluster, you'd use the node endpoints for each AZ. For RDS with read replicas, you'd map to the replica in each AZ. The key insight is that you now know which AZ you're in and can make an informed routing decision.

Cost math: At 1 TB/month of data transfer with 3 AZs, roughly 67% crosses AZ boundaries. AWS charges $0.01/GB in each direction, so cross-AZ round-trips cost $0.02/GB effectively. That's ~$13.40/month in cross-AZ charges for this example. With same-AZ routing, it drops to near zero. The savings scale linearly with traffic volume.

Use Case 3: AZ-Aware Chaos Engineering (POST /chaos)

The third route is about resilience testing. You specify a target AZ, and if the function happens to be running there, it injects a failure. This lets you test how your upstream systems - load balancers, retry logic, circuit breakers - handle an AZ going unhealthy.

@app.post("/chaos")
@tracer.capture_method
def chaos_test():
    body = app.current_event.json_body or {}
    target_az = body.get("target_az", "")
    failure_mode = body.get("failure_mode", "error")

    if target_az and AZ_ID == target_az:
        metrics.add_metric(name="ChaosFailuresInjected", unit=MetricUnit.Count, value=1)
        logger.warning("Chaos: injecting failure", extra={
            "target_az": target_az,
            "failure_mode": failure_mode,
        })

        if failure_mode == "latency":
            time.sleep(min(body.get("delay_seconds", 3), 10))
            return {"az_id": AZ_ID, "chaos": {"triggered": True, "mode": "latency"}}

        raise ServiceError(503, f"Simulated AZ failure in {AZ_ID}")

    return {"az_id": AZ_ID, "chaos": {"triggered": False, "target_az": target_az}}

Two failure modes:

error - Returns HTTP 503. Tests whether callers retry and eventually hit a healthy AZ.
latency - Injects a delay (capped at 10 seconds). Tests whether callers have appropriate timeouts and circuit breakers.

In a real project, you'd combine this with an ALB or API Gateway with health checks. Target a specific AZ and observe whether traffic shifts to healthy AZs. The CloudWatch dashboard shows failures concentrated in the target AZ while other AZs continue serving normally - exactly what you'd see during a real AZ impairment.

One important detail for testing: make test-chaos auto-detects your current warm instance's AZ and targets it, guaranteeing a triggered failure. But for a more realistic simulation, fire concurrent traffic while targeting a specific AZ - some requests will fail (those in the target AZ) while others succeed (those in healthy AZs).

Before this feature, you'd need to deploy different code per AZ or use external feature flags combined with some mechanism to map invocations to AZs. Now it's a simple string comparison.

The SAM Template

The deployment is a straightforward SAM template - one function, one API, one CloudWatch dashboard:

Globals:
  Function:
    Runtime: python3.13
    Architectures:
      - arm64
    Timeout: 30
    MemorySize: 256
    Tracing: Active
    LoggingConfig:
      LogFormat: JSON

Resources:
  AZMetadataFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: !Sub lambda-metadata-explorer-${Environment}
      CodeUri: src/
      Handler: app.lambda_handler
      Layers:
        - !Sub arn:aws:lambda:${AWS::Region}:017000801446:layer:AWSLambdaPowertoolsPythonV3-python313-arm64:${PowertoolsLayerVersion}
      Environment:
        Variables:
          POWERTOOLS_SERVICE_NAME: lambda-metadata-explorer
          POWERTOOLS_METRICS_NAMESPACE: AZMetadataExplorer
          POWERTOOLS_LOG_LEVEL: !Ref LogLevel
          POWERTOOLS_TRACER_CAPTURE_RESPONSE: 'true'
          POWERTOOLS_TRACER_CAPTURE_ERROR: 'true'
      Policies:
        - Statement:
            - Effect: Allow
              Action:
                - ec2:DescribeAvailabilityZones
              Resource: '*'
      Events:
        GetMetadata:
          Type: Api
          Properties:
            Path: /metadata
            Method: get
        PostRoute:
          Type: Api
          Properties:
            Path: /route
            Method: post
        PostChaos:
          Type: Api
          Properties:
            Path: /chaos
            Method: post

ARM64 (Graviton) for better price-performance. Powertools via the AWS-managed Lambda Layer (plus pip bundling from requirements.txt to ensure v3.26.0+ with the metadata utility is available even if the Layer hasn't been updated yet). X-Ray tracing active. JSON log format. The only extra IAM permission is ec2:DescribeAvailabilityZones for the AZ name resolution - the metadata endpoint itself requires no permissions.

The template also creates a CloudWatch dashboard with AZ distribution widgets. The AZ dimension comes from metrics.set_default_dimensions(AvailabilityZoneID=AZ_ID) which tags every custom metric with the availability zone. The dashboard references explicit metric definitions for each AZ:

["AZMetadataExplorer", "MetadataRequests", "AvailabilityZoneID", "use1-az1", "service", "lambda-metadata-explorer"]

One gotcha: the ColdStart metric uses Powertools' built-in dimensions (function_name + service) rather than our custom AZ dimension, because Powertools captures it before the handler runs. So the Cold Starts widget shows a total count rather than a per-AZ breakdown.

Deploy and Try It

Prerequisites: AWS CLI, SAM CLI, Python 3.13+, an AWS account.

git clone https://github.com/RDarrylR/lambda-az-metadata.git
cd lambda-az-metadata

# Build and deploy
sam build --profile your-profile --region us-east-1
sam deploy --guided --profile your-profile --region us-east-1

# Or use the Makefile
make deploy PROFILE=your-profile

After deploying, grab the API URL from the stack outputs and explore:

# See your AZ metadata
curl -s $API_URL/metadata | python3 -m json.tool

# Test same-AZ routing
curl -s -X POST $API_URL/route \
  -H "Content-Type: application/json" \
  -d '{"message": "hello"}' | python3 -m json.tool

# Test chaos engineering (replace use1-az1 with an AZ from your /metadata response)
curl -s -X POST $API_URL/chaos \
  -H "Content-Type: application/json" \
  -d '{"target_az": "use1-az1", "failure_mode": "error"}' | python3 -m json.tool

# See the AZ distribution across 60 concurrent invocations
make test-distribution

Cleanup:

sam delete --stack-name lambda-az-metadata --profile your-profile --region us-east-1

What It Costs

This demo runs well within the AWS free tier. Lambda gives you 1M requests/month and 400K GB-seconds. API Gateway gives you 1M calls for the first 12 months. The metadata endpoint itself is free - no additional charges.

Gotchas

A few things to watch for:

Region-specific AZ IDs. The demo code and dashboard use us-east-1 AZ IDs (use1-az1 through use1-az6). If you deploy to a different region, update the default endpoint map in app.py and the dashboard metrics in template.yaml with your region's AZ IDs (e.g., usw2-az1 for us-west-2).
Caching is your friend. The metadata response is immutable within an execution environment. Resolve it once at module level and forget about it. There is no reason to call the endpoint on every invocation.
VPC doesn't matter. The metadata endpoint is localhost inside the execution environment - it works identically with or without a VPC. No VPC endpoints, NAT gateways, or security group rules needed.
SnapStart changes things. If you use SnapStart, the execution environment might move to a different AZ after restore. The metadata endpoint cooperates with SnapStart by reducing the cache TTL during initialization, so Powertools can detect the need to re-fetch after restore and handles this automatically. But this demo resolves AZ_ID at module level during init - that code runs during the snapshot phase. After restore, the cached module-level variable could be wrong. For SnapStart, move the AZ resolution into the handler or use a @register_checkpoint_restore hook to re-resolve after restore.
The response may grow. The documentation says additional fields may be added in future updates. Don't fail on unknown fields.
log_event=True and CAPTURE_RESPONSE in production. I use both here for debugging visibility, but in production, disable them if your payloads contain sensitive data. log_event=True logs full request bodies to CloudWatch, and POWERTOOLS_TRACER_CAPTURE_RESPONSE stores full response bodies as X-Ray metadata. Control both via environment variables without code changes.
Layer version. The get_lambda_metadata() utility requires Powertools v3.26.0+ (Lambda Layer version 30+, released March 20, 2026). The SAM template defaults to version 30. Check the layer version history for the latest.
Import path. As of v3.26.0, the working import path is from aws_lambda_powertools.utilities.metadata import get_lambda_metadata. The Lambda documentation currently shows utilities.lambda_metadata, which may be corrected in a future docs update. Verify against your installed version if in doubt.

What's Next

The metadata endpoint currently returns only AvailabilityZoneID, but the versioned API path (/2026-01-15/) and the documentation's note about future fields suggest more is coming. Execution environment ID? Host-level metadata? We'll see.

For now, the AZ ID alone unlocks meaningful improvements in observability, cost optimization, and resilience testing. It's one of those features that's so obviously useful, you wonder why it took this long.

The code is on GitHub. Clone it, deploy it, and check your CloudWatch dashboard to see which AZs your Lambda functions are actually running in.

Resources

Connect with me on X, Bluesky, LinkedIn, GitHub, Medium, Dev.to, or the AWS Community. Check out more of my projects at darryl-ruggles.cloud and join the Believe In Serverless community.

A Complete Terraform Setup for EKS Auto Mode: Is it right for you?

Darryl Ruggles — Tue, 24 Mar 2026 16:41:12 +0000

One of the things I appreciate about AWS is the range of container options available. I recently wrote about why ECS is my default choice for containers on AWS - it's simple, powerful, and handles most workloads well. But when you need Kubernetes specifically - for portability, ecosystem tooling, or team expertise - EKS Auto Mode is an interesting approach worth exploring.

Instead of managing node groups, installing Karpenter, configuring the VPC CNI plugin, deploying the AWS Load Balancer Controller, setting up the EBS CSI driver, and keeping all of those components updated and compatible with each other - you enable a single flag and AWS handles all of it. The data plane becomes a managed service. The limitations that Auto Mode has are not insignificant but depending on the use case it may make sense for many teams to consider using Auto Mode.

I built a complete production-ready EKS Auto Mode cluster with Terraform to see how it works in practice. All the code is on GitHub: github.com/RDarrylR/eks-auto-mode-terraform.

This post walks through everything: what Auto Mode actually manages, how the Terraform setup works, deploying a real application, autoscaling from pods to nodes, observability with Container Insights, the security posture you get out of the box, custom NodePools for cost optimization, and the honest trade-offs you should know about before adopting it.

Why EKS Auto Mode?

If you've ever managed an EKS cluster, you know the operational burden. The control plane is managed by AWS, but everything else is on you:

Node management - Choosing instance types, managing AMIs, patching OS, handling node drains and replacements
Networking - Installing and configuring the VPC CNI plugin, managing pod IP allocation, setting up network policies
Storage - Deploying the EBS CSI driver, managing StorageClasses, handling volume lifecycle
Load balancing - Installing the AWS Load Balancer Controller, configuring IngressClasses, managing target group bindings
Scaling - Deploying Karpenter or Cluster Autoscaler, configuring NodePools, tuning consolidation policies
Add-on compatibility - Making sure all these components work together after every Kubernetes version upgrade

Multiply this across multiple clusters and environments, and you are spending more time on infrastructure plumbing than on the applications running on top of it.

EKS Auto Mode eliminates all of this. AWS takes over the entire data plane - compute, networking, storage, load balancing, and health monitoring. The only pods running in your cluster are the Kubernetes Metrics Server. Everything else runs off-cluster as managed services.

What AWS Manages For You

When you enable Auto Mode, AWS takes responsibility for six major areas:

Compute

Auto Mode uses Karpenter under the hood, running it as a managed service outside your cluster. You never see Karpenter pods, never upgrade them, never troubleshoot their OOM kills. Karpenter is a significant upgrade over the standard Cluster Autoscaler in my experience. With Auto Mode, AWS handles:

Node provisioning, scaling, and termination
AMI selection - custom Bottlerocket variants with SELinux enforcing and read-only root filesystem
OS patching via a continuous deployment pipeline with CVE scanning and staged rollout
14-day default node expiration (configurable up to 21 days) with automatic replacement
Spot Instance interruption handling
GPU support with drivers for NVIDIA, Neuron (Trainium/Inferentia), and EFA

The nodes are EC2 managed instances - a new EC2 primitive where operational control is delegated to AWS. They appear in your account, use your Reserved Instances and Savings Plans, but you cannot SSH into them or modify their configuration directly.

Networking

The VPC CNI runs as a managed component:

Pod networking with IPv4 and IPv6 support
Network Policy enforcement
Node-local DNS - CoreDNS runs as a system service directly on each node rather than as a traditional cluster deployment. This eliminates cross-node DNS latency and removes CoreDNS pods from your workload scheduling
Pod Identity Agent (built-in, no add-on installation)

Storage

The EBS CSI driver is managed with provisioner ebs.csi.eks.amazonaws.com:

Persistent volume provisioning with configurable IOPS, throughput, and encryption
Ephemeral storage configuration
Automatic NVMe setup for supported instance types

Load Balancing

The AWS Load Balancer Controller runs as a managed service:

ALB and NLB provisioning from Kubernetes Ingress/Service resources
Uses IngressClass alb (controller: eks.amazonaws.com/alb) and loadBalancerClass eks.amazonaws.com/nlb
Supports Ingress API and Service annotations only
Gateway API is not supported. The managed ALB controller only supports the Ingress API and Service annotations. This could be a deal breaker for teams adopting the Gateway API.

Health Monitoring

Automatic detection and repair of unhealthy nodes
Replaces unresponsive kubelets and PID-exhausted nodes
Reports through Kubernetes events and conditions

The Terraform Setup

I used the terraform-aws-modules/eks/aws community module (v21.x) with the AWS provider ~> 6.0. There is no separate official AWS Terraform module for EKS Auto Mode - the community module maintained by Anton Babenko is the de facto standard for EKS on Terraform, and it has first-class Auto Mode support. When you set compute_config.enabled = true, the module automatically handles the storage and load balancing configuration, hardcodes the right bootstrap settings, creates least-privilege IAM roles, and attaches the five managed policies the cluster role needs. You could build this with raw aws_eks_cluster resources, but the module eliminates several hundred lines of boilerplate and is referenced in AWS's own documentation.

The complete setup creates a production-ready cluster with VPC, Auto Mode enabled, custom NodePools, Container Insights observability, and least-privilege IAM configuration. Here is what the full infrastructure looks like:

Provider Configuration

terraform {
  required_version = ">= 1.10"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 6.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.38"
    }
    kubectl = {
      source  = "alekc/kubectl"
      version = "~> 2.1"
    }
  }
}

provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Project     = var.project_name
      Environment = var.environment
      ManagedBy   = "terraform"
    }
  }
}

A note on the kubectl provider: I use the alekc/kubectl fork rather than gavinbunney/kubectl. The alekc fork is actively maintained and handles CRD resources (like Karpenter NodePools and NodeClasses) reliably.

VPC

The VPC uses the community VPC module (v6.x) with both private and public subnets. The key detail is the subnet tags - EKS Auto Mode uses these to discover where to place nodes and load balancers:

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 6.0"

  name = "${var.project_name}-vpc"
  cidr = "10.0.0.0/16"

  azs             = slice(data.aws_availability_zones.available.names, 0, 3)
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway   = true
  single_nat_gateway   = true
  enable_dns_hostnames = true

  public_subnet_tags = {
    "kubernetes.io/role/elb" = 1
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = 1
  }
}

A few things worth noting:

Single NAT gateway for cost optimization in non-production. Use one per AZ for production HA.
No VPC endpoints in this demo for simplicity. Production setups should add VPC endpoints for ECR (both ecr.api and ecr.dkr), S3, STS, and CloudWatch to reduce NAT gateway costs and keep traffic private. This is especially impactful at scale - NAT gateway data processing charges add up quickly.
Public subnet tags with kubernetes.io/role/elb tell the managed load balancer controller where to create internet-facing ALBs/NLBs.
Private subnet tags with kubernetes.io/role/internal-elb are for internal load balancers.
Nodes always run in private subnets. The load balancers sit in public subnets.

EKS Cluster with Auto Mode

This is where Auto Mode gets enabled:

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 21.0"

  name               = var.project_name
  kubernetes_version = "1.35"

  endpoint_public_access = true
  # UNCOMMENT for production: restrict to your IP ranges or use private-only access
  # cluster_endpoint_public_access_cidrs = ["YOUR_CIDR/32"]

  # Enable Auto Mode - this single block enables:
  #   - Managed Karpenter (compute)
  #   - Managed EBS CSI driver (storage_config derived from compute_config.enabled)
  #   - Managed ALB/NLB Controller (elastic_load_balancing derived from compute_config.enabled)
  #   - Managed VPC CNI, kube-proxy, CoreDNS (bootstrap_self_managed_addons hardcoded false)
  compute_config = {
    enabled    = true
    node_pools = ["general-purpose", "system"]
  }

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  # Auto Mode IAM resources - creates the node IAM role with required policies
  create_node_iam_role           = true
  create_auto_mode_iam_resources = true

  # Cluster access - grants the deploying identity cluster admin
  enable_cluster_creator_admin_permissions = true

  # Addons - metrics-server for HPA + Container Insights for observability
  addons = {
    metrics-server = {
      most_recent = true
    }
    amazon-cloudwatch-observability = {
      most_recent = true
      pod_identity_association = [{
        role_arn        = aws_iam_role.cloudwatch.arn
        service_account = "cloudwatch-agent"
      }]
    }
  }
}

Production notes: This demo uses a public API endpoint without CIDR restrictions and local Terraform state. For production, restrict cluster_endpoint_public_access_cidrs to your corporate IP ranges (or use private-only access with a VPN/bastion), and configure a remote backend (S3 + DynamoDB) for state locking and team collaboration.

The critical parameters:

compute_config.enabled = true - This is the flag that enables Auto Mode. The module automatically derives storage_config (block storage) and kubernetes_network_config (elastic load balancing) from this single flag - you do not need to set them separately.
compute_config.node_pools - The built-in pools to activate. general-purpose handles your application workloads (On-Demand, C/M/R families, gen 4+, AMD and ARM). system handles EKS internal add-ons with a CriticalAddonsOnly taint.
bootstrap_self_managed_addons - The module hardcodes this to false when Auto Mode is enabled, so you do not need to set it. Auto Mode manages VPC CNI, kube-proxy, and CoreDNS off-cluster.
create_auto_mode_iam_resources = true - Creates the node IAM role with AmazonEKSWorkerNodeMinimalPolicy and AmazonEC2ContainerRegistryPullOnly - the least-privilege policies AWS recommends for Auto Mode (not the broader AmazonEKSWorkerNodePolicy or AmazonEC2ContainerRegistryReadOnly).
create_node_iam_role = true - Must be true (default) so the module creates the Auto Mode node role with nodeRoleArn. Setting this to false without providing a custom node_role_arn causes the EKS API to reject the cluster.
addons - Deploys the metrics-server (required for HPA to read CPU/memory metrics) and CloudWatch Container Insights with Pod Identity for observability (more on this below).

The module also automatically attaches five managed policies to the cluster IAM role: AmazonEKSClusterPolicy, AmazonEKSComputePolicy, AmazonEKSBlockStoragePolicy, AmazonEKSLoadBalancingPolicy, and AmazonEKSNetworkingPolicy. Plus a custom tags policy for Auto Mode resource tagging.

Important Terraform caveat: When disabling Auto Mode later, you must first apply with compute_config.enabled = false before removing the block entirely. Removing it in one step causes issues.

IAM Architecture

The setup follows AWS recommended least-privilege principles with three distinct roles:

Role	Policies	Purpose
Cluster Role	AmazonEKSClusterPolicy, AmazonEKSComputePolicy, AmazonEKSBlockStoragePolicy, AmazonEKSLoadBalancingPolicy, AmazonEKSNetworkingPolicy	Cluster control plane + Auto Mode capabilities
Node Role	AmazonEKSWorkerNodeMinimalPolicy, AmazonEC2ContainerRegistryPullOnly	Minimal node permissions (not the broader WorkerNodePolicy)
CloudWatch Role	CloudWatchAgentServerPolicy, AWSXrayWriteOnlyAccess	Container Insights via Pod Identity

The node role uses AmazonEKSWorkerNodeMinimalPolicy instead of the broader AmazonEKSWorkerNodePolicy, and AmazonEC2ContainerRegistryPullOnly instead of AmazonEC2ContainerRegistryReadOnly. These are the least-privilege policies AWS specifically recommends for Auto Mode.

Custom NodePool for Spot Instances

The built-in general-purpose pool uses On-Demand instances only, with C, M, and R instance families (generation 4+, both AMD and ARM/Graviton). For cost optimization, you can create a custom NodePool that uses Spot:

resource "kubectl_manifest" "spot_nodeclass" {
  yaml_body = yamlencode({
    apiVersion = "eks.amazonaws.com/v1"
    kind       = "NodeClass"
    metadata = {
      name = "spot-class"
    }
    spec = {
      role = module.eks.node_iam_role_name
      subnetSelectorTerms = [
        {
          tags = {
            "kubernetes.io/role/internal-elb" = "1"
          }
        }
      ]
      securityGroupSelectorTerms = [
        {
          tags = {
            "aws:eks:cluster-name" = var.project_name
          }
        }
      ]
    }
  })

  depends_on = [module.eks]
}

resource "kubectl_manifest" "spot_nodepool" {
  yaml_body = yamlencode({
    apiVersion = "karpenter.sh/v1"
    kind       = "NodePool"
    metadata = {
      name = "spot-compute"
    }
    spec = {
      template = {
        metadata = {
          labels = {
            "workload-type" = "spot-eligible"
          }
        }
        spec = {
          nodeClassRef = {
            group = "eks.amazonaws.com"
            kind  = "NodeClass"
            name  = "spot-class"
          }
          requirements = [
            {
              key      = "karpenter.sh/capacity-type"
              operator = "In"
              values   = ["spot"]
            },
            {
              key      = "eks.amazonaws.com/instance-category"
              operator = "In"
              values   = ["c", "m", "r"]
            },
            {
              key      = "eks.amazonaws.com/instance-generation"
              operator = "Gte"
              values   = ["5"]
            }
          ]
          expireAfter = "336h"
        }
      }
      disruption = {
        consolidationPolicy = "WhenEmptyOrUnderutilized"
        consolidateAfter    = "1m"
        budgets = [
          { nodes = "20%" }
        ]
      }
      weight = 80
    }
  })

  depends_on = [
    module.eks,
    kubectl_manifest.spot_nodeclass,
  ]
}

Key differences from self-managed Karpenter:

NodeClass uses eks.amazonaws.com/v1 instead of karpenter.k8s.aws/v1
NodeClass requires role - You must specify role (IAM role name) or instanceProfile. The EKS API rejects NodeClasses without one of these. Use module.eks.node_iam_role_name from the Terraform module
Labels use eks.amazonaws.com/* instead of karpenter.k8s.aws/*
NodePool API is the same karpenter.sh/v1 in both cases
consolidateAfter is required in the disruption block - omitting it causes a validation error
The weight field (80 for Spot vs the default pool) determines scheduling preference
Instance families (C/M/R) and generation (5+) are specified via requirements - Karpenter selects the specific instance size based on pod resource requests at scheduling time

To direct workloads to Spot nodes, add a nodeSelector:

spec:
  nodeSelector:
    workload-type: spot-eligible

Deploying a Real Application

To demonstrate the full workflow, I deployed a Python FastAPI application with an ALB, autoscaling, and a load generator for scaling demos.

Application Code

A demo API with four endpoints designed to showcase Auto Mode's capabilities:

/health - Health check for probes and ALB health checks
/info - Returns pod and node details including the instance type Karpenter selected, whether the node is Spot or On-Demand, which NodePool it belongs to, and the AZ. Uses the Kubernetes API with a read-only RBAC binding to query node labels
/compute/{iterations} - CPU-intensive endpoint for triggering HPA scaling
/stress/{mb}?seconds=30 - Allocates memory to demonstrate memory pressure. Holds the allocation then auto-releases

The most interesting part is the /info endpoint, which queries the Kubernetes API to read node labels set by Karpenter:

def _get_node_info():
    """Fetch node labels via Kubernetes API to show Auto Mode scheduling decisions."""
    from kubernetes import client, config
    config.load_incluster_config()
    v1 = client.CoreV1Api()

    node = v1.read_node(os.getenv("NODE_NAME"))
    labels = node.metadata.labels or {}

    return {
        "node_name": os.getenv("NODE_NAME"),
        "instance_type": labels.get("node.kubernetes.io/instance-type", "unknown"),
        "capacity_type": labels.get("karpenter.sh/capacity-type", "unknown"),
        "nodepool": labels.get("karpenter.sh/nodepool", "unknown"),
        "zone": labels.get("topology.kubernetes.io/zone", "unknown"),
        "arch": labels.get("kubernetes.io/arch", "unknown"),
        "compute_type": labels.get("eks.amazonaws.com/compute-type", "unknown"),
    }

The other endpoints: /compute/{iterations} does CPU-intensive work for HPA scaling demos, and /stress/{mb} allocates memory (auto-releases after a configurable hold period) for memory pressure demos. Full app code on GitHub.

Here is a real response from the running demo:

{
  "pod": {
    "name": "demo-api-847f8d6ff5-9gvsq",
    "namespace": "demo",
    "ip": "10.0.3.184",
    "service_account": "demo-api"
  },
  "node": {
    "node_name": "i-01e881b330651bbc6",
    "instance_type": "c7g.large",
    "capacity_type": "spot",
    "nodepool": "spot-compute",
    "zone": "us-east-1c",
    "arch": "arm64",
    "compute_type": "auto"
  }
}

Notice Karpenter selected a c7g.large - a Graviton (ARM) Spot instance. This is a key insight: Karpenter optimizes for cost, and Graviton Spot instances are often the cheapest available option. The built-in general-purpose pool includes both AMD and ARM architectures, and the custom Spot pool does too. Hit the endpoint multiple times and you may see responses from different pods on different nodes, potentially in different AZs.

Building the Container Image

The Dockerfile is a standard Python 3.13-slim image running uvicorn.

Important: Build for the right architecture. Auto Mode's NodePools include ARM (Graviton) instances by default - Karpenter often selects them because they are the cheapest option. You need to target the right platform:

# Build for ARM (Graviton nodes - most common with Auto Mode Spot)
docker buildx build --platform linux/arm64 -t my-app:latest --load .

# Or if your nodes are AMD64
docker buildx build --platform linux/amd64 -t my-app:latest --load .

Using :latest tags with Auto Mode requires imagePullPolicy: Always in the deployment to ensure nodes pull the newest image on every pod creation.

Kubernetes Manifests

The key manifest design decisions (full manifests on GitHub):

RBAC for /info - The pod needs a ServiceAccount with a ClusterRole to read node labels. This is a minimal read-only get on nodes:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-reader
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get"]

Downward API env vars - The deployment injects NODE_NAME, POD_NAME, POD_NAMESPACE, POD_IP, and POD_SERVICE_ACCOUNT via the Kubernetes downward API. The /info endpoint uses NODE_NAME to query the Kubernetes API for that node's labels:

env:
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName

Other important details:

imagePullPolicy: Always is required when using :latest tags, since nodes cache images
IngressClass is alb (not eks.amazonaws.com/alb, since / is invalid in RFC 1123 subdomain names). You must explicitly create the IngressClassParams and IngressClass resources - Auto Mode does not auto-register them
Resource requests are set on every container. This is critical with Auto Mode - Karpenter provisions nodes based on requested resources. If you don't set requests, Auto Mode has no signal for right-sizing
Health probes are defined for proper rolling updates and load balancer health checks

Horizontal Pod Autoscaler

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-api
  namespace: demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-api
  minReplicas: 2
  maxReplicas: 10
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 120
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70

The HPA scales on both CPU and memory - whichever metric exceeds its threshold first triggers a scale-up. The /compute endpoint drives CPU scaling, while the /stress endpoint drives memory scaling. Both trigger the same pod-to-node scaling chain through Auto Mode.

The stabilizationWindowSeconds: 120 shortens the default 5-minute stabilization window for scale-down, allowing the HPA to reduce replicas faster after load drops - useful for demos and workloads where you want quicker consolidation.

Important: The HPA requires the Kubernetes Metrics Server to read CPU/memory utilization. The metrics-server is deployed as an EKS addon in the Terraform configuration (see the EKS cluster configuration above). Without it, HPA reports cpu: <unknown> and never scales.

PodDisruptionBudget

Since Auto Mode consolidation is enabled by default, a PDB protects your application during node replacement:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: demo-api
  namespace: demo
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: demo-api

This ensures at least one pod is always available when Auto Mode consolidates or rotates nodes. Without a PDB, Karpenter could drain all pods from a node simultaneously during consolidation.

When the HPA scales pods up and existing nodes can't fit them, Auto Mode's managed Karpenter automatically provisions new nodes. When pods scale down and nodes become underutilized, consolidation kicks in and nodes are terminated. You never interact with the scaling infrastructure directly.

Autoscaling in Action

One of the most compelling features of Auto Mode is how seamlessly pod-level and node-level scaling work together. The repo includes everything needed to demonstrate this end-to-end.

Manual Scaling Demo

The quickest way to see Auto Mode's node provisioning is to scale the deployment beyond what existing nodes can handle:

# Scale to 15 replicas - watch Auto Mode provision new nodes
kubectl scale deployment demo-api -n demo --replicas=15

# Watch the scaling in real time
kubectl get pods -n demo -o wide -w    # Pods scheduling across nodes
kubectl get nodes -w                    # New nodes appearing
kubectl get events -A --sort-by='.lastTimestamp' | grep -E 'Nominated|Launched'

# Scale back down - watch consolidation
kubectl scale deployment demo-api -n demo --replicas=2

When you scale to 15 replicas, each requesting 256m CPU and 512Mi memory, the existing nodes cannot fit all the pods. Auto Mode's managed Karpenter evaluates the pending pods, selects optimal instance types from the C/M/R families (generation 4+, including Graviton), and provisions new nodes. The instance type is determined by the aggregate resource requests and cost optimization - Karpenter might select a c7g.large (Graviton) over an m7i.large if it is cheaper and fits the workload. You can verify this by hitting the /info endpoint to see exactly what Karpenter chose.

When you scale back down, pods are removed and nodes become underutilized. Auto Mode's consolidation policy (WhenEmptyOrUnderutilized) kicks in, draining and terminating nodes that are no longer needed.

Load-Driven Scaling Demo

For a more realistic demonstration, the repo includes a load generator - a busybox pod that continuously hits /compute/500000 to drive CPU utilization above the HPA threshold.

The full scaling chain works like this:

Load generator drives traffic to the API
CPU utilization exceeds the HPA's 60% target
HPA increases the replica count
New pods enter Pending state (no room on existing nodes)
Auto Mode's Karpenter detects pending pods and provisions new nodes
Pods schedule onto the new nodes and begin serving traffic

# Start the load generator
kubectl apply -f k8s/load-generator.yaml

# Watch the full chain: HPA -> pods -> nodes
kubectl get hpa -n demo -w
kubectl get pods -n demo -o wide -w
kubectl get nodes -w

# Stop the load
kubectl delete -f k8s/load-generator.yaml
# HPA scales pods down, Auto Mode consolidates nodes

The Makefile includes convenience commands for all of this: make demo-scale-up, make demo-scale-down, make demo-load-start, make demo-load-stop, and make demo-watch.

Observability with Container Insights

Since February 2026, Auto Mode supports Managed Capability Logging - you can deliver logs from compute autoscaling (Karpenter), block storage, load balancing, and pod networking to CloudWatch Logs, S3, or Kinesis Data Firehose using CloudWatch Vended Logs. This closed what was previously the biggest observability gap.

CloudWatch Container Insights complements this by providing node, pod, and container-level metrics with pre-built dashboards.

Terraform Setup

The CloudWatch observability addon is deployed as an EKS addon with Pod Identity - the recommended approach for Auto Mode:

# In the EKS module
addons = {
  metrics-server = {
    most_recent = true
  }
  amazon-cloudwatch-observability = {
    most_recent = true
    pod_identity_association = [{
      role_arn        = aws_iam_role.cloudwatch.arn
      service_account = "cloudwatch-agent"
    }]
  }
}

The metrics-server addon is required for HPA to read CPU and memory utilization. Without it, HPA cannot function.

The IAM role uses Pod Identity (not IRSA) with least-privilege policies:

data "aws_iam_policy_document" "cloudwatch_assume" {
  statement {
    actions = ["sts:AssumeRole", "sts:TagSession"]
    principals {
      type        = "Service"
      identifiers = ["pods.eks.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "cloudwatch" {
  name               = "${var.project_name}-cloudwatch-observability"
  assume_role_policy = data.aws_iam_policy_document.cloudwatch_assume.json
}

resource "aws_iam_role_policy_attachment" "cloudwatch_agent" {
  role       = aws_iam_role.cloudwatch.name
  policy_arn = "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
}

resource "aws_iam_role_policy_attachment" "xray_write" {
  role       = aws_iam_role.cloudwatch.name
  policy_arn = "arn:aws:iam::aws:policy/AWSXrayWriteOnlyAccess"
}

Key decisions:

Pod Identity over IRSA - Pod Identity is the recommended approach for EKS addons. It is built into Auto Mode (the Pod Identity Agent runs on every node automatically).
Trust principal is pods.eks.amazonaws.com - This is the Pod Identity service, not ec2.amazonaws.com.
AWSXrayWriteOnlyAccess - Required alongside CloudWatchAgentServerPolicy for full observability including trace collection.

What You Get

Once deployed, Container Insights provides:

Node metrics - CPU, memory, network, and disk utilization per node
Pod metrics - CPU and memory usage per pod, with requests vs actual
Container-level performance - Individual container metrics within pods
Cluster dashboards - Pre-built CloudWatch dashboards for cluster health
Log collection - Application and system logs forwarded to CloudWatch Logs

This complements Managed Capability Logging by providing resource-level metrics that controller logs don't surface. Container Insights gives you the visibility to understand scaling behavior, identify resource bottlenecks, and verify that consolidation is working as expected.

Security Out of the Box

Auto Mode provides a strong security posture with minimal configuration. Here is what you get by default:

Node Security

Bottlerocket OS - Minimal, purpose-built container OS with SELinux enforcing and a read-only root filesystem. No package manager, no shell access.
IMDSv2 enforced - Instance metadata hop limit hardcoded to 1, preventing containers from accessing node credentials via IMDS.
No SSH/SSM access - Nodes are locked down. You cannot open a shell on an Auto Mode node.
14-day node rotation - Nodes expire after 14 days by default (configurable up to 21 days) and are automatically replaced with the latest patched Bottlerocket AMI, ensuring CVE fixes are applied continuously.
Drift detection - When AWS releases a new AMI (approximately weekly), Auto Mode detects the drift and replaces nodes with the updated image.

Data Security

Encrypted EBS volumes - Auto Mode's managed EBS CSI driver supports encrypted volumes via StorageClass parameters (encrypted: "true"), ensuring persistent volumes are encrypted at rest.
allowedTopologies - StorageClasses can restrict volume provisioning to Auto Mode compute nodes via eks.amazonaws.com/compute-type: auto.
Secrets encryption - EKS supports envelope encryption for Kubernetes secrets using KMS. Add encryption_config to the EKS module to encrypt etcd secrets at rest with your own KMS key. This is not enabled by default but is recommended for production clusters handling sensitive data.

IAM Security

Least-privilege node role - Uses AmazonEKSWorkerNodeMinimalPolicy (not the broader AmazonEKSWorkerNodePolicy) and AmazonEC2ContainerRegistryPullOnly (not AmazonEC2ContainerRegistryReadOnly).
Pod Identity - Built-in, used for the CloudWatch addon. No need for IRSA or node-level IAM policies for application workloads.
Separate IAM roles - Cluster role, node role, and addon roles are distinct with non-overlapping permissions.

Network Security

Private subnets - Nodes run exclusively in private subnets with no direct internet access.
Managed VPC CNI - Network policies are enforced at the CNI level without additional tooling.
NAT gateway - Outbound traffic routes through NAT, providing a single egress point.

Resource Requests Matter More Than Ever

This deserves its own section because it is the single most important operational consideration with Auto Mode.

In a traditional EKS setup with Managed Node Groups, you might have fixed-size nodes (e.g., m5.xlarge) and the Kubernetes scheduler bins pods onto available capacity. Wasted capacity is absorbed by the fixed node pool.

With Auto Mode (and Karpenter in general), nodes are provisioned to fit the requested resources. If your pod requests 256m CPU and 512Mi memory, Karpenter might provision a small instance just for that pod. If your pod actually needs 2 vCPU but only requests 256m, it will get throttled.

Best practices for resource requests:

Always set both CPU and memory requests on every container
Use tools like Goldilocks or VPA in recommendation mode to determine actual usage before setting requests
Set requests close to actual usage - not the theoretical maximum
Set limits higher than requests for burst capacity, but not orders of magnitude higher
Monitor actual vs requested using Container Insights or Prometheus

The Karpenter Connection

Understanding that Auto Mode is Karpenter helps explain its behavior:

Consolidation is enabled by default. If a node is empty or underutilized, Auto Mode replaces it with a better-fitting instance
Node expiration is 14 days by default (max 21 days). Nodes are automatically replaced, picking up the latest Bottlerocket AMI
Drift detection triggers replacement when AWS releases a new AMI (approximately weekly)
Disruption budget of 10% prevents too many nodes from being replaced simultaneously
Instance selection is automatic - Karpenter evaluates the C/M/R families (generation 4+ for built-in, 5+ for our Spot pool) and picks the optimal size based on pending pod requests

Since February 2026, you can access Karpenter's logs through Managed Capability Logging. Configure CloudWatch Vended Logs to deliver compute autoscaling logs to CloudWatch Logs, S3, or Kinesis Data Firehose. For quick troubleshooting, Kubernetes events (from eks-auto-mode/compute) remain the fastest signal.

# View Auto Mode compute events
kubectl get events -A --sort-by='.lastTimestamp' | tail -20

# Check node provisioning
kubectl get nodes -o wide --show-labels

# See NodePool status
kubectl get nodepools
kubectl describe nodepool general-purpose

Cost Analysis

EKS Auto Mode has three cost components:

Component	Cost
EKS cluster fee	$0.10/hour ($72/month)
EC2 instances	Standard pricing (On-Demand, RI, Savings Plans, Spot)
Auto Mode management fee	~12% on top of EC2 instance costs

Example Monthly Costs

For a cluster running 3x m6i.xlarge (4 vCPU, 16 GiB each) on On-Demand:

Item	Monthly Cost
EKS cluster	$72
3x m6i.xlarge On-Demand	~$432
Auto Mode fee (~12%)	~$52
NAT Gateway (single)	~$32 + data
ALB (light traffic)	~$22
Total	~$610/month

Compare to self-managed EKS with Karpenter:

Item	Monthly Cost
EKS cluster	$72
3x m6i.xlarge On-Demand	~$432
Karpenter pod overhead	~$0 (runs on existing nodes)
NAT Gateway	~$32 + data
ALB	~$22
Total	~$558/month

The ~$52/month premium (9-12% of compute) buys you elimination of operational overhead for Karpenter, VPC CNI, EBS CSI driver, ALB controller, Pod Identity Agent, AMI lifecycle, OS patching, and node health monitoring. For most teams, that operational burden costs far more than $52/month in engineer time.

Cost Optimization Strategies

Custom Spot NodePools - Create a Spot pool (shown above) for fault-tolerant workloads. Spot savings of 60-70% significantly offset the management fee
Right-size resource requests - Auto Mode provisions exactly what you request. Accurate requests mean less wasted capacity
Consolidation - Enabled by default. Auto Mode automatically bin-packs and replaces underutilized nodes
Compute Savings Plans - Apply to Auto Mode EC2 instances just like regular instances

Recent Feature Additions

Auto Mode has been gaining features steadily since GA. Notable additions through early 2026:

Managed Capability Logging (February 2026) - Deliver logs from compute autoscaling, block storage, load balancing, and pod networking to CloudWatch Logs, S3, or Kinesis Data Firehose via CloudWatch Vended Logs. This closed the biggest observability gap.
Secondary IP mode (December 2025) - IP conservation for large clusters
SOCI parallel pull (November 2025) - Up to 60% faster container image pulls for GPU instances
Static-capacity NodePools (November 2025) - Fixed replica count for predictable workloads that need a guaranteed number of nodes
GovCloud availability for Auto Mode (October 2025) - US-East and US-West GovCloud regions
FIPS-compatible AMIs (October 2025) - For US compliance requirements
Local Zones support (October 2025) - Deploy Auto Mode nodes to edge locations
associatePublicIPAddress (August 2025) - Control public IP assignment in NodeClass
ODCR/Capacity Reservation support (June 2025) - Use On-Demand Capacity Reservations with Auto Mode
Pod subnet and security group selectors (June 2025) - podSubnetSelectorTerms and podSecurityGroupSelectorTerms in NodeClass for network isolation

Static-capacity NodePools and ODCR support are particularly relevant for enterprise readers who need guaranteed capacity for critical workloads.

Current Limitations

Auto Mode is not the right choice for every workload. Here are the constraints:

No custom AMIs - Only AWS-provided Bottlerocket variants. If you need Amazon Linux, Ubuntu, or custom-built images with compliance tooling baked in, Auto Mode won't work. Workaround: deploy host-level tooling as DaemonSets
No SSH/SSM access to nodes - Nodes are locked down. You can get node logs through Kubernetes APIs and EKS troubleshooting tools, but cannot open a shell on a node
Controller logs require extra setup - Karpenter, ALB Controller, and other managed components run off-cluster. Since February 2026, you can access their logs via Managed Capability Logging (CloudWatch Vended Logs), but this requires explicit configuration - it is not enabled by default
No self-managed Karpenter - Cannot coexist. Must uninstall before enabling Auto Mode
No third-party CNI - Only the managed VPC CNI. Calico and Cilium (as CNI) are not supported. Cilium for network policy only may work
Linux only - No Windows node support. This matters for .NET shops running Windows containers
IMDSv2 hop limit of 1 - Hardcoded, cannot be changed. This breaks some tools that need IMDS access from within pods (Datadog agent, some AWS SDK configurations). Workaround: pods that require IMDS access must run with hostNetwork: true
No DRA - Dynamic Resource Allocation is not yet supported
No Gateway API - The managed ALB controller only supports the Ingress API and Service annotations (for NLB). If you need Gateway API (HTTPRoute, GatewayClass), you would need to disable Auto Mode's load balancing capability and self-manage the AWS Load Balancer Controller, which defeats much of the purpose of Auto Mode

The most common blocker I see discussed is the lack of custom AMIs and SSH access. If your security team requires specific hardening beyond what Bottlerocket provides, or your compliance framework mandates SSH-based auditing, Auto Mode is not ready for you yet.

Migration Considerations

If you have an existing EKS cluster, migration to Auto Mode is possible but has significant caveats.

Storage Migration

PVCs created with the self-managed EBS CSI driver (ebs.csi.aws.com) cannot be directly mounted by Auto Mode's driver (ebs.csi.eks.amazonaws.com). You need to recreate PV/PVC resources. AWS provides the eks-auto-mode-ebs-migration-tool to help.

Load Balancer Migration

Existing load balancers from the self-managed AWS Load Balancer Controller cannot be transferred to the managed controller. You need a blue-green approach - create new LBs under the managed controller and shift traffic via DNS.

From Self-Managed Karpenter

Uninstall self-managed Karpenter
Uninstall AWS Load Balancer Controller
Enable Auto Mode
Recreate NodePool/NodeClass resources with Auto Mode API versions

Real-World Advice

The Playtomic engineering team documented their migration experience and concluded: "If you're already running an EKS cluster with Karpenter and AWS Load Balancer Controller, we don't recommend migrating to EKS Auto Mode." Their recommendation is to adopt Auto Mode for new clusters.

I agree with this assessment. The migration path has enough friction that the ROI for existing, well-running clusters is questionable. For greenfield deployments, Auto Mode is the obvious choice (if the limitations don't block you and you want an easier approach to running full Kubernetes on AWS)

When to Use Auto Mode vs Standard EKS vs Fargate

Criteria	EKS Auto Mode	Standard EKS	EKS on Fargate
Node management	AWS managed	You manage	Serverless (no visible nodes)
Custom AMIs	No	Yes	N/A
SSH to nodes	No	Yes	N/A
DaemonSets	Yes	Yes	No
Persistent storage	Yes (managed EBS)	Yes	EFS only
GPU support	Yes (managed drivers)	Yes	No
Instance type control	Via NodePool constraints	Full control	N/A
Third-party CNI	No	Yes	No
Windows	No	Yes	No
Cost premium	~12% on compute	None	Fargate pricing
Operational burden	Low	High	Lowest

Choose Auto Mode when you want Kubernetes with minimal operational overhead, your workloads run on Linux, you don't need custom AMIs or SSH access, and you value AWS managing the full data plane.

Choose Standard EKS when you need custom AMIs, third-party CNIs, want to use Gateway API, need SSH access, or your team already has mature Karpenter automation.

Choose Fargate when you have lightweight stateless workloads, need true serverless (no node concept), or want the absolute simplest deployment model.

Best Practices

After building and testing this setup, here are the practices I recommend:

Always set resource requests on every container. Auto Mode provisions based on requests. Missing requests means unpredictable bin-packing and potentially oversized nodes.
Use custom Spot NodePools for non-critical workloads. The built-in general-purpose pool is On-Demand only. Creating a Spot pool can cut compute costs by 60-70% for batch jobs, dev environments, and other interruptible work.
Tag your subnets correctly. Auto Mode discovers subnets via kubernetes.io/role/elb and kubernetes.io/role/internal-elb tags. Missing tags means load balancers and nodes won't land where you expect.
Use the right resource class names. StorageClass provisioner is ebs.csi.eks.amazonaws.com, IngressClass is alb (not eks.amazonaws.com/alb), NLB loadBalancerClass is eks.amazonaws.com/nlb. You must create IngressClassParams and IngressClass resources explicitly - Auto Mode does not auto-register them.
Configure PodDisruptionBudgets. Auto Mode consolidates underutilized nodes by default. PDBs ensure your application maintains availability during node replacement and consolidation. The demo includes a PDB with minAvailable: 1.
Enable observability. Configure Managed Capability Logging for controller logs (compute, storage, LB, networking) and deploy Container Insights with Pod Identity for node and pod metrics.
Start with Auto Mode for new clusters. Don't migrate existing clusters unless you have a compelling reason. The storage and load balancer migration paths add significant risk.
Test the IMDSv2 hop limit. If your pods use the AWS SDK, IMDS, or tools like the Datadog agent that rely on instance metadata, verify they work with the hardcoded hop limit of 1. Pods that need IMDS access must use hostNetwork: true.
Use least-privilege IAM. Stick with AmazonEKSWorkerNodeMinimalPolicy and AmazonEC2ContainerRegistryPullOnly for the node role. Use Pod Identity for application IAM needs.

The Complete Terraform

The full Terraform code for this article is available on GitHub: github.com/RDarrylR/eks-auto-mode-terraform

The repository includes:

VPC with public and private subnets across 3 AZs
EKS cluster with Auto Mode enabled (Kubernetes 1.35)
Custom Spot NodePool and NodeClass (Karpenter CRDs)
Least-privilege IAM roles (cluster, node, CloudWatch)
CloudWatch Container Insights with Pod Identity
ECR repository for the demo application
Demo FastAPI app with /info (node scheduling details), /compute (CPU stress), and /stress (memory pressure) endpoints
Kubernetes manifests with ALB, HPA, and IngressClass configuration
Load generator for autoscaling demos
Makefile with deployment, scaling demo, and inspection commands

# Deploy the infrastructure
make init
make plan
make apply

# Build and push the demo app
make docker-build
make docker-push

# Deploy the application
make deploy-app

# Run the scaling demo
make demo-scale-up    # Scale to 15 replicas, watch node provisioning
make demo-watch       # Watch HPA, pods, nodes, and events
make demo-scale-down  # Scale back, watch consolidation

# Or trigger HPA-driven scaling
make demo-load-start  # Deploy load generator
make demo-watch       # Watch the full scaling chain
make demo-load-stop   # Clean up

# Clean up
make delete-app
make destroy

CLEANUP (IMPORTANT!!)

If you deploy this infrastructure, it will cost you real money (~$610/month for the default configuration). Please do not forget about it.

Make sure to delete all resources when you are done:

make delete-app   # Remove K8s resources first (avoids orphaned LBs/ENIs)
make destroy      # Destroy all Terraform resources

The EKS cluster, EC2 instances, NAT gateway, and ALB all incur hourly charges. Even if you are not running any application workloads, the cluster and VPC infrastructure will continue to cost you money until it is destroyed.

Wrapping Up

EKS Auto Mode represents a significant shift in how AWS approaches Kubernetes. Instead of handing you a control plane and saying "good luck with the rest," AWS now manages the entire stack - compute, networking, storage, and add-ons - while you focus on your applications. It's similar in philosophy to Fargate, but with full EC2 visibility and Kubernetes compatibility.

The ~12% cost premium is real, but the operational simplification is substantial. No more Karpenter upgrades, no more VPC CNI compatibility issues, no more EBS CSI driver version mismatches, no more ALB Controller deployments. For teams that want Kubernetes without the infrastructure burden, Auto Mode delivers.

The Terraform setup with the community module is straightforward - enable a flag, configure your NodePools, and deploy. The biggest adjustment is remembering the Auto Mode resource class names (eks.amazonaws.com/*) and accepting that you are trading visibility and control for operational simplicity.

The security posture is strong out of the box - Bottlerocket with SELinux, IMDSv2 enforced, encrypted storage, least-privilege IAM, and automatic node rotation. Observability is now comprehensive - Managed Capability Logging delivers controller logs to CloudWatch, and Container Insights provides pod and node metrics. And the autoscaling story is seamless - HPA drives pod scaling, Karpenter drives node scaling, consolidation cleans up the rest.

If you are starting a new Kubernetes project on AWS in 2026, Auto Mode should be considered for your default choice unless you have a problem with one of the things in the limitations list or have a specific reason to manage the data plane yourself.

Resources

EKS Auto Mode Overview - AWS Docs
EKS Auto Mode Best Practices
Under the Hood: Amazon EKS Auto Mode
Getting Started with EKS Auto Mode
EKS Auto Mode Release Notes
terraform-aws-modules/eks Auto Mode Example
Migrate from Karpenter to Auto Mode
EKS Auto Mode Security Whitepaper
Playtomic Migration Experience
Amazon EKS Pricing
Full Terraform Code - GitHub
AWS London Ontario User Group - Meetups, talks, and community for AWS builders in the London, Ontario area
AWS London Ontario User Group YouTube - Recorded talks and presentations

Connect with me on X, Bluesky, LinkedIn, GitHub, Medium, Dev.to, or the AWS Community. Check out more of my projects at darryl-ruggles.cloud and join the Believe In Serverless community.

Elastic Container Service (ECS): My default choice for containers on AWS

Darryl Ruggles — Wed, 18 Mar 2026 14:08:11 +0000

Amazon Elastic Container Service is the default AWS service I reach for whenever I need to run containers. Whether it's a batch processing pipeline that fans out across hundreds of Fargate tasks or a FastAPI backend sitting behind an Application Load Balancer, ECS handles the orchestration without the operational complexity of Kubernetes. The control plane is free, the AWS integration is deep, and as of early 2026, the deployment capabilities rival anything in the container ecosystem.

I recently presented on ECS and decided to write down the things I have learned from building real projects into one place. This blog post is the companion to that presentation - a deep dive into what ECS offers, how I use it, and how you can start building with it today.

Why Containers?

Before we talk about ECS specifically, let's talk about why containers matter. Four core principles make containers compelling:

Consistency - The same image runs identically on your laptop, in CI, and in production. No more "works on my machine."
Isolation - Each container gets its own filesystem, networking, and process space. Multiple services on the same host without conflicts.
Efficiency - Containers share the host OS kernel. Startup in seconds, not minutes. Far less overhead than virtual machines.
Portability - A Docker image runs on ECS, EKS, Lambda, or your own servers. Your business logic stays runtime-agnostic.

In my Aurora DSQL Kabob Store project, I made this a deliberate design decision - keeping business logic runtime-agnostic so the same FastAPI application could deploy on Fargate, EC2, EKS, or Lambda with minimal adapter code.

What is Amazon ECS?

ECS is a fully managed container orchestration service. You define what to run and how, and ECS handles placement, scaling, availability, and integration with the rest of AWS.

Four things make ECS stand out:

No control plane cost - Unlike EKS (~$75/month per cluster), the ECS orchestration layer is completely free. You only pay for the compute your containers use.
Deep AWS integration - IAM roles per task, CloudWatch Container Insights, native ALB target groups, Secrets Manager injection, and tight integration with every major AWS service.
Flexible compute - Choose between Fargate (serverless), EC2 (self-managed), or the new Managed Instances (AWS-managed EC2).
Deployment sophistication - Rolling updates, native blue/green, canary, and linear deployments all built in.

Core Concepts

Five building blocks make up ECS:

Cluster - A logical grouping of tasks and services. Think of it as your namespace. A cluster can span Fargate, EC2, and Managed Instances simultaneously.

Task Definition - The blueprint. A JSON document that specifies container images, CPU, memory, networking mode, volumes, IAM roles, and logging configuration. Versioned - each registration creates a new revision (e.g., my-app:3).

Task - A running instance of a task definition. One or more containers working together. On Fargate, each task gets its own elastic network interface and private IP.

Service - Maintains a desired count of tasks. Handles replacement of failed tasks, load balancer registration, auto scaling, and deployments.

Container Instance - An EC2 instance running the ECS agent, registered to a cluster. Only relevant if you're using the EC2 launch type.

Compute Options

ECS gives you four ways to provide compute for your containers. Choosing the right one depends on your workload characteristics, cost sensitivity, and operational preferences.

Fargate (Serverless) - The Default Choice

Fargate is what I recommend for most workloads. With it, there are no EC2 instances to manage. You specify CPU and memory at the task level, and AWS handles everything underneath.

CPU (vCPU)	Memory Options	Notes
0.25	512 MiB, 1-2 GB	Linux only
0.5	1-4 GB	Linux only
1	2-8 GB	Linux and Windows
2	4-16 GB	Linux and Windows
4	8-30 GB	Linux and Windows
8	16-60 GB	Linux only
16	32-120 GB	Linux only

Fargate supports both x86_64 and ARM64 (Graviton) architectures. Graviton gives you roughly 20% better price-performance for most workloads. Pricing is per-second based on vCPU and memory consumed.

Fargate Spot offers up to 70% savings for fault-tolerant workloads. When AWS reclaims capacity, tasks receive a SIGTERM with a 2-minute warning. I use this for batch processing jobs where interruption just means retrying one file.

EC2 Launch Type

This offers full control over the underlying instances. You choose the AMI, instance type, and manage patching and scaling yourself. Choose EC2 when you need GPUs, custom AMIs, specific instance families, or when sustained high utilization makes reserved instances cheaper than Fargate.

The trade-off is clear: more control, more operational burden.

ECS Managed Instances

Launched in September 2025, Managed Instances bridge the gap between Fargate simplicity and EC2 flexibility. AWS handles provisioning, auto-scaling, Bottlerocket OS patching (14-day cycles), and host replacement. You control instance type selection via attribute-based selection - say "I need 4 GPUs" and ECS picks the right instance.

The "start before stop" principle for host replacement is particularly nice - new capacity comes up before old goes down, maintaining availability throughout.

This is the answer for GPU workloads and ML inference where Fargate isn't an option but you don't want to manage EC2 fleets.

Capacity Providers

Capacity providers are the recommended way to configure compute. The strategy uses two parameters:

Base - Minimum tasks guaranteed on a specific provider (only one provider can have a base)
Weight - Relative proportion of tasks after the base is filled

Example: base 2 on FARGATE, weight 4 on FARGATE_SPOT, weight 1 on FARGATE. Your first 2 tasks are guaranteed to use on-demand Fargate. After that, 4 out of every 5 new tasks go to Spot. Cost optimization with a reliability floor.

Task Definitions - The Blueprint

The task definition is where you define everything about your containers. Here are the critical parameters:

Container Definitions

Each task definition contains one or more container definitions. Key parameters include:

image - Docker image from ECR (Elastic Container Registry), Docker Hub, or any private registry
essential - If an essential container stops, the entire task stops. Your main app is essential; your log router sidecar might not be
portMappings - Container ports, with named ports for Service Connect
healthCheck - CMD-SHELL command with configurable interval, timeout, retries, and start period
dependsOn - Container startup ordering with conditions: START, COMPLETE, SUCCESS, HEALTHY
restartPolicy - Container-level restarts without killing the entire task. Configurable attempt period (60-1800 seconds) and ignored exit codes

Task Role vs Execution Role

This distinction trips people up:

	Task Role	Execution Role
Purpose	Permissions for your application code	Permissions for the ECS agent
Used by	Your containers calling AWS APIs	ECS pulling images, pushing logs, fetching secrets
Example	S3 read/write, DynamoDB access	`ecr:GetAuthorizationToken`, `logs:CreateLogStream`, `secretsmanager:GetSecretValue`

Two different roles, two different purposes. The task role follows least privilege for your application. The execution role is about infrastructure plumbing.

Secrets Injection

ECS natively injects secrets as environment variables from Secrets Manager or SSM Parameter Store:

{
  "secrets": [
    {
      "name": "DB_PASSWORD",
      "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789:secret:my-secret"
    },
    {
      "name": "API_KEY",
      "valueFrom": "arn:aws:ssm:us-east-1:123456789:parameter/my-param"
    }
  ]
}

Secrets Manager supports specific JSON keys (arn:...secret:my-secret:username::) and version staging. Never bake secrets into container images or task definitions.

Networking

Networking Modes

Mode	Description	Use With
awsvpc	Each task gets its own ENI and private IP. Per-task security groups. Required for Fargate.	Fargate, EC2, Managed Instances
bridge	Docker's virtual network. Dynamic port mapping with ALB.	EC2 only
host	Containers use host's network directly. No port isolation.	EC2 only
none	No external networking.	EC2 only

Recommendation: Use awsvpc unless you have a specific reason not to. It's the only mode that works everywhere and gives you per-task security groups.

VPC Architecture

For production workloads, run ECS tasks in private subnets. Use VPC endpoints for ECR, S3, and CloudWatch to avoid NAT gateway data transfer costs. This is the biggest hidden cost in ECS architectures - NAT gateways charging for every image pull and log push.

Service Connect

Service Connect is the recommended way to handle service-to-service communication. It automatically injects an Envoy proxy as a sidecar, providing:

Service discovery via Cloud Map namespaces
Client-side load balancing with retries and outlier detection
Free application-level traffic metrics in CloudWatch (request count, latency, error rates)
Support for HTTP, HTTP2, gRPC, and TCP
Per-request Envoy access logs (October 2025)

Service Connect replaces AWS App Mesh, which will be discontinued in September 2026.

Deployment Strategies

ECS has the most sophisticated deployment options of any container orchestrator on AWS. As of March 2026, four strategies are available natively.

Rolling Update (Default)

Gradually replaces old tasks with new ones. Controlled by:

minimumHealthyPercent (default 100%) - Minimum tasks that must remain running
maximumPercent (default 200%) - Maximum tasks allowed during deployment

For zero-downtime with a desired count of 4: min 100%, max 200% means ECS starts 4 new tasks, waits for them to be healthy, then stops the 4 old tasks.

Blue/Green

Built into ECS without CodeDeploy dependency. Provisions 100% new capacity ("green"), validates, then shifts all production traffic at once. Key features:

Six Lambda lifecycle hooks: pre-scale-up, post-scale-up, test traffic shift, production traffic shift, post-test, post-production
Configurable bake time for instant rollback window
Works with ALB, NLB, and Service Connect

Canary (October 2025)

Two-stage deployment: shift a small percentage of traffic first (configurable from 0.1% to 99.9%), validate with real production traffic, then shift the rest. Ideal for critical user-facing services where you want to limit blast radius.

Linear (October 2025)

Gradual traffic shift in equal increments as small as 3%, with configurable bake time between each step. The most conservative approach - allows monitoring at each increment.

Deployment Circuit Breaker

The safety net across all strategies. If tasks keep failing to start or pass health checks, ECS automatically stops the deployment and optionally rolls back to the last successful version. You can wire CloudWatch Alarms into the circuit breaker to detect application-level failures, not just infrastructure failures.

Service Auto Scaling

ECS uses Application Auto Scaling with four policy types:

Target Tracking - Set a target metric value (e.g., CPU at 50%). Simplest to configure - works like a thermostat.
Step Scaling - Define explicit threshold/action pairs. React differently at different severity levels.
Scheduled Scaling - Time-based. Scale up for business hours, down at night. Supports scaling to zero (set minimum capacity to 0).
Predictive Scaling - ML-based. Analyzes historical patterns and proactively scales before demand hits. Doesn't trigger scale-ins on its own - pair with target tracking.

Important behavior: scale-in is automatically paused during deployments to protect availability.

Storage Options

Storage	Persistence	Shared	Use Case
Ephemeral	Task lifetime	Within task	Temp files, caches. Default 20 GiB, up to 200 GiB on Fargate
EFS	Persistent	Across tasks	Shared config, models, content. Multi-AZ, IAM auth
EBS	Configurable	Single task	High-IOPS data processing. One volume per task
Bind Mounts	Task lifetime	Within task	Container-to-container data sharing

EFS is the most versatile - persistent, shared across tasks, supports IAM authorization and transit encryption. EBS is for high-performance block storage when EFS throughput is insufficient. One gotcha with EFS though is that it can take minutes to create a new EFS but this is typically a one time thing for a given application.

Security

Security in ECS follows the principle of least privilege at the task level:

Task roles - Each task definition gets its own IAM role. Your batch processor gets S3 and SQS access. Your API gets DynamoDB access. Not a shared instance profile.
Secrets injection - Secrets Manager and SSM Parameter Store values injected as environment variables at startup.
Network isolation - awsvpc mode gives each task its own security group. Run tasks in private subnets.
Read-only root filesystem - Run containers with readonlyRootFilesystem: true for hardening.
Image scanning - ECR enhanced scanning with Amazon Inspector continuously scans for OS and language package vulnerabilities. As of 2026, it supports minimal base images like scratch and distroless, and shows which images are running in your clusters.

Observability

CloudWatch Container Insights (Enhanced)

Container Insights provides granular metrics at the cluster, service, task, and container level. The honeycomb visualization gives you cluster health at a glance - alarm state and utilization side by side. Deployment tracking alongside infrastructure anomalies. Cross-account monitoring for unified views. Can be enabled per-cluster or account-wide.

Logging

The awslogs driver sends container logs directly to CloudWatch Logs. As of June 2025, the default mode switched from blocking to non-blocking - if the log buffer fills up, excess logs are dropped rather than blocking your application. This prioritizes availability over logging completeness.

For advanced log routing, FireLens with Fluent Bit as a sidecar routes logs to any destination - CloudWatch, S3, Elasticsearch, Datadog, Splunk. Different containers can route to different destinations.

Tracing

Deploy the AWS Distro for OpenTelemetry (ADOT) collector as a sidecar. It receives OTLP traces on port 4317 (gRPC) or 4318 (HTTP) and exports to X-Ray automatically. This replaces the legacy X-Ray daemon approach.

ECS Exec

ECS Exec lets you shell into a running container directly - the equivalent of docker exec but for tasks running on Fargate or EC2. It uses AWS Systems Manager (SSM) under the hood, so there's no need to open inbound ports or SSH. I use this all the time - it's one of the most useful ECS features IMO.

To enable it, set enableExecuteCommand: true on your service or run task call. Then:

aws ecs execute-command \
  --cluster my-cluster \
  --task abc123 \
  --container my-app \
  --interactive \
  --command "/bin/sh"

This is invaluable for debugging - inspecting environment variables, checking network connectivity, verifying mounted volumes, or tailing logs inside the container. A few things to keep in mind:

It must be enabled before the task launches - you can't retroactively enable it on already-running tasks. For services, enabling it requires a new deployment
The task role needs SSM permissions (ssmmessages:CreateControlChannel, ssmmessages:CreateDataChannel, ssmmessages:OpenControlChannel, ssmmessages:OpenDataChannel)
The container image needs a shell (/bin/sh or /bin/bash) - scratch and distroless images won't work
All sessions are logged to CloudWatch or S3 for audit
Works with both Fargate and EC2 launch types

For quick diagnostics, the amazon-ecs-exec-checker script validates that your task, role, and agent are configured correctly.

Real-World Architecture: Batch Processing with Fargate

My Serverless Data Processor project demonstrates the batch processing pattern.

The architecture: S3 upload triggers a Lambda that extracts files. Step Functions distributed map fans out processing across Fargate tasks - each file gets its own container. The containers use the waitForTaskToken pattern - Step Functions passes a callback token as an environment variable, the Rust container processes the data, then calls send_task_success to signal completion.

Key details:

Fargate at minimum specs: 0.25 vCPU, 512 MB RAM
OpenTelemetry sidecar for CloudWatch metrics
Container images in ECR with multi-stage Docker builds
Written in Rust for the worker containers
Infrastructure managed with Terraform

Here is the Step Functions integration that launches Fargate tasks:

{
  "Type": "Task",
  "Resource": "arn:aws:states:::ecs:runTask.waitForTaskToken",
  "Parameters": {
    "LaunchType": "FARGATE",
    "Cluster": "${ecs_cluster}",
    "TaskDefinition": "${task_def_name}",
    "NetworkConfiguration": {
      "AwsvpcConfiguration": {
        "Subnets": ["${fargate_subnet}"],
        "SecurityGroups": ["${vpc_default_sg}"]
      }
    },
    "Overrides": {
      "ContainerOverrides": [{
        "Name": "store_data_processor_daily",
        "Environment": [
          {"Name": "TASK_TOKEN", "Value.$": "$$.Task.Token"},
          {"Name": "S3_BUCKET", "Value.$": "$.BatchInput.source_bucket_name"},
          {"Name": "S3_KEY", "Value.$": "$.Items[0].Key"}
        ]
      }]
    }
  }
}

This pattern works for any fan-out workload - ETL, media processing, report generation, ML batch inference. Each task is independent, starts in seconds, processes its data, and terminates. You pay only for the compute time used.

I also used this same Fargate + callback token pattern in my Serverless Pizza Ordering project, where the Fargate container simulated pizza preparation and delivery - chosen over Lambda because the "AI" insisted some pizzas could take more than 15 minutes.

Real-World Architecture: API Backend on Fargate

My Aurora DSQL Kabob Store project uses ECS Fargate as an always-on API backend.

React frontend behind CloudFront, Application Load Balancer routing to FastAPI containers on Fargate, connecting to Aurora DSQL for multi-region active-active writes.

The key design decision: keep business logic runtime-agnostic. The same FastAPI application uses direct psycopg2 queries (not ORM) so it can deploy across Fargate, ECS on EC2, Lambda, or EKS with minimal adapter code.

Fargate costs about 20-30% more than equivalent EC2 on-demand for sustained workloads, but the operational simplicity during development is worth it. In practice, real migrations from Fargate to EC2 often yield smaller savings than expected - Tines reported only ~5% compute cost savings after migrating, though they saw 30% faster job processing and 10% lower P95 latency from having dedicated hardware. The biggest cost was actually the VPC infrastructure - NAT gateways at ~$2-3/day - not ECS itself. I terraform destroy when not actively developing.

Terraform Examples

Here is the core Terraform for an ECS Fargate setup, taken from my Serverless Data Processor project:

Cluster

resource "aws_ecs_cluster" "ecs_cluster" {
  name = "${var.project_name}-cluster"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

Task Definition

resource "aws_ecs_task_definition" "fargate_processor_task" {
  family                   = var.task_definition_name
  execution_role_arn       = aws_iam_role.ecs_task_execution_role.arn
  task_role_arn            = aws_iam_role.ecs_task_role.arn
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = var.fargate_cpu
  memory                   = var.fargate_memory

  container_definitions = templatefile(
    "${path.module}/container-definitions.json.tpl",
    {
      app_image           = var.app_image
      fargate_cpu         = var.fargate_cpu
      fargate_memory      = var.fargate_memory
      aws_region          = var.aws_region
      project_name        = var.project_name
      task_container_name = var.task_container_name
    }
  )
}

Container Definition Template

[
  {
    "cpu": ${fargate_cpu},
    "essential": true,
    "image": "${app_image}",
    "memory": ${fargate_memory},
    "name": "${task_container_name}",
    "networkMode": "awsvpc",
    "environment": [
      {"name": "S3_BUCKET", "value": "my-bucket"},
      {"name": "S3_KEY", "value": "data/input.json"}
    ],
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "/ecs/${project_name}",
        "awslogs-region": "${aws_region}",
        "awslogs-stream-prefix": "${project_name}-log-stream"
      }
    }
  },
  {
    "image": "public.ecr.aws/aws-observability/aws-otel-collector:latest",
    "name": "aws-otel-collector",
    "essential": false,
    "command": ["--config=/etc/ecs/ecs-cloudwatch.yaml"],
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "/ecs/${project_name}-otel",
        "awslogs-region": "${aws_region}",
        "awslogs-stream-prefix": "otel"
      }
    }
  }
]

ECS Service with ALB (for always-on workloads)

resource "aws_ecs_service" "api_service" {
  name            = "${var.project_name}-service"
  cluster         = aws_ecs_cluster.ecs_cluster.id
  task_definition = aws_ecs_task_definition.api_task.arn
  desired_count   = var.desired_count
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = var.private_subnet_ids
    security_groups  = [aws_security_group.ecs_tasks.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.api.arn
    container_name   = var.container_name
    container_port   = var.container_port
  }

  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }

  depends_on = [aws_lb_listener.api]
}

Three resources for the core ECS setup: cluster, task definition, and service. The container definition template handles the application specifics. The full Terraform for both projects is in the GitHub repos linked at the end.

Recent Features (2025-2026)

ECS has had a remarkable year of feature launches:

AI Developer Tools (December 2025) - ECS MCP Server for AI-assisted development and operations. Natural language commands for cluster management.
ECS Express Mode (November 2025) - Deploy a production-ready containerized web app with just three inputs: a container image, a task execution role, and an infrastructure role. Provisions Fargate, ALB with SSL, auto scaling, monitoring, and a unique URL. Up to 25 services can share one ALB. No additional charge beyond the underlying resources.
Canary and Linear Deployments (October 2025) - Fine-grained traffic shifting. Canary from 0.1% to 99.9%, linear in increments as small as 3%.
Service Connect Envoy Access Logs (October 2025) - Per-request telemetry for HTTP, HTTP2, gRPC, and TCP.
ECS Managed Instances (September 2025) - AWS-managed EC2 with Bottlerocket OS. Attribute-based instance selection for GPUs and specialized hardware.
Native Blue/Green Deployments (July 2025) - Built into ECS without CodeDeploy. Six Lambda lifecycle hooks for testing and approval at each phase. Configurable bake time for instant rollback.
Non-Blocking Log Driver Default (June 2025) - Prioritizes task availability over logging completeness.

ECS vs EKS - When to Use What

This is the most common question I get. Both solve the same fundamental problem - running containers reliably at scale.

Criteria	ECS	EKS
Control plane cost	Free	~$75/month per cluster
Learning curve	AWS-native concepts	Kubernetes concepts
AWS integration	Deep, native	Good, via add-ons
Multi-cloud	AWS only	Portable K8s manifests
Ecosystem	AWS tooling	Helm, ArgoCD, Istio, operators
Managed compute	Fargate, Managed Instances	Fargate, managed node groups

Choose ECS when your team is AWS-focused, you want operational simplicity, you value the free control plane, and your workloads are straightforward services and batch jobs.

Choose EKS when your team knows Kubernetes, you need multi-cloud portability, you want the Kubernetes ecosystem (Helm, ArgoCD, custom operators), or you're running complex stateful workloads.

Most organizations pick based on team expertise and existing tooling, not technical limitations.

ECS vs Lambda - Containers vs Functions

Another comparison that comes up frequently:

Criteria	ECS/Fargate	Lambda
Max duration	Unlimited	15 minutes
Max memory	120 GB	10 GB
Startup	Seconds (image pull)	Milliseconds (warm) to seconds (cold)
Pricing	Per-second (vCPU + memory)	Per-invocation + duration
Scaling	Service auto scaling	Automatic per-request
Best for	Long-running, resource-heavy, always-on	Event-driven, short-lived, bursty

In my projects I use both. Lambda for event handling - S3 triggers, API endpoints, file extraction. Fargate for heavy processing - data transformation, ML inference, container workloads that need full runtime control.

Best Practices

After building several production systems with ECS, here are the practices I've found most valuable:

Use Fargate unless you need GPUs or specific instance types. The operational simplicity is worth the cost premium for most workloads.
Use awsvpc networking mode everywhere. It's the only mode that works on all compute types and gives you per-task security groups.
Enable the deployment circuit breaker with rollback. This catches failed deployments before they impact all traffic.
Use capacity provider strategies to mix Spot and on-demand. A base of on-demand with weighted Spot gives you cost savings with a reliability floor.
Inject secrets via Secrets Manager. Never bake them into images or pass them as plain environment variables.
Enable Container Insights. The per-task metrics and honeycomb visualization are invaluable for debugging.
Use Service Connect for service-to-service communication. Free traffic metrics and managed Envoy proxies with no code changes.
Use VPC endpoints for ECR, S3, and CloudWatch. NAT gateway data transfer costs are the biggest hidden expense in ECS architectures.
Use multi-stage Docker builds. Keep images small. A Rust binary in a scratch image is a few megabytes. A Python app in a slim image with only production dependencies.
Define health checks in the task definition. Don't rely solely on ALB health checks - container-level health checks catch issues faster.

Pricing - You Pay for Compute, Not Orchestration

The most important thing to know: ECS orchestration is free. You only pay for the compute resources your containers consume.

Fargate - Per-second billing for vCPU ($0.04048/hour) and memory ($0.004445/GB/hour). Spot is up to 70% less. Compute Savings Plans can reduce costs by up to 49% (3-year all-upfront) or ~20% (1-year no-upfront).
EC2 - Standard instance pricing. Use Savings Plans or Reserved Instances for sustained workloads.
Managed Instances - EC2 instance pricing plus a management fee for automated provisioning, patching, and host replacement.

Hidden costs to watch:

NAT gateways - $0.045/GB for data processed. Use VPC endpoints.
ALB - Fixed hourly cost plus per-LCU. Up to 25 ECS Express Mode services can share one ALB.
ECR storage - $0.10/GB/month. Use lifecycle policies to clean up old images.
Ephemeral storage - Fargate charges $0.000111/GB/hour above the default 20 GiB.

Cost optimization strategies:

Right-sizing - The single biggest lever. Reducing from 1 vCPU/2GB to 0.5 vCPU/1GB can yield ~45-50% lower Fargate task cost.
Scheduled shutdowns - Running dev/staging environments only during business hours (8 hours/day, 5 days/week) can reduce costs by over 75%.
Savings Plans - Commit to consistent usage for 1-3 years. Even no-upfront 1-year plans save ~20% on Fargate.

For my projects, the ECS cost has been minimal. The batch processor runs tasks for seconds at minimum specs. The Kabob Store's main cost was VPC infrastructure ($2-3/day), not ECS.

Things to Know

A few operational details worth keeping in mind:

Task placement - Fargate handles placement automatically. For EC2, use the binpack placement strategy to consolidate workloads on fewer instances and reduce waste.
Task recycling - Fargate tasks on platform version 1.4.0+ are recycled after 14 days of continuous running. Your service will gradually replace old tasks.
ENI limits - In awsvpc mode on EC2, each task needs an ENI. Enable ENI trunking to increase density (requires CloudFormation custom resources).
Image pull time - Large images slow task startup. Keep images lean. Set ECS_IMAGE_PULL_BEHAVIOR=prefer-cached on EC2 instances to use cached images when available.
Spot instance draining - For EC2 Spot instances, set ECS_ENABLE_SPOT_INSTANCE_DRAINING=true on the ECS agent for graceful task termination.
Service quotas - Default Fargate vCPU quota is 6 on new accounts (up to 4,000 in production). Request increases proactively.
Force new deployment - If you update a secret or parameter store value, the running tasks won't pick it up automatically. Force a new deployment to refresh.

Wrapping Up

ECS is the container orchestration service I use most on AWS. The free control plane, deep AWS integration, and flexible compute options make it the right choice for most container workloads that don't require Kubernetes-specific tooling.

The recent feature launches have been particularly impressive - native blue/green without CodeDeploy, canary and linear deployments, Managed Instances for GPU workloads, and Express Mode for rapid prototyping. Combined with Fargate's serverless simplicity and Service Connect's built-in service mesh, ECS has matured into a comprehensive platform for running containers at any scale.

I've used it for batch data processing with Step Functions fan-out, pizza ordering with long-running container workflows, and multi-region API backends with Aurora DSQL. In every case, ECS handled the orchestration cleanly while I focused on the application logic.

If you're running containers on AWS and haven't looked at ECS recently, the current feature set is worth a fresh evaluation. Start with a Fargate service behind an ALB, enable Container Insights, and go from there.

Resources

Amazon ECS Documentation
Amazon ECS Best Practices Guide
Serverless Data Processor - Step Functions + Fargate - Batch processing with ECS
Serverless Pizza Ordering - Long-running Fargate container workflows
Aurora DSQL Kabob Store - FastAPI on Fargate with multi-region database
Step Functions + Fargate GitHub Repo - Full Terraform and Rust container code
Serverless Pizza GitHub Repo - Full Terraform and container code
AWS London Ontario User Group - Meetups, talks, and community for AWS builders in the London, Ontario area
AWS London Ontario User Group YouTube - Recorded talks and presentations

Connect with me on X, Bluesky, LinkedIn, GitHub, Medium, Dev.to, or the AWS Community. Check out more of my projects at darryl-ruggles.cloud and join the Believe In Serverless community.

Amazon EventBridge: The Event-Driven Backbone of AWS (And My Favourite Service)

Darryl Ruggles — Fri, 13 Mar 2026 01:29:07 +0000

I have been building serverless applications on AWS for years now, and if you asked me to pick one service that keeps showing up in almost every project I build, it would be Amazon EventBridge without hesitation. EventBridge is my favourite AWS service. It offers rules for reacting to events, pipes for bridging data across AWS services, a scheduler for time-based triggers, and a schema registry for discoverability - all fully serverless and practically free.

I recently had the opportunity to present on EventBridge to the AWS Cloud Club at Amity University Chhattisgarh, and I decided to consolidate key details about this service into one place. This blog post is the companion to that presentation - a dive into why EventBridge matters, how I use it, and how you can start building with it today.

Why Event-Driven Architecture?

Before we talk about EventBridge specifically, let's talk about why event-driven architecture (EDA) is worth adopting. Four core principles make EDA compelling:

Loose coupling - Producers emit events without knowing who consumes them. You can add new consumers without touching existing code.
Real-time reactivity - Respond to changes as they happen, instead of polling for updates.
Scalability by default - Event buses handle millions of events. Consumers scale independently.
Extensibility - New capabilities just subscribe to existing events. No rewiring required.

EventBridge is purpose-built to make this simple, scalable, and serverless on AWS.

The pattern is straightforward. On the left, producers - S3, IAM, CloudWatch, your applications - emit events onto EventBridge. In the middle, EventBridge routes them via rules and content-based patterns. On the right, consumers - Lambda, SNS, SQS, Step Functions - each subscribe independently. Producers don't know about consumers. Consumers don't know about producers. EventBridge is the decoupling layer.

What is Amazon EventBridge?

EventBridge is a serverless event bus that helps you receive, filter, transform, route, and deliver events. It has four major capabilities:

Event Buses

Default Event Bus - Every AWS account gets one automatically. It receives virtually all mutating API calls via CloudTrail - anything that creates, modifies, or deletes something across 250+ AWS services. Events flow whether you are listening or not. And it is completely free.
Custom Event Buses - Create your own buses and publish custom events. Cross-account sharing via resource policies. As of January 2025, EventBridge also supports direct cross-account delivery to targets like SQS, Lambda, SNS, Kinesis, and API Gateway - no intermediary event bus required in the target account.
SaaS Partner Buses - Partners like Datadog and PagerDuty can send events directly into your bus.

Rules & Event Patterns

Rules match events and route them to targets. You can have up to 5 targets per rule. The pattern matching is remarkably flexible:

Prefix/suffix matching - Match on the beginning or end of string values
Anything-but - Exclude specific values
Numeric ranges - Filter by greater-than, less-than, or between
Exists - Check for the presence or absence of a field
OR logic - Arrays match any value in the list

Matching happens at the EventBridge level - you only pay for events that actually match your rules.

Scheduler

Three types of schedules: rate-based, cron expressions, and one-time schedules. Supports millions of schedules with integrations across 270+ AWS services and over 6,000 API operations. Includes flexible time windows, automatic retries, and dead-letter queues. EventBridge Scheduler replaces the older scheduled rules and is significantly more capable.

Pipes

Pipes provide point-to-point integration with a pipeline architecture: Source → Filter → Enrich → Transform → Target. They replace the "glue Lambda" functions that many of us have written to connect services. Filtered events are free.

For example, a DynamoDB stream can flow into Pipes, filter for only INSERT events, enrich the data with a Lambda function, and deliver to an SQS queue - all without writing a custom Lambda to glue the pieces together.

Real-World Pattern: Security Monitoring with the Account Watcher

The best way to understand EventBridge is to see it in action. My Serverless AWS Account Watcher project is a perfect example. The challenge was simple: keeping track of all the actions being done in your AWS account can be a real challenge, especially from a security perspective. I needed to know immediately when someone deleted an S3 bucket, created an IAM access key, or logged into the console.

The architecture relies entirely on the default event bus. CloudTrail captures API activity - S3 deletions, IAM changes, console logins - and delivers those events to EventBridge automatically. Rules match specific event patterns and trigger a Lambda function that formats notifications and fans them out to SNS (for email) and a Slack webhook.

Here is what the SAM template looks like for the EventBridge rule:

Events:
  S3EventsRule:
    Type: EventBridgeRule
    Properties:
      Pattern:
        source:
          - aws.s3
        detail-type:
          - "AWS API Call via CloudTrail"
        detail:
          eventName:
            - DeleteBucket
            - DeleteBucketPolicy
            - PutBucketPolicy

That is it. A pattern and a source. SAM handles the rule creation, permissions, and target wiring. The Lambda handler uses Python's match/case to route different event types:

match event_name:
    case "DeleteBucket":
        message = f"S3 bucket '{bucket_name}' was deleted by {user}"
    case "CreateAccessKey":
        message = f"New access key created for {target_user} by {user}"
    case "ConsoleLogin":
        message = f"Console login by {user} from {source_ip}"

The last two lines of the handler are the entire fan-out - send_slack_message() and publish_to_sns(). Two notification channels, completely decoupled from the event detection logic.

The extensibility story is powerful here. Want to monitor a new API action? Add one line to the event pattern. Want to add a Microsoft Teams notification? Add one more function call. The event detection, routing, and notification logic are all independent.

I wrote about this project in detail in my Serverless AWS Account Watcher blog post - the full SAM template, Lambda code, and deployment instructions are all there.

Real-World Pattern: Health Check Alerts

My Serverless Site Health Check Notification System takes the same EventBridge pattern in a different direction. Route53 health checks monitor endpoints globally via HTTP, HTTPS, or TCP. When a health check fails, CloudWatch transitions an alarm state, and that state change flows through the default event bus.

I took advantage of the suffix matching. The EventBridge rule matches alarm names ending with -HealthCheckAlarm:

{
  "source": ["aws.cloudwatch"],
  "detail-type": ["CloudWatch Alarm State Change"],
  "detail": {
    "alarmName": [{ "suffix": "-HealthCheckAlarm" }]
  }
}

This means you can add new Route53 health checks without ever updating the EventBridge rule or redeploying your Lambda. Name the CloudWatch alarm with the -HealthCheckAlarm suffix, and it automatically triggers the existing notification pipeline. This is the extensibility promise of EDA in practice - new health checks, no code changes.

Real-World Pattern: Event-Driven Sales Analytics

In my Serverless Sales Analytics Platform, EventBridge plays a dual role. First, S3 object creation events flow through the default bus - when a store uploads daily sales data, an EventBridge rule matches the bucket and prefix, triggering a Step Functions workflow to transform and analyze the data. Second, an EventBridge Scheduler rule runs at 11 PM daily as a fallback, ensuring the analysis pipeline executes even if stores have not reported.

This combination of event-driven triggers and scheduled fallbacks is a pattern I use frequently. EventBridge handles both seamlessly - reactive processing when data arrives, and scheduled processing as a safety net.

EventBridge vs SNS vs SQS - When to Use What

This is one of the most common questions I get. All three are messaging services, but they serve different purposes:

Feature	EventBridge	SNS	SQS
What it does	Routes events based on content	Broadcasts to subscribers	Queues messages for consumers to pull
Filtering	Prefix, suffix, numeric, anything-but, exists	Basic attribute filters	None
AWS service events	250+ services on default bus, free	You wire it up yourself	You wire it up yourself
Schema discovery	Built-in Schema Registry	No	No
Targets	20+ supported target types	Lambda, SQS, HTTP, email subscriptions	Consumer pulls at their own pace
Ordering	Best-effort	Best-effort	FIFO available

Use EventBridge for intelligent content-based routing and reacting to AWS service events. Use SNS for simple pub/sub fan-out when you already know the event structure. Use SQS for guaranteed message processing with consumer-controlled throughput.

In practice, they work together. My Account Watcher uses EventBridge for routing and SNS for email delivery - each service doing what it does best.

EventBridge vs Kafka

Another comparison that comes up: Kafka is a distributed log for high-throughput streaming with replay capabilities. EventBridge is a managed event router for AWS-native architectures. They are fundamentally different tools. Kafka excels at sustained high-throughput data streaming (millions of messages per second). EventBridge excels at reactive event routing with content-based filtering and zero infrastructure management.

They can also work together - EventBridge Pipes can consume directly from Kafka topics.

Best Practices

After building several production systems with EventBridge, here are the practices I have found most valuable:

Use specific event patterns - The more precise your pattern, the fewer unnecessary Lambda invocations you pay for.
Implement dead-letter queues on rule targets - When EventBridge cannot deliver to a target, the DLQ captures the event for investigation. Use Lambda Destinations OnFailure for richer error context.
Use input transformers - Shape the event payload before it reaches your target. Reduce parsing logic in your Lambda functions.
Separate buses for different domains - Custom buses keep application events isolated from the AWS service events on the default bus.
Prefer Scheduler over scheduled rules - EventBridge Scheduler is the newer, more capable replacement. Use it for all new time-based triggers.
Archive and Replay for debugging - Enable archiving so you can replay events after fixing a bug. This is invaluable in production.
Implement idempotency - EventBridge provides at-least-once delivery for most targets. Your consumers should handle duplicate events gracefully.
Use SAM, Terraform, or CDK for infrastructure - Define your rules, patterns, and targets as code. The SAM EventBridgeRule event type is incredibly concise.

Pricing - Practically Free

This might be the most surprising part for people new to EventBridge:

Default event bus: Completely free. Every AWS service event that flows through it costs you nothing.
Custom events: $1.00 per million events. Each 64 KB chunk of a payload counts as one event.
Scheduler: $1.00 per million invocations, with a generous free tier of 14 million invocations per month.
Pipes: $0.40 per million requests after filtering. Filtered-out events are not charged.

For my projects - Account Watcher, Health Check Alerts, Sales Analytics - the EventBridge cost has been essentially $0. The Lambda invocations and SNS messages that EventBridge triggers are also well within free tier for low-to-moderate volume alerting.

Things to Know

A few operational details worth keeping in mind:

At-least-once delivery - EventBridge delivers events at least once for most targets, and AWS service events are delivered with either "best effort" or "durable" guarantees depending on the service. Design your consumers to be idempotent.
5 targets per rule - Need more? Create additional rules with the same pattern.
1 MB event size limit - As of January 2026, EventBridge increased the payload limit from 256 KB to 1 MB. For larger payloads, put the data in S3 and pass the reference in the event. Note that CloudTrail events are still capped at 256 KB.
Sub-second latency - EventBridge routes events quickly, but CloudTrail-sourced events have a 1–5 minute delay from the original API call.
No ordering guarantee - The default bus does not guarantee event order. Handle this in your consumer logic.

The EDA Trade-offs - Being Honest

I believe in being transparent about trade-offs rather than only presenting the benefits. Event-driven architecture is not universally better than request/response - it is a different set of trade-offs.

Debugging is harder. In request/response, you get a stack trace from top to bottom. In EDA, an event flows through multiple decoupled services - when something fails, you are piecing together a distributed trace across Lambda logs, EventBridge metrics, and DLQ messages.

Eventual consistency, not immediate. If a user creates a resource and your event handler has not processed yet, a subsequent read might show stale data. You need to design your UX to tolerate this lag.

Testing is more complex. You cannot just call an endpoint and check the response. You need to verify that events were emitted, rules matched, targets invoked, and downstream side effects occurred.

"Invisible" coupling. EDA is loosely coupled at the code level, but you still have coupling on the event schema. If a producer changes their event format, consumers break silently. Schema Registry helps, but it requires discipline.

The good framing: use request/response when you need synchronous answers and simple debugging. Use EDA when you need loose coupling, scalability, extensibility, and real-time reactivity. Most real systems use both.

Recent Updates Worth Knowing

EventBridge continues to evolve. Here are the most notable changes from the past year:

Scheduler quota monitoring (Feb 2026) - EventBridge Scheduler now emits resource count metrics to CloudWatch, so you can monitor how close you are to quota limits and request increases proactively.
1 MB event payload (Jan 2026) - The event size limit increased from 256 KB to 1 MB, eliminating the need for data chunking or S3 offloading in most cases. Particularly useful for LLM prompts, telemetry signals, and complex ML output payloads.
Enhanced visual rule builder (Nov 2025) - The console now includes an intuitive visual rule builder with a comprehensive event catalog and schema-aware drag-and-drop canvas. Makes it much easier to discover available events from 250+ AWS services and build patterns without referencing individual service docs.
SQS fair queue targets (Nov 2025) - EventBridge can now target SQS fair queues, enabling fairer message distribution across consumer groups in multi-tenant systems.

Wrapping Up

EventBridge sits at the center of event-driven architecture on AWS. The default event bus gives you free, automatic visibility into virtually every mutating API call across your account. Rules and patterns give you precise, content-based routing. Scheduler replaces cron jobs with a fully managed, scalable solution. Pipes eliminate glue code for point-to-point integrations.

I have used it for security monitoring, health check alerting, and data pipeline orchestration - and every time, EventBridge has been the simplest, most cost-effective piece of the architecture. It is the service I reach for first when I need to connect systems, react to changes, or automate workflows. If you are building on AWS and have not explored EventBridge yet, start with the default event bus. Write one rule. Match one event. You will be hooked.

If you are interested in going deeper, I recently presented this material to the AWS Cloud Club at Amity University Chhattisgarh.

Resources

Amazon EventBridge Documentation
Serverless AWS Account Watcher - My blog post on security monitoring with EventBridge
Serverless Site Health Check Notification System - Health check alerting with EventBridge suffix matching
Building a Serverless Sales Analytics Platform - EventBridge event routing + scheduled triggers
Account Watcher GitHub Repo
Health Check GitHub Repo
AWS Cloud Club at Amity University Chhattisgarh - EventBridge Presentation
AWS London Ontario User Group - Meetups, talks, and community for AWS builders in the London, Ontario area
AWS London Ontario User Group YouTube - Recorded talks and presentations

Connect with me on X, Bluesky, LinkedIn, Medium, Dev.to, or the AWS Community. Check out more of my projects at darryl-ruggles.cloud and join the Believe In Serverless community.

Building a Serverless Blogging Platform with AWS and Claude Code

Darryl Ruggles — Wed, 11 Mar 2026 01:10:10 +0000

How I replaced Hashnode with a custom-built, full-featured blogging platform in under 10 hours - with an AI pair programmer doing the heavy lifting.

You can see the new site here -> Darryl's World of Cloud.

Why I Left My Old Platform

For the past couple of years, I hosted my blog on Hashnode with a Vercel-backed custom domain. It worked. Articles rendered fine, the editor was decent, and I didn't have to think about infrastructure. But over time, the limitations started adding up.

Newsletters were the first frustration. Setting up a mailing list, customizing email templates, and scheduling sends all required workarounds or third-party integrations. Email handling felt like an afterthought bolted onto a developer blogging platform. I wanted to send article roundups, curated link collections, and freeform newsletters - all from the same place I write articles. There wasn't an easy way to make all that work there.

Then there was the visibility problem. I am a cloud solutions architect. I work with AWS infrastructure every day. Yet my own blog was a black box. I couldn't see the CDN configuration, couldn't tune caching behavior, couldn't add custom security headers or WAF rules. The hosting details were hidden behind an abstraction I didn't control.

I also wanted to learn. I had been working with several AWS services professionally but hadn't built a complete, production-facing application for my own use that stitched them all together - CloudFront distributions, Lambda function URLs, DynamoDB single-table design, SES email delivery, Cognito authentication. Building my own platform and managing it over time is the best way to learn these services deeply and have something concrete to write about.

So I decided to build it myself. Not just migrate the content - build the entire platform from scratch. The editor, the publishing pipeline, the newsletter system, the analytics, the infrastructure. All of it. The question was whether I could do it fast enough to make the effort worthwhile.

The final push to actually work on this project came when I read a recent article from Ran Isenberg titled Claude Built My Wix Website in 3 Hours - Is SaaS Dead?, where he went through a similar process and I knew it was time to start myself. I have learned so much from all of his work and am grateful for the motivation here.

What I Built - A Modern, Full-Featured Platform

The result is a modern, visually clean blogging platform that loads fast. Every public page is pre-rendered static HTML served from CloudFront edge locations worldwide. No server-side rendering, no client-side hydration, no JavaScript frameworks on the public site - just HTML and CSS delivered from the nearest edge location. Page loads are sub-second globally.

Compared to what I had before, the design is more polished and the performance is noticeably snappier. I went with a clean, card-based homepage layout with a hero section, tag-based navigation, and a reading experience that stays out of your way. The typography, spacing, and code syntax highlighting are all tuned exactly how I want them - because I own every pixel. And if I ever want to change anything it's only a few minutes away from being live.

But the real wins are in the features I could never get working the way I wanted on a hosted platform.

Newsletter Infrastructure Done Right

I haven't sent out any newsletters publicly as of yet but I have done lots of testing and iterating on the format and the infrastructure is now in place. The newsletter system supports three distinct types. Freeform newsletters use the same markdown editor as articles - write whatever you want and send it. Article roundups let you select published articles and automatically generate card-based layouts with excerpts and links. Curated link collections pull in external URLs, auto-fetch their metadata (title, description, image) via OpenGraph tags with Bedrock AI as a fallback, and render them as rich link cards.

All three types support scheduling, email preview before send, and are archived as browsable pages on the public site. Everything runs through Amazon SES - no Mailchimp, no SendGrid, no third-party email service.

Article Preview Before Publishing

Time-limited private URLs let you see exactly how a draft will look on the live site before making it public with the same template, same styles, and same layout as the real articles have. Each preview link expires in a short time, so you can share drafts with reviewers without worrying about stale links lingering.

Comments with Reader Registration

Readers sign up to be able to add article comments. They can verify their email and once approved by me they can comment on posts. The admin side has a full moderation queue - approve, reject, or ban users. No third-party comment widgets are used or injecting tracking scripts into your site.

Email Subscriptions and Notifications

Readers subscribe with double opt-in email verification, manage their preferences, and get notified automatically when new articles are published. One-click unsubscribe in every email. All of it is built on SES and DynamoDB.

Custom Analytics

Built-in page view tracking with zero third-party scripts. I can see referral sources, geographic distribution, device types, and more about who browses articles on the site - with hourly, daily, and monthly granularity. I would have had to pay a lot more per month to see this with Vercel but here it's almost free.

Developer Tooling and Infrastructure

A WYSIWYG markdown editor with source toggle, AI-powered SEO title and description generation via Amazon Bedrock, drag-and-drop image upload to S3 with pre-signed URLs, automated backups with cross-account disaster recovery, and CI/CD with GitHub Actions deploying on every push to main.

The platform runs on 16+ AWS services including CloudFront, Lambda, DynamoDB, S3, SES, Cognito, and more with automated backups and lots of security controls. My favorite Infrastructure as Code (IaC) tool Terraform is used. My monthly cost is in the single dollars due to using serverless/managed services with pay-per-use pricing.

The AI-Assisted Development Experience

I have worked in the AI/ML space for a number of years now. I have seen lots of promising ideas but haven't been completely sold on most of the tools from this space being super useful or becoming real difference makers in my day-to-day life. I have been experimenting with AI-based coding tools for over a year now with lots of hope. I was skeptical and not super impressed with what they could do a year or more ago. I had tried GitHub Copilot back then and found the autocomplete suggestions useful but hardly earth shattering. I experimented with Cursor, Kiro, OpenCode, and a few others. The early outputs were rough - code that looked plausible but missed edge cases, made incorrect API calls, or ignored the architectural context of the project.

But the improvement over the last number of months has been dramatic. I settled on Claude Code with Opus 4.5/4.6 as my primary development tool, and the workflow it enables is genuinely different from anything I have experienced in 25+ years of software development.

The Workflow

The pattern is simple: spend a few minutes describing what you want in English, and within minutes the feature is working. Not hours of scaffolding and boilerplate - minutes. Need a newsletter scheduling system with EventBridge? Describe the requirements, review the generated Terraform and Lambda code, test it, push it. Need a comment moderation queue with Cognito integration? Same thing. Of course there are caveats and things to seriously keep an eye on - more on those below.

The key is that you are not just generating code snippets. Claude Code understands the full project context - the file structure, the existing patterns, the infrastructure configuration, the naming conventions. When I asked it to add newsletter support, it knew where the API routes lived, how the DynamoDB tables were structured, what the Terraform modules looked like, and how the admin SPA was organized. It generated coordinated changes across a dozen files that all fit together coherently.

This is what sets it apart from autocomplete-style tools. The context window spans the entire project. It reads your Terraform state, your Lambda handlers, your React components, and your CLAUDE.md instructions. The result is code that feels like it was written by someone who has been on the project for months, not a tool that just saw a single file.

MCP Servers Extended the Capabilities

One of the features that made Claude Code especially effective for most projects is Model Context Protocol (MCP) servers. These are plugins that give the AI access to specialized tools and documentation right in the development flow. Some of the ones I use a lot are described here.

The Terraform MCP server provides instant access to AWS provider documentation, Checkov security scanning, and module search. Instead of tab-switching to the Terraform docs, Claude Code looks up resource attributes, checks for security misconfigurations, and finds community modules - all inline during development.

The AWS Knowledge MCP server gives inline access to AWS documentation. When I needed to understand CloudFront OAC signing behavior or SES configuration set options, the documentation was available without leaving the editor.

The AWS Serverless MCP server provides Lambda patterns, SAM guidance, and event source mapping configuration. Useful for getting the Lambda function URL streaming configuration right and understanding best practices.

The AWS Diagram MCP server generates architecture diagrams directly from code, producing decent diagrams with little effort. I have to say the diagram server still has room for improvement as there are too many overlapping lines and it tends to generate very vertical diagrams.

The Timeline

Looking at the git history, the build of my new site happened by working on it for a few hours a day over five calendar days:

Day 1: Core platform from scratch plus Hashnode migration - a couple of hours to get articles rendering, the admin editor working, and all existing content imported
Day 2: Comments, email subscriptions, CloudWatch monitoring, Slack alerts, CI/CD pipeline, and automated testing
Day 3: Analytics refinements, backup setup, custom domain setup
Day 4: Full newsletter system - three newsletter types, scheduling, preview, archive pages
Day 5: Article preview with random URLs, firewall tuning, polish and bug fixes

The final count: 70 commits, 11+ major features in five days. Total hands-on development time: under 10 hours for a full-featured, production-quality blogging platform. Monthly cost: well under $10 - serverless pay-per-use means you only pay for what you use, and a personal blog's traffic keeps costs negligible.

Honest Assessment - Not Magic, But Powerful

I want to be clear about something: Claude Code makes mistakes. Often. It's not a magic box that produces perfect code on the first try. It's not a tool to hand to your New Grad developer and expect them to produce quality code that's secure and meets the design specs. You still need experienced developers to guide the process and push back when things are going down the wrong path.

It will occasionally hallucinate API parameters that do not exist. It will sometimes choose an architectural pattern that's technically correct but wrong for the specific context. It will miss edge cases that an experienced developer would catch immediately. It will sometimes try to solve a problem by adding complexity when the right answer is to simplify. It will use the wrong versions of tools and many other bad things.

But here is the thing - an experienced developer who knows the services, languages, and patterns can catch these mistakes and push back. The AI responds well to correction. You say "that parameter does not exist on this resource," and it fixes it. You say "this should be async, not synchronous," and it restructures the code. The collaboration works because the human brings judgment and the AI brings speed.

Where Human Judgment Was Critical

Several times during the build, my experience was the difference between a working system and a broken one.

Security architecture decisions required human judgment at every turn. IAM least privilege policies, backup tool hardening, encryption key rotation, SES sending authorization - these are areas where "works on my machine" is not good enough. Each security decision needed careful review against best practices.

Synchronous vs. asynchronous patterns were a recurring design choice. When should code work synchronously vs. asynchronously? When should newsletter sends be immediate vs. scheduled through EventBridge? These are architectural decisions that depend on user experience requirements and operational constraints, not just technical feasibility.

The Review Layer

Code generated by AI must be scrutinized with the same rigor as code written by a junior developer - maybe more. For this project, every pull request went through GitHub Copilot and Amazon Q Developer code reviews in addition to my own review. This layered approach caught issues that any single reviewer might miss.

For anything production-facing, especially with security implications, there's no substitute for thorough review, penetration testing, and proper validation. AI-assisted development is fast, but speed without scrutiny is a liability.

Key Takeaways

AI-assisted development is real and practical today. Not for every task, and not without supervision, but the productivity gains are substantial for developers who know their domain.
MCP servers are a force multiplier. Having documentation, security scanning, and infrastructure tools available inline eliminates context switching and keeps you in flow.
Serverless is ideal for personal projects. Pay-per-use pricing means your blog costs pennies when nobody is reading it and scales automatically when a post hits the front page.
The human in the loop matters more than the AI. Domain expertise, security awareness, and architectural judgment are what turn AI-generated code into production-quality software.
Build things you will actually use. The best way to learn anything is to build something real that you will maintain and improve over time.

Closing Thoughts

I was lucky enough to be at AWS re:Invent 2025 in Las Vegas and spoke with many people on the topic of AI technology and the impact on our industry. There is a lot to be concerned about but there is so much promise. I was in the room for the final keynote from Werner Vogels and was truly inspired by his talk about the The Dawn of the Renaissance Developer.

I haven't had near this much fun coding in years. There is something deeply satisfying about describing a feature in plain English, watching it materialize in code, reviewing and refining it, and then seeing it live on your own infrastructure minutes later. The feedback loop is extraordinarily tight.

My backlog of features to build grows faster than I can ship them. Dark mode, reading time estimates, related articles, full-text search, RSS improvements, an about page redesign - the list keeps growing. Every time I use the platform to write a post, I notice something I want to improve. That's a good sign. It means the platform is useful enough to invest in, and I care enough about it to keep iterating.

If you have been putting off a side project because you thought it would take too long, the barrier has never been lower. A working prototype of almost anything is now a weekend away. The tools are that good - if you bring the expertise to guide them.

This platform is live and serving the very post you are reading right now. If something looks off, that's on me - and probably on my backlog already.

IMPORTANT: Remember to always validate security practices for anything public-facing - especially anything for production use. AI-assisted development makes it easy to move fast, and moving fast without security review is how breaches happen. Take the time to review IAM policies, test authentication flows, scan for vulnerabilities, and audit your infrastructure. The speed gains from AI should buy you more time for security - not less.

Connect with me on X, Bluesky, LinkedIn, Medium, Dev.to, GitHub, or the AWS Community. Check out more of my projects at darryl-ruggles.cloud and join the Believe In Serverless community.

Bi-directional Voice-Controlled Recipe Assistant with Nova Sonic 2

Darryl Ruggles — Sun, 08 Mar 2026 01:19:36 +0000

What if your recipe assistant could talk back?

I have been building a serverless Family Recipe Assistant that searches my family's recipe collection, calculates nutrition from USDA data, and handles multi-turn conversations through a web UI. It works well. But every time I am in the kitchen with flour on my hands, reaching for my phone to type "how long do I bake the banana bread?" feels wrong.

I wanted to just ask.

The text-based assistant already had a "cooking mode" that read recipes aloud using Amazon Polly. But listening to a long recipe read start-to-finish by a TTS voice is surprisingly tedious - you cannot ask it to slow down, skip ahead, or clarify a step without going back to the screen and typing. What I really wanted was a conversation: "What is the next step?" or "How much butter was that again?" while my hands are covered in dough.

Amazon Nova Sonic v2 launched recently with sub-700ms speech-to-speech latency and a 1M token context window. The Strands Agents SDK added experimental support for bidirectional streaming through a BidiAgent class that wraps the WebSocket complexity into something remarkably simple. I decided to wire the two together and add my existing recipe tools - search, timers, nutrition lookup, and unit conversion - to build a voice-controlled kitchen assistant.

This post walks through how I built it, what worked, and what surprised me. It works great on my laptop and even on my Android phone and IPad. The complete code is on GitHub: github.com/RDarrylR/serverless-family-recipes-bidirectional-nova-sonic.

What is Bidirectional Streaming?

Traditional voice assistants follow a rigid pattern: listen, transcribe, think, generate text, synthesize speech, play audio. Each step waits for the previous one to finish. The result is a noticeable pause between your question and the response.

Bidirectional streaming changes this. The agent maintains a persistent WebSocket connection where audio flows in both directions simultaneously. Nova Sonic v2 handles speech understanding and generation in a single model - there is no separate transcription or TTS step. The practical effect is that the agent starts responding while you are still finishing your sentence, and you can interrupt it mid-answer to course-correct.

If you have used a native WebSocket implementation before, you know this means managing connection lifecycle, audio encoding, event multiplexing, and error recovery. That is roughly 150 lines of plumbing code before you write any business logic.

Strands BidiAgent reduces this to about 20 lines.

Architecture

The system has two layers. A React frontend in the browser captures microphone audio using the Web Audio API (with built-in echo cancellation - no headset needed) and sends it over a WebSocket to a FastAPI server. The server runs a Strands BidiAgent that forwards audio to Nova Sonic v2, dispatches tool calls, and streams response audio back to the browser for playback.

The components:

BidiAgent - The Strands orchestrator. Manages the streaming session, routes audio to the model, and dispatches tool calls.
BidiNovaSonicModel - The model adapter. Establishes a bidirectional WebSocket to Bedrock and handles the Nova Sonic protocol.
Kitchen tools - Four @tool decorated Python functions that the agent can call during conversation without interrupting the audio stream.
Bedrock Knowledge Base - My existing recipe collection, indexed with Titan Embed V2 and stored in S3 Vectors.
USDA FoodData Central - Public API for nutrition data.

The key insight is that tool execution happens concurrently with audio streaming. When you ask "find me a pasta recipe," the agent calls the search_recipes tool while continuing to listen for follow-up input. No blocking, no silence gap.

Prerequisites

Before you start, you will need:

Python 3.13+ (3.12 minimum for Nova Sonic)
Node.js 18+ (for the Vite frontend dev server)
An AWS account with Bedrock model access enabled for Nova Sonic v2
PortAudio system library (brew install portaudio on macOS) - required as a transitive dependency of the Strands SDK
uv for Python dependency management

git clone https://github.com/RDarrylR/serverless-family-recipes-bidirectional-nova-sonic.git
cd serverless-family-recipes-bidirectional-nova-sonic
uv sync
make install-frontend

Building the Agent

The server is a FastAPI WebSocket endpoint that bridges browser audio to BidiAgent. The key insight is that BidiAgent accepts plain callables for I/O. At its simplest, you can pass ws.receive_json and ws.send_json directly. In practice, I wrap them with thin functions for message size limits and cost tracking, but the pattern stays the same - no custom I/O classes needed.

from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from strands.experimental.bidi import BidiAgent
from strands.experimental.bidi.models import BidiNovaSonicModel
from strands.experimental.bidi.tools import stop_conversation

from config import AWS_REGION, NOVA_SONIC_VOICE, SYSTEM_PROMPT
from tools import search_recipes, set_timer, nutrition_lookup, convert_units

app = FastAPI()

sonic_model = BidiNovaSonicModel(
    region_name=AWS_REGION,
    provider_config={
        "audio": {
            "input_rate": 16000,
            "output_rate": 24000,
            "voice": NOVA_SONIC_VOICE,
        }
    },
)

@app.websocket("/ws")
async def websocket_endpoint(ws: WebSocket):
    agent = BidiAgent(
        model=sonic_model,
        tools=[search_recipes, set_timer, nutrition_lookup, convert_units, stop_conversation],
        system_prompt=SYSTEM_PROMPT,
    )

    try:
        await ws.accept()
        await agent.run(
            inputs=[ws.receive_json],
            outputs=[ws.send_json],
        )
    except WebSocketDisconnect:
        logger.info("Client disconnected")
    finally:
        await agent.stop()

That is the core of the agent. A new BidiAgent is created per connection - no shared state between sessions. The BidiNovaSonicModel handles the WebSocket connection to Bedrock. The agent.run() method ties them together and runs until the user says "stop" (triggering the stop_conversation tool) or the browser disconnects.

The provider_config sets the audio sample rates (16kHz input from the browser, 24kHz output from Nova Sonic) and the voice. Nova Sonic v2 supports several voices - I went with "tiffany" for a natural conversational tone, but you can also use "amy" or "puck".

Important: Voice IDs must be lowercase. Passing "Tiffany" instead of "tiffany" results in a ValidationException: Received invalid id error that silently kills the session. This is not documented anywhere obvious - I discovered it by enabling debug logging on the Strands Nova Sonic model class.

Why `stop_conversation`?

The stop_conversation tool is a built-in Strands tool that gracefully shuts down the streaming session. Without it, the only way to stop the agent is Ctrl+C, which does not cleanly close the WebSocket. The system prompt tells the agent to use this tool when users say "goodbye" or "stop."

Adding Kitchen Tools

Tools work identically to standard Strands agents. You decorate a function with @tool, add a docstring that describes when to use it, and pass it to the agent. The model decides when to call tools based on the conversation.

Recipe Search

This tool connects to my existing Bedrock Knowledge Base, which contains my family's recipe collection indexed with Titan Embed V2.

import boto3
from strands import tool
from config import AWS_REGION, BEDROCK_KB_ID

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name=AWS_REGION)

@tool
def search_recipes(query: str) -> str:
    """Search the recipe knowledge base for recipes matching the query.

    Use this tool whenever a user asks about recipes, ingredients, or cooking methods.

    Args:
        query: Natural language search query about recipes
    """
    response = bedrock_agent_runtime.retrieve(
        knowledgeBaseId=BEDROCK_KB_ID,
        retrievalQuery={"text": query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": 3}
        },
    )

    results = response.get("retrievalResults", [])
    if not results:
        return "No recipes found matching that query."

    # Group chunks by source file so multi-chunk recipes are reassembled
    source_chunks = {}
    for r in results:
        score = r.get("score", 0)
        if score < 0.3:
            continue
        text = r.get("content", {}).get("text", "").strip()
        source = r.get("location", {}).get("s3Location", {}).get("uri", "")
        source_name = source.split("/")[-1].replace(".md", "") if source else "unknown"

        if source not in source_chunks:
            source_chunks[source] = {"name": source_name, "score": score, "texts": []}
        source_chunks[source]["texts"].append(text)

    chunks = []
    for entry in source_chunks.values():
        merged = "\n\n".join(entry["texts"])
        chunks.append(f"Recipe: {entry['name']} (score: {entry['score']:.2f})\n{merged}")

    return f"Found {len(source_chunks)} recipe(s):\n\n" + "\n\n".join(chunks)

The chunk-merging logic is important. Bedrock KB returns chunks, not complete documents. A long recipe might be split across multiple chunks. By grouping chunks that share the same S3 source URI, the tool reassembles the full recipe text before handing it to the model.

The score threshold of 0.3 filters out weak matches. Without it, the agent would sometimes confidently describe a recipe that was only tangentially related to the query.

Cooking Timer

import asyncio
from strands import tool

@tool
def set_timer(minutes: int, label: str = "cooking") -> str:
    """Set a cooking timer for the specified number of minutes.

    Args:
        minutes: Number of minutes for the timer
        label: A short description of what the timer is for
    """
    if minutes <= 0:
        return "Timer must be at least 1 minute."

    loop = asyncio.get_running_loop()
    loop.create_task(_timer_callback(minutes, label))
    return f"Timer set: {label} for {minutes} minutes. I'll let you know when it's done."

async def _timer_callback(minutes: int, label: str):
    await asyncio.sleep(minutes * 60)
    print(f"\n** TIMER DONE: {label} ({minutes} minutes) **\n")

The timer runs as an asyncio background task. When it expires, it prints to the console. In a production version, this would trigger a spoken notification through the audio output - but since Nova Sonic sessions have an 8-minute limit, long timers outlive the session anyway. I will address this in the trade-offs section.

Nutrition Lookup

import requests
from strands import tool

@tool
def nutrition_lookup(food_item: str) -> str:
    """Look up nutrition information for a food item using USDA FoodData Central.

    Args:
        food_item: The food item to look up, e.g. "chicken breast" or "brown rice"
    """
    response = requests.get(
        "https://api.nal.usda.gov/fdc/v1/foods/search",
        params={"query": food_item, "pageSize": 1, "api_key": USDA_API_KEY},
        timeout=10,
    )

    foods = response.json().get("foods", [])
    if not foods:
        return f"No nutrition data found for '{food_item}'."

    food = foods[0]
    # Extract key nutrients from the USDA response
    nutrients = {}
    for n in food.get("foodNutrients", []):
        name = n.get("nutrientName", "")
        if name in TARGET_NUTRIENTS:
            nutrients[TARGET_NUTRIENTS[name]] = f"{n['value']} {n.get('unitName', '')}"

    return f"Nutrition for {food['description']} (per 100g):\n{json.dumps(nutrients, indent=2)}"

The USDA FoodData Central API is free and does not require registration for the DEMO_KEY. For production use, you should get a proper API key at https://fdc.nal.usda.gov/api-key-signup.

Unit Converter

from strands import tool

VOLUME_TO_ML = {
    "cup": 236.588, "tbsp": 14.787, "tsp": 4.929,
    "ml": 1.0, "liter": 1000.0, "fl oz": 29.5735,
}

WEIGHT_TO_G = {
    "g": 1.0, "kg": 1000.0, "oz": 28.3495, "lb": 453.592,
}

@tool
def convert_units(amount: float, from_unit: str, to_unit: str) -> str:
    """Convert between cooking measurement units.

    Args:
        amount: The numeric amount to convert
        from_unit: The unit to convert from
        to_unit: The unit to convert to
    """
    f, t = from_unit.lower(), to_unit.lower()

    # Temperature
    if f in ("f", "fahrenheit") and t in ("c", "celsius"):
        return f"{amount} F = {(amount - 32) * 5 / 9:.1f} C"

    # Volume
    if f in VOLUME_TO_ML and t in VOLUME_TO_ML:
        result = amount * VOLUME_TO_ML[f] / VOLUME_TO_ML[t]
        return f"{amount} {from_unit} = {result:.2f} {to_unit}"

    # Weight
    if f in WEIGHT_TO_G and t in WEIGHT_TO_G:
        result = amount * WEIGHT_TO_G[f] / WEIGHT_TO_G[t]
        return f"{amount} {from_unit} = {result:.2f} {to_unit}"

    return f"Cannot convert {from_unit} to {to_unit}"

This tool is intentionally simple. It handles volume, weight, and temperature - the three conversion types that come up most often while cooking. When someone asks to convert volume to weight (e.g., "how many grams is a cup of flour?"), the tool explains that the answer depends on the ingredient's density rather than guessing.

The Magic: Interruptions

The feature that makes bidirectional streaming feel genuinely different from a traditional voice assistant is interruption handling. You can talk over the agent mid-sentence, and it will stop, process your new input, and respond to that instead.

For example:

You: "Set a timer for 10 minutes"
Agent: "Timer set for 10 min-"
You: "Actually, make that 15"
Agent: "Got it, I have updated the timer to 15 minutes."

This works because the agent is always listening, even while generating output audio. When it detects that the user has started speaking, it cancels the current output and processes the interruption as new input. Strands handles the mechanics - clearing the output buffer, signaling the model to stop generating, and routing the new audio through the same pipeline.

From a developer perspective, you do not need to write any interruption logic. It just works.

Infrastructure

For local development, you only need IAM permissions for Bedrock access. The Terraform configuration creates an IAM role with least-privilege permissions:

resource "aws_iam_role_policy" "bedrock_access" {
  name = "bedrock-access"
  role = aws_iam_role.voice_agent.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "NovaSonicAccess"
        Effect = "Allow"
        Action = [
          "bedrock:InvokeModel",
          "bedrock:InvokeModelWithResponseStream",
          "bedrock:InvokeModelWithBidirectionalStream",
        ]
        Resource = [
          "arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-2-sonic-v1*",
        ]
      },
      {
        Sid    = "KnowledgeBaseAccess"
        Effect = "Allow"
        Action = ["bedrock:Retrieve"]
        Resource = [
          "arn:aws:bedrock:us-east-1:ACCOUNT:knowledge-base/YOUR_KB_ID",
        ]
      }
    ]
  })
}

Important: Model ID mismatch. The Bedrock foundation model ARN for Nova Sonic v2 is amazon.nova-2-sonic-v1:0 - not amazon.nova-sonic-v2 as you might guess from the marketing name. If your IAM policy uses the wrong model ID pattern, the BidiAgent will establish a connection to Nova Sonic and appear to be working, but the model silently fails to process audio. No error is returned - the agent simply produces no output. This cost me several hours of debugging. Check your IAM policy resource ARNs if the agent connects but never responds.

If you are just experimenting, your default AWS CLI credentials with Bedrock access will work without the Terraform setup. The IAM role becomes useful when you want to follow least-privilege principles or deploy to AgentCore.

Trade-offs and Gotchas

The 8-Minute Session Limit

Nova Sonic v2 sessions time out after 8 minutes. For a kitchen assistant, this is a real constraint - you might be cooking for an hour and want to check on timers, look up steps, or ask follow-up questions throughout.

The workaround is session rotation: detect when a session is about to expire, save context, and start a new session. Strands does not handle this automatically yet. For the demo, 8 minutes is enough to show the concept, but a production kitchen assistant would need this solved.

For comparison, OpenAI Realtime supports 60-minute sessions, though at higher cost and without the tight AWS integration.

Echo Cancellation

Nova Sonic does not handle echo cancellation server-side - it expects clean audio at the input. If the agent's output audio gets picked up by the microphone, you get a feedback loop where the agent starts responding to itself.

The browser handles this transparently. The Web Audio API provides built-in acoustic echo cancellation via getUserMedia({ echoCancellation: true }) - the same mechanism that makes WebRTC video calls work without headsets. The browser subtracts the known output signal from the microphone input, so you can use laptop speakers and mic directly. No headset required.

If you port to other platforms, the same principle applies: iOS (AVAudioSession) and Android audio frameworks include system-level AEC, and smart speakers use hardware beamforming. The key point is that echo cancellation is an I/O concern, not a model concern - solve it at the client layer.

Observability Gap: No Invocation Logging

Bedrock invocation logging does not cover the InvokeModelWithBidirectionalStream API that Nova Sonic uses. If you have invocation logging enabled and expect to see Nova Sonic calls in CloudWatch, you will not find them. The API calls appear in CloudTrail as events, but the actual conversation content (audio, transcripts, tool calls) is not captured.

For cost tracking, Nova Sonic is priced by audio duration (input and output seconds) rather than text tokens. AWS Cost Explorer filtered by Bedrock service is the most reliable way to monitor spend.

Timer Notifications Outlive Sessions

Because of the 8-minute session limit, a timer set for 20 minutes will expire after the Nova Sonic session has ended. The timer still fires (it is a local asyncio task), but it can only print to the console - it cannot speak through the agent. A production version would need a separate notification channel (push notification, separate audio alert, etc.).

Sample Conversation

Here is a typical session showing the back-and-forth between the user and the assistant, including tool calls. The transcripts below are captured from the browser UI - Nova Sonic generates both the audio and the text transcript simultaneously.

You: "What pasta recipes do you have?"

[search_recipes tool called]

Assistant: "I found two pasta recipes in the collection. The first is a classic spaghetti carbonara - it uses eggs, pecorino romano, guanciale, and black pepper. No cream, as the traditional version gets its
richness from the egg and cheese sauce. The second is a baked ziti with Italian sausage that serves about eight people. Would you like the full details on either one?"

You: "How many calories in a cup of pasta?"

[nutrition_lookup tool called]

Assistant: "One cup of cooked spaghetti has about 220 calories, with 43 grams of carbohydrates and 8 grams of protein. That is for plain pasta without sauce."

You: "OK let's make the carbonara. Set a timer for 12 minutes for the pasta."

[set_timer tool called]

Assistant: "Timer set for 12 minutes for the pasta. While the pasta cooks, let me walk you through the sauce. You will want to whisk together three egg yolks and one whole egg with about a cup of grated
pecorino-"

You: "Wait, how much pecorino was that?"

Assistant: "One cup of grated pecorino romano. You can mix in some parmesan too - about half and half works well."

You: "What is that in grams?"

[convert_units tool called]

Assistant: "One cup of grated pecorino is roughly 100 grams, and one cup of parmesan is about 80 grams. So for a half-and-half mix, that would be 50 grams pecorino and 40 grams parmesan."

Notice how the assistant handles interruptions naturally (cutting off mid-sentence when asked about pecorino), chains tool calls in context (nutrition lookup followed by unit conversion), and maintains conversational flow throughout.

Try It Yourself

The complete code is on GitHub: github.com/RDarrylR/serverless-family-recipes-bidirectional-nova-sonic

git clone https://github.com/RDarrylR/serverless-family-recipes-bidirectional-nova-sonic.git
cd serverless-family-recipes-bidirectional-nova-sonic
uv sync
make install-frontend

# Terminal 1: WebSocket server
make serve

# Terminal 2: Vite dev server
make serve-frontend

Open http://localhost:5173, click the microphone, and start talking. No headset needed - the browser handles echo cancellation.

You will need Bedrock model access enabled for Nova Sonic v2 in your AWS account. If you are using the recipe search tool, you will also need a Bedrock Knowledge Base with your recipes indexed.

The Audio Pacing Problem

This was the most surprising issue I hit. After wiring everything up, the assistant's voice worked - but after tool calls (like searching recipes or looking up nutrition), all the speech played back bunched together with no natural pauses between sentences. Short responses sounded fine. Long responses after tool calls sounded like someone hit 2x speed.

The root cause: Nova Sonic generates audio faster than real-time after receiving a tool result. When the model has the full tool output text available at once, it produces all the speech audio in a burst rather than at natural speech pace. With WebSocket forwarding, there is no backpressure - the server receives hundreds of audio chunks in seconds and immediately forwards them all to the browser.

The fix is a sufficiently large client-side ring buffer. The browser's AudioWorklet maintains a ring buffer sized for 60 seconds of audio at 24kHz. Audio arrives in bursts from the server, gets queued in the ring buffer, and plays back at the correct hardware rate. The AudioWorklet's process() callback naturally paces playback - it pulls exactly 128 samples per callback at the AudioContext sample rate, regardless of how fast data arrives.

class AudioPlayerProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    // 60 seconds at 24kHz - handles faster-than-realtime bursts
    this._bufferSize = 24000 * 60;
    this._buffer = new Float32Array(this._bufferSize);
    this._writePos = 0;
    this._readPos = 0;

    this.port.onmessage = (event) => {
      if (event.data.type === 'audio') {
        this._enqueue(event.data.samples);
      } else if (event.data.type === 'barge-in') {
        this._readPos = this._writePos; // Clear instantly
      }
    };
  }

  process(inputs, outputs) {
    const output = outputs[0][0];
    for (let i = 0; i < output.length; i++) {
      if (this._readPos < this._writePos) {
        output[i] = this._buffer[this._readPos % this._bufferSize];
        this._readPos++;
      } else {
        output[i] = 0; // Silence when buffer is empty
      }
    }
    return true;
  }
}

The initial version used a 5-second buffer, which overflowed on longer responses - the write pointer wrapped around and overwrote unplayed samples, causing garbled audio. 60 seconds handles even the longest recipe descriptions comfortably.

For interruptions (barge-in), the server sends a bidi_interruption event and the browser sets readPos = writePos, which instantly silences playback. New audio from the updated response then fills the buffer from that point.

Two AudioContexts

The browser uses two separate AudioContext instances: one at 16kHz for capture and one at 24kHz for playback. These cannot share a context because Web Audio requires a single sample rate per context, and Nova Sonic's input and output rates differ.

Capture uses ScriptProcessorNode (deprecated but universal) to grab Float32 audio frames, convert them to PCM16, base64-encode, and send over WebSocket. Playback uses an AudioWorkletNode with a ring buffer for gapless streaming - the audio thread pulls samples continuously, and barge-in clears the buffer instantly when the server signals an interruption.

Deploying to AgentCore

For local development, this runs as two processes: a FastAPI server and a Vite dev server with a WebSocket proxy. For production, I deployed the voice agent to AWS Bedrock AgentCore Runtime as a Docker container.

The deployed architecture:

Frontend - React SPA on S3 + CloudFront, same pattern as the text-based recipe assistant
Auth - Cognito User Pool (email/password) + Identity Pool. The browser exchanges a Cognito JWT for temporary AWS credentials, then uses those to SigV4-sign a WebSocket URL directly to AgentCore. No API Gateway or Lambda in the WebSocket path.
Agent Runtime - The FastAPI server runs as an ARM64 container in AgentCore. The same server.py that runs locally deploys without changes - AgentCore handles health checks (/ping), scaling, and WebSocket proxying.
Dual mode - The frontend auto-detects the deployment mode. If VITE_AGENT_RUNTIME_ARN is set, it uses SigV4-signed WebSocket to AgentCore. If not, it connects to the local Vite proxy. No code changes needed to switch between local and deployed.

The Dockerfile

The container packages the same FastAPI server that runs locally. ARM64 is required by AgentCore Runtime.

FROM --platform=linux/arm64 public.ecr.aws/docker/library/python:3.13-slim

# PyAudio is a transitive dependency of strands-agents[bidi-all]
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    libasound-dev libportaudio2 libportaudiocpp0 \
    portaudio19-dev gcc python3-dev && \
    apt-get clean && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY config.py server.py ./
COPY tools/ ./tools/

RUN useradd -m -u 1000 appuser && chown -R appuser:appuser /app
USER appuser

ENV CONTAINER_ENV=true
ENV PYTHONUNBUFFERED=1
EXPOSE 8080

HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD ["python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8080/ping').read()"]

CMD ["opentelemetry-instrument", "uvicorn", "server:app", "--host", "0.0.0.0", "--port", "8080"]

The CONTAINER_ENV flag tells server.py to bind to 0.0.0.0 instead of 127.0.0.1. The opentelemetry-instrument wrapper is required for AgentCore to capture container logs in CloudWatch - without it, stdout and stderr are silently dropped. The aws-opentelemetry-distro package in requirements.txt provides this wrapper. The health check hits the /ping endpoint that AgentCore polls every few seconds.

Terraform: AgentCore IAM Role

The container needs an IAM role that AgentCore assumes on its behalf. This is where the model ID gotcha matters most - get the ARN wrong and Nova Sonic silently ignores audio.

resource "aws_iam_role" "agentcore" {
  name = "${var.project_name}-${var.environment}-agentcore"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "bedrock-agentcore.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy" "agentcore" {
  name = "agentcore-permissions"
  role = aws_iam_role.agentcore.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "NovaSonicAccess"
        Effect = "Allow"
        Action = [
          "bedrock:InvokeModel",
          "bedrock:InvokeModelWithResponseStream",
          "bedrock:InvokeModelWithBidirectionalStream",
        ]
        # NOTE: The model ID is amazon.nova-2-sonic-v1, NOT amazon.nova-sonic-v2
        Resource = [
          "arn:aws:bedrock:${var.aws_region}::foundation-model/amazon.nova-2-sonic-v1*"
        ]
      },
      {
        Sid      = "KnowledgeBaseAccess"
        Effect   = "Allow"
        Action   = ["bedrock:Retrieve"]
        Resource = [
          "arn:aws:bedrock:${var.aws_region}:${data.aws_caller_identity.current.account_id}:knowledge-base/${var.knowledge_base_id}"
        ]
      },
      {
        Sid    = "ECRImageAccess"
        Effect = "Allow"
        Action = ["ecr:BatchGetImage", "ecr:GetDownloadUrlForLayer"]
        Resource = [
          "arn:aws:ecr:${var.aws_region}:${data.aws_caller_identity.current.account_id}:repository/*"
        ]
      },
      {
        Sid      = "ECRTokenAccess"
        Effect   = "Allow"
        Action   = ["ecr:GetAuthorizationToken"]
        Resource = "*"
      },
      # CloudWatch, X-Ray, and WorkloadIdentity statements omitted for brevity
    ]
  })
}

The bedrock-agentcore.amazonaws.com service principal in the trust policy allows AgentCore to assume this role when running the container. The policy grants exactly what the agent needs: invoke Nova Sonic, retrieve from the Knowledge Base, and pull the container image from ECR.

Terraform: Cognito Authentication

The browser needs AWS credentials to sign WebSocket requests. Cognito provides a two-step flow: User Pool for authentication, Identity Pool for credential exchange.

# User Pool - email/password authentication
resource "aws_cognito_user_pool" "main" {
  name                     = "${var.project_name}-${var.environment}-users"
  username_attributes      = ["email"]
  auto_verified_attributes = ["email"]

  admin_create_user_config {
    allow_admin_create_user_only = true  # No self-signup for the demo
  }

  password_policy {
    minimum_length    = 8
    require_lowercase = true
    require_uppercase = true
    require_numbers   = true
  }
}

# Public client for the SPA - no client secret
resource "aws_cognito_user_pool_client" "web" {
  name            = "${var.project_name}-${var.environment}-web-client"
  user_pool_id    = aws_cognito_user_pool.main.id
  generate_secret = false

  explicit_auth_flows = [
    "ALLOW_USER_SRP_AUTH",
    "ALLOW_REFRESH_TOKEN_AUTH",
  ]
}

# Identity Pool - exchange JWT for temporary AWS credentials
resource "aws_cognito_identity_pool" "main" {
  identity_pool_name               = "${var.project_name}-${var.environment}-identity"
  allow_unauthenticated_identities = false

  cognito_identity_providers {
    client_id               = aws_cognito_user_pool_client.web.id
    provider_name           = aws_cognito_user_pool.main.endpoint
    server_side_token_check = false
  }
}

# IAM role that authenticated users assume
resource "aws_iam_role" "cognito_authenticated" {
  name = "${var.project_name}-${var.environment}-cognito-auth-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Federated = "cognito-identity.amazonaws.com" }
      Action    = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "cognito-identity.amazonaws.com:aud" = aws_cognito_identity_pool.main.id
        }
        "ForAnyValue:StringLike" = {
          "cognito-identity.amazonaws.com:amr" = "authenticated"
        }
      }
    }]
  })
}

# Grant authenticated users permission to invoke the agent via WebSocket
resource "aws_iam_role_policy" "agentcore_invoke" {
  count = var.agent_runtime_arn != "" ? 1 : 0
  name  = "agentcore-invoke"
  role  = aws_iam_role.cognito_authenticated.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "bedrock-agentcore:InvokeAgentRuntime",
        "bedrock-agentcore:InvokeAgentRuntimeWithWebSocketStream",
        "bedrock-agentcore:InvokeRuntime",
        "bedrock-agentcore:InvokeRuntimeWithResponseStream",
      ]
      Resource = "*"
    }]
  })
}

The count = var.agent_runtime_arn != "" ? 1 : 0 handles a chicken-and-egg problem: Terraform creates the Cognito resources first, then you create the AgentCore runtime via CLI (there is no Terraform provider for AgentCore yet), then re-run Terraform with the runtime ARN to attach the invoke policy.

IAM resource scoping gotcha: You might expect to scope Resource to your specific runtime ARN instead of "*". I tried this and it breaks WebSocket connections with a silent failure - the browser gets a connection refused with no useful error. The InvokeAgentRuntimeWithWebSocketStream action evaluates against a resource ARN that includes session and qualifier components, not just the runtime ARN. The official AWS sample uses Resource: "*" for the same reason. Until AWS documents the exact resource ARN format for WebSocket invocations, "*" is the only option that works. The blast radius is limited since these are Cognito-scoped temporary credentials that can only call bedrock-agentcore:Invoke* actions.

SigV4 WebSocket Presigning

This is the most unusual piece of the deployment. The browser cannot use a simple WebSocket URL - it needs to SigV4-sign the request using temporary AWS credentials from Cognito. The signing process constructs the URL, signs it with the AWS Signature Version 4 algorithm, and appends the signature as query parameters.

import { getAWSCredentials } from './aws-credentials.js';
import { Sha256 } from '@aws-crypto/sha256-js';
import { SignatureV4 } from '@aws-sdk/signature-v4';
import { HttpRequest } from '@smithy/protocol-http';

export async function getPresignedWebSocketUrl(agentRuntimeArn, sessionId) {
  const region = import.meta.env.VITE_REGION || 'us-east-1';
  const credentials = await getAWSCredentials();

  // Build the AgentCore WebSocket URL
  const encodedArn = encodeURIComponent(agentRuntimeArn);
  const url = new URL(
    `https://bedrock-agentcore.${region}.amazonaws.com/runtimes/${encodedArn}/ws`
  );
  url.searchParams.set('qualifier', 'DEFAULT');
  url.searchParams.set('X-Amzn-Bedrock-AgentCore-Runtime-Session-Id', sessionId);

  // Sign with SigV4
  const signer = new SignatureV4({
    service: 'bedrock-agentcore',
    region,
    credentials: {
      accessKeyId: credentials.accessKeyId,
      secretAccessKey: credentials.secretAccessKey,
      sessionToken: credentials.sessionToken,
    },
    sha256: Sha256,
  });

  const request = new HttpRequest({
    method: 'GET',
    protocol: 'https:',
    hostname: url.hostname,
    path: url.pathname,
    query: Object.fromEntries(url.searchParams),
    headers: { host: url.hostname },
  });

  const signed = await signer.presign(request, { expiresIn: 3600 });

  // Convert to wss:// URL
  const queryString = Object.entries(signed.query || {})
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(String(v))}`)
    .join('&');

  return `wss://${signed.hostname}${signed.path}?${queryString}`;
}

The credential exchange happens one step earlier. The browser takes the Cognito ID token and exchanges it for temporary AWS credentials via the Identity Pool:

import { CognitoIdentityClient, GetIdCommand,
         GetCredentialsForIdentityCommand } from '@aws-sdk/client-cognito-identity';

export async function getAWSCredentials() {
  const idToken = await getIdToken();
  const client = new CognitoIdentityClient({ region });
  const providerName = `cognito-idp.${region}.amazonaws.com/${userPoolId}`;

  // Step 1: Get identity ID from the token
  const { IdentityId } = await client.send(
    new GetIdCommand({
      IdentityPoolId: identityPoolId,
      Logins: { [providerName]: idToken },
    })
  );

  // Step 2: Exchange for temporary AWS credentials
  const { Credentials } = await client.send(
    new GetCredentialsForIdentityCommand({
      IdentityId,
      Logins: { [providerName]: idToken },
    })
  );

  return {
    accessKeyId: Credentials.AccessKeyId,
    secretAccessKey: Credentials.SecretKey,
    sessionToken: Credentials.SessionToken,
  };
}

These temporary credentials are scoped by the Cognito authenticated role - they can only call bedrock-agentcore:Invoke* actions, not other AWS services.

Deployment Commands

The full deployment workflow:

# 1. Provision infrastructure (S3, CloudFront, Cognito, ECR)
make apply

# 2. Build and push the ARM64 container to ECR
make docker-build
make docker-push

# 3. Create the AgentCore runtime (first time only)
make create-agent

# 4. Add the runtime ARN to terraform.tfvars, re-apply for Cognito IAM policy
make apply

# 5. Generate frontend .env from Terraform outputs
make setup-env

# 6. Build and deploy frontend to S3 + invalidate CloudFront cache
make deploy-frontend

For subsequent code changes, make deploy-agent rebuilds the container and updates the runtime in one step.

Key deployment gotcha: AgentCore container deployments require unique ECR image tags to force replacement. Pushing a new image with the latest tag does not trigger a container update - you must use a unique tag (I use v$(date +%Y%m%d%H%M%S)) and pass it to update-agent-runtime.

What Does It Cost?

Nova Sonic pricing is per-token, not per-minute. Audio converts at roughly 25 tokens per second.

Token type	Price per 1,000 tokens
Speech input	$0.0034
Speech output	$0.0136
Text input	$0.00006
Text output	$0.00024

Speech tokens dominate the bill. Text tokens only apply to tool call inputs/outputs and any conversation history injected as text - these are negligible for a voice-only session.

Typical session costs:

A 5-minute cooking session where the user asks 4-5 questions and the assistant responds with recipe steps and tool results:

Speech input (~2 min of user speaking): 2 x 60 x 25 = 3,000 tokens = $0.010
Speech output (~3 min of assistant speaking): 3 x 60 x 25 = 4,500 tokens = $0.061
Text tokens (tool calls/results): ~2,000 tokens = $0.0005
Total: ~$0.07 per session

For comparison, an 8-minute session (the Nova Sonic maximum) with continuous conversation:

Speech input (~3.5 min): 5,250 tokens = $0.018
Speech output (~4.5 min): 6,750 tokens = $0.092
Total: ~$0.11 per session

That is roughly 80% cheaper than OpenAI's GPT-4o Realtime API for equivalent conversation length.

Tracking costs in code:

AWS does not currently publish CloudWatch metrics for InvokeModelWithBidirectionalStream - the standard Bedrock metrics only cover InvokeModel, Converse, and their streaming variants. Model invocation logging and Application Inference Profiles (for cost tagging) also do not support bidirectional streaming yet.

The practical workaround is application-level cost estimation. The server logs session duration, audio chunk counts, and an estimated cost for each session:

# Nova Sonic pricing (per 1,000 tokens)
SPEECH_INPUT_PRICE_PER_1K = 0.0034
SPEECH_OUTPUT_PRICE_PER_1K = 0.0136
TOKENS_PER_SECOND_AUDIO = 25

# In the WebSocket handler:
session_start = time.monotonic()
input_audio_chunks = 0
output_audio_chunks = 0

# ... wrap receive/send to count audio chunks ...

# In the finally block:
session_duration = time.monotonic() - session_start
input_tokens = session_duration * TOKENS_PER_SECOND_AUDIO
output_tokens = session_duration * TOKENS_PER_SECOND_AUDIO
estimated_cost = (input_tokens / 1000) * SPEECH_INPUT_PRICE_PER_1K \
               + (output_tokens / 1000) * SPEECH_OUTPUT_PRICE_PER_1K

logger.info(
    "Session ended: duration=%.1fs, est_cost=$%.4f",
    session_duration, estimated_cost,
)

This uses wall-clock time as an upper bound (both speakers are not active simultaneously for the full duration). For aggregate cost monitoring, AWS Cost Explorer does show Bedrock charges broken down by model, so you can track monthly Nova Sonic spend at the account level - you just cannot get per-invocation breakdowns without application-level logging.

Converging Voice and Text: Future Integration

I now have two separate recipe assistants - a text-based version that uses CloudFront + Lambda + AgentCore for HTTP/SSE streaming, and this voice version that uses CloudFront + Cognito + AgentCore for WebSocket streaming. They share the same Bedrock Knowledge Base and similar tool implementations, but have separate infrastructure and deployment pipelines.

The natural next step is to combine them into a single application with both input modes. The architecture would look like:

Unified frontend - A single React SPA that offers both a chat interface and a voice interface. The text chat already uses SSE streaming through Lambda; the voice mode connects directly to AgentCore via WebSocket. Both modes share the same auth (Cognito) and the same CloudFront distribution.
Shared agent - Both modes could use the same AgentCore runtime with the same tools. The text path would go through Lambda (for SSE streaming), while the voice path would connect directly via WebSocket. The agent code is already nearly identical - same tools, same Knowledge Base, same system prompt. The main difference is the I/O layer.
Shared infrastructure - The Terraform modules for auth, CDN, and storage are similar between the two projects. Combining them eliminates duplication and simplifies deployment.
Mode switching - In the kitchen, you might start with text ("what should I make for dinner?"), switch to voice once you start cooking ("what is the next step?"), and go back to text when things get noisy. A unified app would make this seamless.

The biggest challenge is the streaming protocol difference. The text assistant uses HTTP POST + SSE (request-response), while the voice assistant uses a persistent bidirectional WebSocket. Lambda handles the text path well but maybe isn't the best approach for the WebSocket connections needed here. AgentCore handles Websockets for this case well. The solution may be to have both paths - Lambda for text, AgentCore container for voice - behind the same CloudFront distribution.

What's Next

Session rotation - Automatically reconnect when the 8-minute Nova Sonic limit is reached, preserving conversation context
Unified app - Merge the text and voice assistants into a single application with both input modes, shared tools, and shared infrastructure
Alexa integration - Strands BidiAgent with a custom I/O adapter could bridge to the Alexa Skills Kit for hands-free kitchen use
Multi-modal input - Nova Sonic v2 supports text input alongside audio. Adding a text fallback for when voice is not practical (quiet environments, accessibility needs)

The gap between a text-based agent and a voice-based agent is smaller than I expected. Strands abstracts the hard parts - WebSocket management, audio encoding, interruption handling, concurrent tool execution - and lets you focus on the tools and the system prompt. The same @tool decorator, the same docstring-based tool selection, the same Bedrock integration. Just with a microphone instead of a text box.

If you have an existing Strands agent, adding voice is closer to a weekend project than a rewrite.

Connect with me on X, Bluesky, LinkedIn, Medium, Dev.to, GitHub, or the AWS Community. Check out more of my projects at darryl-ruggles.cloud and join the Believe In Serverless community.

Forem: Darryl Ruggles

Live Canary Deployments with AWS SAM, the New WebSocket API Resource, and Lambda Durable Functions

Architecture

SAM in 30 seconds

The new WebSocket API resource

Authorization is $connect-only, and $connect auth alone isn't enough

Why not just SSE?

Sending messages back from Lambda

Lambda Durable Functions

IAM for durable execution

The Python SDK

The killer feature: callbacks

Heartbeating to detect superseded callbacks

Step Functions vs Durable Functions

What SAM CLI brings to the table

Policy templates over Connectors

Things to watch for

Production hardening checklist

Cost and cleanup

Wrapping up

Resources

My related posts

It's All About That Memory - Using Long and Short Term Memory with Agents

Architecture Overview

AgentCore Memory - The 4 Strategies

Short-Term Memory (STM)

Semantic Strategy - "CaseFiles"

Summary Strategy - "CaseNotes"

User Preferences Strategy - "DetectiveStyle"

Episodic Strategy - "Interrogations"

How the Strategies Map to the Investigation

A Note on Namespace Design

Where Does the Extraction Logic Live?

Building the Agent

System Prompt - Noir Narrator Persona

Custom Tools

Memory Integration with Strands

Model Configuration

Infrastructure as Code

Memory and KMS

Three Strategies via Terraform, One via CLI

A Note on IAM Permissions

Naming Constraints

The Frontend

Demo Walkthrough

Session 1: The Crime Scene

Session 2: The Case Continues

Observing Memory in Real Time

What I Learned

S3 Files: The End of Download-Process-Upload (with Terraform)

The Long Road to a Real S3 File System

The "Before" Pattern: Download-Process-Upload

The "After" Pattern: Just Use the File System

Building the Infrastructure with Terraform

The S3 Bucket (Versioning is Mandatory)

The S3 Files Service Role

Creating the File System, Mount Targets, and Access Point

The VPC (No NAT Gateway Needed)

The Lambda Function with S3 Files Mount

Performance Comparison

The Lambda Managed Instances Connection

The Three-Way EC2 Comparison: S3 API vs S3 Files vs Mountpoint

Large-Directory Walk (10,000 Small Files)

Large-File Throughput (5 x 1 GiB Random Binary)

When to Use Which

Things to Look Out For

60-Second Commit Delay

VPC Cold Starts

50 Million Object Limit

Key Name Restrictions

Delete and Update Propagation

Access Point Ownership

S3-to-NFS Propagation Delay

When NOT to Use S3 Files

Wrapping Up

Cost Summary

Key Takeaways

Resources

Amazon Aurora DSQL: A Practical Guide to AWS's Distributed SQL Database

Why Aurora DSQL?

Authorization is `$connect`-only, and `$connect` auth alone isn't enough