<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: AWS Community Builders</title>
    <description>The latest articles on Forem by AWS Community Builders (@aws-builders).</description>
    <link>https://forem.com/aws-builders</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F2794%2F88da75b6-aadd-4ea1-8083-ae2dfca8be94.png</url>
      <title>Forem: AWS Community Builders</title>
      <link>https://forem.com/aws-builders</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/aws-builders"/>
    <language>en</language>
    <item>
      <title>S3 Files: The End of Download-Process-Upload (with Terraform)</title>
      <dc:creator>Darryl Ruggles</dc:creator>
      <pubDate>Fri, 01 May 2026 15:28:59 +0000</pubDate>
      <link>https://forem.com/aws-builders/s3-files-the-end-of-download-process-upload-with-terraform-17ln</link>
      <guid>https://forem.com/aws-builders/s3-files-the-end-of-download-process-upload-with-terraform-17ln</guid>
      <description>&lt;p&gt;On April 7, 2026, AWS launched S3 Files - a managed NFS v4.1/4.2 layer built on Amazon EFS that provides file-system semantics on top of S3, including read-after-write consistency, advisory file locking, and POSIX permissions. (AWS Storage Gateway's File Gateway has offered NFS-over-S3 for years, but as a caching gateway appliance, not a native file system with these guarantees.) You can mount S3 Files from EC2, Lambda, EKS, and ECS (Fargate and ECS Managed Instances launch types; EC2 launch type is not yet supported). Your code reads and writes files with &lt;code&gt;open()&lt;/code&gt;, &lt;code&gt;os.rename()&lt;/code&gt;, and &lt;code&gt;os.listdir()&lt;/code&gt;. No boto3 for the data path. No /tmp juggling. No copy-then-delete to simulate a rename.&lt;/p&gt;

&lt;p&gt;In this post, I'll build two identical document-processing Lambda functions - one using the traditional S3 API approach and one using S3 Files - deploy them with Terraform, and benchmark the difference.&lt;/p&gt;

&lt;h2&gt;The Long Road to a Real S3 File System&lt;/h2&gt;

&lt;p&gt;For nearly two decades, developers have been trying to use S3 as a file system. Here's how the tools evolved:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;s3fs-fuse (2010)&lt;/th&gt;
&lt;th&gt;Mountpoint for S3 (2023)&lt;/th&gt;
&lt;th&gt;S3 Files (2026)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Protocol&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;FUSE&lt;/td&gt;
&lt;td&gt;FUSE&lt;/td&gt;
&lt;td&gt;NFS 4.1/4.2 (managed)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Write support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full (but slow)&lt;/td&gt;
&lt;td&gt;Sequential/append only&lt;/td&gt;
&lt;td&gt;Full read/write&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Rename&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Copy + delete (slow)&lt;/td&gt;
&lt;td&gt;Not supported&lt;/td&gt;
&lt;td&gt;Instant from the NFS client's perspective (async S3 sync)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;File locking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Advisory locks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Consistency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Eventual&lt;/td&gt;
&lt;td&gt;Eventual&lt;/td&gt;
&lt;td&gt;Read-after-write&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dir listing (1000 files)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Slow&lt;/td&gt;
&lt;td&gt;163ms&lt;/td&gt;
&lt;td&gt;39ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Small file reads (1000 files)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Very slow&lt;/td&gt;
&lt;td&gt;87.1s&lt;/td&gt;
&lt;td&gt;4.3s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sequential write (100MB)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~100 MB/s&lt;/td&gt;
&lt;td&gt;I/O errors&lt;/td&gt;
&lt;td&gt;273 MB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS managed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No (community)&lt;/td&gt;
&lt;td&gt;Client only&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Max throughput&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~100 MB/s&lt;/td&gt;
&lt;td&gt;GB/s&lt;/td&gt;
&lt;td&gt;TB/s aggregate&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Performance figures from published launch-day benchmarks; see&lt;/em&gt; &lt;a href="https://computingforgeeks.com/s3-files-vs-mountpoint-vs-s3fs/" rel="noopener noreferrer"&gt;S3 Files vs Mountpoint vs s3fs-fuse comparison&lt;/a&gt; &lt;em&gt;and&lt;/em&gt; &lt;a href="https://dev.classmethod.jp/en/articles/amazon-s3-files-ga-mount-and-compare-efs/" rel="noopener noreferrer"&gt;DevelopersIO GA walkthrough&lt;/a&gt;&lt;em&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Each generation solved the previous one's biggest limitation. s3fs-fuse gave you a file system but was slow and unreliable. Mountpoint gave you speed but restricted writes to sequential or append-only uploads, with no in-place modification - ruling out most real applications. S3 Files closes the remaining gaps: file-system semantics including advisory file locking and POSIX permissions, managed infrastructure, and strong performance for both small and large file workloads.&lt;/p&gt;

&lt;h2&gt;The "Before" Pattern: Download-Process-Upload&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr5v58cby0ggdgovdkkxk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr5v58cby0ggdgovdkkxk.png" alt="Before architecture - Traditional S3 API pattern" width="800" height="567"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you've written a Lambda function that processes files in S3, you've written this pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;s3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lambda_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. List files in the inbox
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_objects_v2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BUCKET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Prefix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inbox/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Contents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
        &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;filename&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;removeprefix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inbox/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# 2. Download to /tmp (the only writable space Lambda gives you)
&lt;/span&gt;        &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;download_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BUCKET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/tmp/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# 3. Process the file
&lt;/span&gt;        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/tmp/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;word_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lines&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;

        &lt;span class="c1"&gt;# 4. Upload processed file (S3 has no rename - copy then delete)
&lt;/span&gt;        &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy_object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BUCKET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;CopySource&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bucket&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BUCKET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;processed/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# 5. Upload metadata
&lt;/span&gt;        &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BUCKET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;processed/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.meta.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# 6. Delete the original
&lt;/span&gt;        &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete_object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BUCKET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# 7. Clean up /tmp (Lambda reuses containers)
&lt;/span&gt;        &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remove&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/tmp/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every step is an S3 API call. Every file passes through /tmp. "Renaming" a file requires a full copy followed by a delete - two API calls for something that should be instant. If your function processes 100 files, that's hundreds of API calls, each adding latency.&lt;/p&gt;
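&lt;p&gt;A back-of-the-envelope count makes that overhead concrete. This is a sketch based on the steps above, assuming a single &lt;code&gt;ListObjectsV2&lt;/code&gt; page (up to 1,000 keys) - larger batches add one extra LIST call per additional page:&lt;/p&gt;

```python
# Sketch: API calls issued by the download-process-upload pattern above.
# Per file: GetObject (download), CopyObject, PutObject (metadata),
# DeleteObject - plus the single initial ListObjectsV2 call.
def s3_api_calls(n_files: int) -> int:
    return 1 + 4 * n_files

print(s3_api_calls(100))  # 401 calls for a 100-file batch
```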

&lt;p&gt;And /tmp itself is limited. Lambda gives you 512MB by default (up to 10GB at extra cost). If you're processing large files or many files concurrently, you'll hit that ceiling.&lt;/p&gt;
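&lt;p&gt;Hitting that ceiling surfaces as an opaque &lt;code&gt;[Errno 28] No space left on device&lt;/code&gt; mid-batch, so it can be worth guarding each download with a free-space check. This helper is a sketch, not part of the handler above; the 16 MB safety margin is an arbitrary choice:&lt;/p&gt;

```python
import shutil

def fits_in_tmp(size_bytes: int, path: str = "/tmp") -> bool:
    """Return True if a file of `size_bytes` fits in the remaining free
    space at `path`, keeping a little headroom for other writes."""
    headroom = 16 * 1024 * 1024  # 16 MB safety margin (arbitrary)
    return size_bytes + headroom <= shutil.disk_usage(path).free
```

&lt;p&gt;Calling &lt;code&gt;fits_in_tmp(obj["Size"])&lt;/code&gt; before each &lt;code&gt;download_file&lt;/code&gt; lets the function skip or defer oversized files instead of crashing partway through a batch.&lt;/p&gt;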

&lt;h2&gt;The "After" Pattern: Just Use the File System&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzszxk8vvg9gptorbv5gw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzszxk8vvg9gptorbv5gw.png" alt="After architecture - S3 Files mounted file system" width="800" height="561"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With S3 Files mounted at &lt;code&gt;/mnt/docs&lt;/code&gt;, the same logic becomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_lambda_powertools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Logger&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Tracer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Metrics&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_lambda_powertools.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MetricUnit&lt;/span&gt;

&lt;span class="n"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Logger&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;tracer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Tracer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;metrics&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Metrics&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;MOUNT_PATH&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MOUNT_PATH&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nd"&gt;@logger.inject_lambda_context&lt;/span&gt;
&lt;span class="nd"&gt;@tracer.capture_lambda_handler&lt;/span&gt;
&lt;span class="nd"&gt;@metrics.log_metrics&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;capture_cold_start_metric&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lambda_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;inbox&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MOUNT_PATH&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inbox&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;processed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MOUNT_PATH&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;processed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;makedirs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;processed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exist_ok&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;listdir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inbox&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;src&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inbox&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;word_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lines&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;

        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;processed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.meta.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;processed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FilesProcessed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;unit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MetricUnit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;batch complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is no boto3 import, no /tmp management, and no copy-then-delete dance. Powertools makes it easy to add &lt;a href="https://darryl-ruggles.cloud/powertools-for-aws-lambda-best-practices-by-default" rel="noopener noreferrer"&gt;structured logging, tracing, and EMF metrics&lt;/a&gt; - the decorators above wire all three into this handler. The rename returns instantly from the NFS client's perspective. The code is materially shorter and maps more directly to the workload.&lt;/p&gt;

&lt;p&gt;One caveat: "instant" means instant from &lt;em&gt;your code's perspective&lt;/em&gt;. Under the hood, S3 Files still has to copy + delete the S3 object to implement the rename - general-purpose S3 buckets have no native rename operation. (S3 Express One Zone directory buckets do have a &lt;code&gt;RenameObject&lt;/code&gt; API, but S3 Files works with general-purpose buckets.) For single files, this happens fast enough to be invisible. For directory renames across thousands of objects, the S3-side sync can take a long time - AWS documentation warns about performance impact for large recursive rename operations on prefixes with many objects. Your NFS client sees the rename as complete immediately, but S3 API consumers see the old key until the background sync finishes.&lt;/p&gt;

&lt;p&gt;This is not just cleaner code - it's a fundamentally different model. S3 remains the authoritative data store; the file system is a synchronized view. Your Lambda function sees files and directories. S3 sees objects and prefixes. Both are looking at the same data.&lt;/p&gt;
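&lt;p&gt;A real &lt;code&gt;os.rename()&lt;/code&gt; also unlocks the classic POSIX write-to-temp-then-rename idiom: readers see either the old file or the complete new one, never a partial write. A minimal local sketch - the paths here are illustrative, not code from the benchmark functions:&lt;/p&gt;

```python
import json
import os
import tempfile

def write_atomic(path: str, payload: dict) -> None:
    """Write JSON to `path` so readers observe either the old file or the
    complete new one. Rename within a single directory is atomic on POSIX
    file systems, which is what an NFS mount presents to your code."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
        os.rename(tmp_path, path)  # atomic replace within the same directory
    except BaseException:
        os.unlink(tmp_path)  # never leave a half-written temp file behind
        raise
```

&lt;p&gt;On a mounted file system this idiom would give downstream readers of a directory like &lt;code&gt;/mnt/docs/processed/&lt;/code&gt; a consistent view even if the writer is interrupted mid-write.&lt;/p&gt;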

&lt;h2&gt;Building the Infrastructure with Terraform&lt;/h2&gt;

&lt;p&gt;Good news: the Terraform AWS provider shipped native S3 Files resources in &lt;a href="https://github.com/hashicorp/terraform-provider-aws/releases" rel="noopener noreferrer"&gt;v6.40.0&lt;/a&gt; on April 8, 2026 - just one day after S3 Files went GA. The new resources are &lt;code&gt;aws_s3files_file_system&lt;/code&gt;, &lt;code&gt;aws_s3files_mount_target&lt;/code&gt;, and &lt;code&gt;aws_s3files_access_point&lt;/code&gt;, plus corresponding data sources and &lt;code&gt;aws_s3files_file_system_policy&lt;/code&gt; for resource-based policies.&lt;/p&gt;

&lt;h3&gt;The S3 Bucket (Versioning is Mandatory)&lt;/h3&gt;

&lt;p&gt;S3 Files requires bucket versioning to be enabled. This is how it tracks the relationship between file-system state and S3 object versions. The full bucket setup also includes SSE-S3 encryption (explicit, even though it's the default for new buckets), a public access block, and a bucket policy enforcing TLS-only access:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3_bucket"&lt;/span&gt; &lt;span class="s2"&gt;"docs"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;bucket&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.project_name}-${var.environment}-docs"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3_bucket_versioning"&lt;/span&gt; &lt;span class="s2"&gt;"docs"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;bucket&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;versioning_configuration&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Enabled"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3_bucket_server_side_encryption_configuration"&lt;/span&gt; &lt;span class="s2"&gt;"docs"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;bucket&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;rule&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;apply_server_side_encryption_by_default&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;sse_algorithm&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"AES256"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3_bucket_public_access_block"&lt;/span&gt; &lt;span class="s2"&gt;"docs"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;bucket&lt;/span&gt;                  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;block_public_acls&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;block_public_policy&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;ignore_public_acls&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;restrict_public_buckets&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Disable ACLs - bucket-owner-enforced is the default for new buckets, but&lt;/span&gt;
&lt;span class="c1"&gt;# being explicit prevents readers from relying on defaults they may not understand.&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3_bucket_ownership_controls"&lt;/span&gt; &lt;span class="s2"&gt;"docs"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;bucket&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;rule&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;object_ownership&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"BucketOwnerEnforced"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Versioning is mandatory for S3 Files, so without lifecycle cleanup old&lt;/span&gt;
&lt;span class="c1"&gt;# versions accumulate silently during repeated benchmark runs.&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3_bucket_lifecycle_configuration"&lt;/span&gt; &lt;span class="s2"&gt;"docs"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;bucket&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;rule&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;id&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"expire-noncurrent-versions"&lt;/span&gt;
    &lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Enabled"&lt;/span&gt;
    &lt;span class="nx"&gt;noncurrent_version_expiration&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;noncurrent_days&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3_bucket_policy"&lt;/span&gt; &lt;span class="s2"&gt;"docs"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;bucket&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Sid&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"DenyNonTLS"&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Deny"&lt;/span&gt;
      &lt;span class="nx"&gt;Principal&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"*"&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"s3:*"&lt;/span&gt;
      &lt;span class="nx"&gt;Resource&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_s3_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"${aws_s3_bucket.docs.arn}/*"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="nx"&gt;Condition&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Bool&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"aws:SecureTransport"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"false"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Production hardening note:&lt;/strong&gt; For workloads that should stay private inside the VPC, you can go beyond TLS-only and restrict bucket access to your S3 VPC endpoint using &lt;code&gt;aws:sourceVpce&lt;/code&gt; or &lt;code&gt;aws:sourceVpc&lt;/code&gt; conditions in the bucket policy. That denies bucket access from anywhere except your approved VPC or VPC endpoint, even when the caller's credentials are otherwise valid. (KMS permissions for SSE-KMS buckets are covered with the service role below; this demo uses SSE-S3.)&lt;/p&gt;
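
&lt;p&gt;A sketch of that restriction, assuming a hypothetical &lt;code&gt;var.s3_endpoint_id&lt;/code&gt; holding your Gateway endpoint ID (not part of this demo's config):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical extra statement for the same bucket policy. Caution: a blanket
# Deny like this also blocks the S3 Files service role's server-side access
# (its calls carry no aws:sourceVpce), so you would need to carve out an
# exception, e.g. on aws:PrincipalArn, before using it with S3 Files.
{
  Sid       = "DenyOutsideVpce"
  Effect    = "Deny"
  Principal = "*"
  Action    = "s3:*"
  Resource  = [aws_s3_bucket.docs.arn, "${aws_s3_bucket.docs.arn}/*"]
  Condition = {
    StringNotEquals = { "aws:sourceVpce" = var.s3_endpoint_id }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;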

&lt;h3&gt;
  
  
  The S3 Files Service Role
&lt;/h3&gt;

&lt;p&gt;S3 Files needs an IAM role it can assume to read and write your bucket. This is separate from your Lambda's execution role. First surprise: &lt;strong&gt;the service principal is &lt;code&gt;elasticfilesystem.amazonaws.com&lt;/code&gt;, not &lt;code&gt;s3files.amazonaws.com&lt;/code&gt;&lt;/strong&gt;. S3 Files is built on EFS, and the trust policy has to name the underlying service. If you guess the obvious name, &lt;code&gt;CreateRole&lt;/code&gt; fails with &lt;code&gt;MalformedPolicyDocument: Invalid principal&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role"&lt;/span&gt; &lt;span class="s2"&gt;"s3files_service"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.project_name}-${var.environment}-s3files-service"&lt;/span&gt;

  &lt;span class="nx"&gt;assume_role_policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Sid&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"AllowS3FilesAssumeRole"&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
      &lt;span class="nx"&gt;Principal&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Service&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"elasticfilesystem.amazonaws.com"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"sts:AssumeRole"&lt;/span&gt;
      &lt;span class="nx"&gt;Condition&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;StringEquals&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="s2"&gt;"aws:SourceAccount"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_caller_identity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;account_id&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="nx"&gt;ArnLike&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="s2"&gt;"aws:SourceArn"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"arn:aws:s3files:${data.aws_region.current.region}:${data.aws_caller_identity.current.account_id}:file-system/*"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy"&lt;/span&gt; &lt;span class="s2"&gt;"s3files_bucket_access"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"s3-bucket-access"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;s3files_service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s2"&gt;"s3:ListBucket"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"s3:ListBucketVersions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;"s3:GetBucketLocation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"s3:GetBucketVersioning"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;"s3:AbortMultipartUpload"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"s3:ListMultipartUploadParts"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;"s3:GetObject"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"s3:GetObjectVersion"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"s3:GetObjectTagging"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"s3:GetObjectVersionTagging"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;"s3:PutObject"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"s3:PutObjectTagging"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;"s3:DeleteObject"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"s3:DeleteObjectVersion"&lt;/span&gt;
      &lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_s3_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"${aws_s3_bucket.docs.arn}/*"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="nx"&gt;Condition&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;StringEquals&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="s2"&gt;"aws:ResourceAccount"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_caller_identity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;account_id&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The role also needs &lt;strong&gt;EventBridge permissions&lt;/strong&gt; - this is the mechanism behind S3-to-NFS synchronization. S3 Files creates EventBridge rules (prefixed &lt;code&gt;DO-NOT-DELETE-S3-Files*&lt;/code&gt;) to detect out-of-band bucket changes. Without these, S3-side writes never propagate to the NFS mount:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy"&lt;/span&gt; &lt;span class="s2"&gt;"s3files_eventbridge"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"eventbridge-sync"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;s3files_service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;Sid&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"EventBridgeManage"&lt;/span&gt;
        &lt;span class="nx"&gt;Effect&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
        &lt;span class="nx"&gt;Action&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
          &lt;span class="s2"&gt;"events:PutRule"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"events:PutTargets"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="s2"&gt;"events:DeleteRule"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"events:DisableRule"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="s2"&gt;"events:EnableRule"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"events:RemoveTargets"&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"arn:aws:events:*:*:rule/DO-NOT-DELETE-S3-Files*"&lt;/span&gt;
        &lt;span class="nx"&gt;Condition&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;StringEquals&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s2"&gt;"events:ManagedBy"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"elasticfilesystem.amazonaws.com"&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;Sid&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"EventBridgeRead"&lt;/span&gt;
        &lt;span class="nx"&gt;Effect&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
        &lt;span class="nx"&gt;Action&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
          &lt;span class="s2"&gt;"events:DescribeRule"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"events:ListRules"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="s2"&gt;"events:ListRuleNamesByTarget"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"events:ListTargetsByRule"&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"arn:aws:events:*:*:rule/*"&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you use SSE-KMS encryption on the bucket, you'd also need &lt;code&gt;kms:GenerateDataKey&lt;/code&gt;, &lt;code&gt;kms:Encrypt&lt;/code&gt;, &lt;code&gt;kms:Decrypt&lt;/code&gt;, &lt;code&gt;kms:ReEncryptFrom&lt;/code&gt;, and &lt;code&gt;kms:ReEncryptTo&lt;/code&gt; scoped with &lt;code&gt;kms:ViaService = s3.&amp;lt;region&amp;gt;.amazonaws.com&lt;/code&gt;. This demo uses SSE-S3 (AES256), so KMS permissions aren't needed.&lt;/p&gt;
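
&lt;p&gt;As a rough sketch, that KMS statement on the service role's policy would look like the following - &lt;code&gt;aws_kms_key.docs&lt;/code&gt; is hypothetical, since this demo defines no KMS key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sketch only - not applied in this demo (SSE-S3 is used instead).
# kms:ViaService restricts these grants to calls S3 makes on your behalf.
{
  Sid    = "AllowKmsViaS3"
  Effect = "Allow"
  Action = [
    "kms:GenerateDataKey", "kms:Encrypt", "kms:Decrypt",
    "kms:ReEncryptFrom", "kms:ReEncryptTo"
  ]
  Resource = aws_kms_key.docs.arn
  Condition = {
    StringEquals = {
      "kms:ViaService" = "s3.${data.aws_region.current.region}.amazonaws.com"
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;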

&lt;p&gt;The &lt;code&gt;aws:SourceArn&lt;/code&gt; condition and the full set of object/multipart/EventBridge actions are documented in the &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-files-prereq-policies.html" rel="noopener noreferrer"&gt;S3 Files prerequisites&lt;/a&gt;. The biggest risk from an incomplete policy isn't a permission error - it's silent failure. Missing EventBridge permissions mean the sync rules never get created, and S3-side changes simply don't appear on the mount. Missing multipart permissions cause large-file uploads to leak incomplete parts.&lt;/p&gt;
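
&lt;p&gt;Two quick post-deploy checks catch both failure modes; these are standard AWS CLI commands (the bucket name is a placeholder):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Did the managed sync rules actually get created?
aws events list-rules --name-prefix DO-NOT-DELETE-S3-Files --query 'Rules[].Name'

# Any leaked incomplete multipart uploads? (replace the bucket name)
aws s3api list-multipart-uploads --bucket my-docs-bucket --query 'Uploads[].Key'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;An empty result from the first command after the file system is created means the EventBridge policy is wrong or missing.&lt;/p&gt;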

&lt;h3&gt;
  
  
  Creating the File System, Mount Targets, and Access Point
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;aws_s3files_file_system&lt;/code&gt; resource takes just a bucket ARN and the service role ARN:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3files_file_system"&lt;/span&gt; &lt;span class="s2"&gt;"docs"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;bucket&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
  &lt;span class="nx"&gt;role_arn&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;s3files_service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Mount targets go in each subnet where your Lambda runs. One per AZ:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3files_mount_target"&lt;/span&gt; &lt;span class="s2"&gt;"az"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_subnet_ids&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="nx"&gt;file_system_id&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3files_file_system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_id&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_subnet_ids&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;security_groups&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mount_target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Mount targets take about 5 minutes to create. Terraform's create timeout handles that wait - but there's a trap: the provider returns once the API call completes, which happens &lt;em&gt;before&lt;/em&gt; the target reaches the &lt;code&gt;available&lt;/code&gt; lifecycle state. If you create a Lambda that references the access point immediately after, &lt;code&gt;CreateFunction&lt;/code&gt; fails with &lt;code&gt;not all are in the available life cycle state yet&lt;/code&gt;. The fix is an explicit wait between mount targets and downstream consumers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"time_sleep"&lt;/span&gt; &lt;span class="s2"&gt;"wait_for_mount_targets"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;depends_on&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_s3files_mount_target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;az&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;create_duration&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"90s"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3files_access_point"&lt;/span&gt; &lt;span class="s2"&gt;"lambda"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;file_system_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3files_file_system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;depends_on&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;time_sleep&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;wait_for_mount_targets&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="c1"&gt;# DEMO SHORTCUT: uid 0:0 avoids ownership collisions during the side-by-side&lt;/span&gt;
  &lt;span class="c1"&gt;# comparison. In production, prefer a scoped access point path with a non-root&lt;/span&gt;
  &lt;span class="c1"&gt;# UID/GID (e.g., uid=1000), or grant s3files:ClientRootAccess on the Lambda&lt;/span&gt;
  &lt;span class="c1"&gt;# role instead. AWS's Lambda console defaults to UID/GID 1000:1000 with&lt;/span&gt;
  &lt;span class="c1"&gt;# root_directory.path = "/lambda" for good reason.&lt;/span&gt;
  &lt;span class="nx"&gt;posix_user&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;uid&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="nx"&gt;gid&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;root_directory&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;path&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The access point controls the POSIX UID/GID that all NFS operations execute as. The choice of &lt;code&gt;0:0&lt;/code&gt; here is a demo compromise, not a recommendation - I'll explain the tradeoffs and better alternatives in the "Things to Look Out For" section.&lt;/p&gt;
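
&lt;p&gt;For contrast, a production-leaning access point might look like the following - a sketch that assumes &lt;code&gt;aws_s3files_access_point&lt;/code&gt; supports a &lt;code&gt;creation_info&lt;/code&gt; block the way &lt;code&gt;aws_efs_access_point&lt;/code&gt; does; the &lt;code&gt;/lambda&lt;/code&gt; path and &lt;code&gt;1000:1000&lt;/code&gt; IDs mirror the console defaults mentioned above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Illustrative alternative, not deployed in this demo: non-root POSIX
# identity and a scoped root directory created on first use.
resource "aws_s3files_access_point" "lambda_prod" {
  file_system_id = aws_s3files_file_system.docs.id

  posix_user {
    uid = 1000
    gid = 1000
  }

  root_directory {
    path = "/lambda"
    creation_info {
      owner_uid   = 1000
      owner_gid   = 1000
      permissions = "750"
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;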

&lt;p&gt;Finally, add an &lt;code&gt;aws_s3files_file_system_policy&lt;/code&gt; - the resource-based policy on the file system itself (equivalent to a bucket policy). Without this, any principal with &lt;code&gt;s3files:ClientMount&lt;/code&gt; in their IAM policy can mount your file system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3files_file_system_policy"&lt;/span&gt; &lt;span class="s2"&gt;"docs"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;file_system_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3files_file_system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;Sid&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"AllowMountFromKnownRoles"&lt;/span&gt;
        &lt;span class="nx"&gt;Effect&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
        &lt;span class="nx"&gt;Principal&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;AWS&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lambda_role_arn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ec2_role_arn&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="nx"&gt;Action&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"s3files:ClientMount"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"s3files:ClientWrite"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nx"&gt;Resource&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3files_file_system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;Sid&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"EnforceTLS"&lt;/span&gt;
        &lt;span class="nx"&gt;Effect&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Deny"&lt;/span&gt;
        &lt;span class="nx"&gt;Principal&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"*"&lt;/span&gt;
        &lt;span class="nx"&gt;Action&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"s3files:*"&lt;/span&gt;
        &lt;span class="nx"&gt;Resource&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3files_file_system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
        &lt;span class="nx"&gt;Condition&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Bool&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"aws:SecureTransport"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"false"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The VPC (No NAT Gateway Needed)
&lt;/h3&gt;

&lt;p&gt;S3 Files requires your Lambda to be in a VPC - the NFS mount targets live inside your VPC subnets. But you don't need a NAT Gateway (which costs about $35/month). Instead, use a free S3 Gateway VPC endpoint for S3 API traffic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_vpc"&lt;/span&gt; &lt;span class="s2"&gt;"main"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;cidr_block&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"10.0.0.0/16"&lt;/span&gt;
  &lt;span class="nx"&gt;enable_dns_support&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;   &lt;span class="c1"&gt;# Required for VPC endpoints&lt;/span&gt;
  &lt;span class="nx"&gt;enable_dns_hostnames&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_subnet"&lt;/span&gt; &lt;span class="s2"&gt;"private"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;count&lt;/span&gt;             &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;cidr_block&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_subnet_cidrs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;availability_zone&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;availability_zones&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Free S3 Gateway endpoint - no NAT gateway needed&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_vpc_endpoint"&lt;/span&gt; &lt;span class="s2"&gt;"s3"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;service_name&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"com.amazonaws.${data.aws_region.current.region}.s3"&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_endpoint_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Gateway"&lt;/span&gt;
  &lt;span class="nx"&gt;route_table_ids&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_route_table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Security groups allow NFS traffic (TCP 2049) between the Lambda and mount targets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Lambda can reach mount targets on NFS port&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_vpc_security_group_egress_rule"&lt;/span&gt; &lt;span class="s2"&gt;"lambda_to_nfs"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;security_group_id&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lambda_after&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;referenced_security_group_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mount_target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;from_port&lt;/span&gt;                    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2049&lt;/span&gt;
  &lt;span class="nx"&gt;to_port&lt;/span&gt;                      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2049&lt;/span&gt;
  &lt;span class="nx"&gt;ip_protocol&lt;/span&gt;                  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"tcp"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Mount targets accept NFS from Lambda&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_vpc_security_group_ingress_rule"&lt;/span&gt; &lt;span class="s2"&gt;"nfs_from_lambda"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;security_group_id&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mount_target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;referenced_security_group_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lambda_after&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;from_port&lt;/span&gt;                    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2049&lt;/span&gt;
  &lt;span class="nx"&gt;to_port&lt;/span&gt;                      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2049&lt;/span&gt;
  &lt;span class="nx"&gt;ip_protocol&lt;/span&gt;                  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"tcp"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Lambda Function with S3 Files Mount
&lt;/h3&gt;

&lt;p&gt;The Lambda configuration uses the same &lt;code&gt;file_system_config&lt;/code&gt; block as EFS. The key additions are the VPC config and the S3 Files-specific IAM permissions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_lambda_function"&lt;/span&gt; &lt;span class="s2"&gt;"processor_after"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;function_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.project_name}-${var.environment}-after"&lt;/span&gt;
  &lt;span class="nx"&gt;runtime&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"python3.14"&lt;/span&gt;
  &lt;span class="nx"&gt;architectures&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"arm64"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;memory_size&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;512&lt;/span&gt;    &lt;span class="c1"&gt;# &amp;gt;= 512MB enables direct S3 read optimization&lt;/span&gt;
  &lt;span class="nx"&gt;timeout&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;
  &lt;span class="nx"&gt;handler&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"handler.lambda_handler"&lt;/span&gt;

  &lt;span class="nx"&gt;vpc_config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;subnet_ids&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_subnet_ids&lt;/span&gt;
    &lt;span class="nx"&gt;security_group_ids&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lambda_sg_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;file_system_config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;arn&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;access_point_arn&lt;/span&gt;    &lt;span class="c1"&gt;# S3 Files access point&lt;/span&gt;
    &lt;span class="nx"&gt;local_mount_path&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/mnt/docs"&lt;/span&gt;            &lt;span class="c1"&gt;# Must start with /mnt/&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;environment&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;variables&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;MOUNT_PATH&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/mnt/docs"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Lambda execution role needs S3 Files mount permissions &lt;strong&gt;and&lt;/strong&gt; S3 read permissions for the direct-read optimization:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy"&lt;/span&gt; &lt;span class="s2"&gt;"s3files_mount"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"s3files-mount"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;execution&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"s3files:ClientMount"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"s3files:ClientWrite"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;access_point_arn&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Required for the &amp;gt;=1 MiB direct-read bypass (streams from S3 at up to 3 GB/s).&lt;/span&gt;
&lt;span class="c1"&gt;# Without this, reads silently fall back to the cached path - the mount works&lt;/span&gt;
&lt;span class="c1"&gt;# but you lose the throughput optimization and pay S3 Files access charges.&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy"&lt;/span&gt; &lt;span class="s2"&gt;"s3_direct_read"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"s3-direct-read"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;execution&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"s3:GetObject"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"s3:GetObjectVersion"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.bucket_arn}/*"&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: &lt;code&gt;s3files:ClientMount&lt;/code&gt; is required for all access. &lt;code&gt;s3files:ClientWrite&lt;/code&gt; is only needed for read-write mounts. &lt;code&gt;s3files:ClientRootAccess&lt;/code&gt; lets a non-root access point UID operate on root-owned entries (see the Access Point Ownership section below - it's the cleanest fix for mixed S3-API/NFS workflows). The &lt;code&gt;s3:GetObject&lt;/code&gt;/&lt;code&gt;s3:GetObjectVersion&lt;/code&gt; permissions are technically optional, but without them the direct-read bypass doesn't activate and your &amp;gt;=512MB memory setting buys you nothing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance Comparison
&lt;/h2&gt;

&lt;p&gt;I deployed both Lambda functions and ran them against 20 medium-sized text files (500-2000 words each) across 3 runs. The benchmark script seeds a separate S3 prefix for each approach, invokes the corresponding Lambda, and collects timing breakdowns from both handlers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;make benchmark
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Actual results from a 20-file, 3-run benchmark (arm64 Lambda, 512MB, us-east-1):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before (S3 API)&lt;/th&gt;
&lt;th&gt;After (S3 Files)&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;List files&lt;/td&gt;
&lt;td&gt;80ms (min 74, max 92)&lt;/td&gt;
&lt;td&gt;152ms (min 8, max 440)&lt;/td&gt;
&lt;td&gt;0.5x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read/Download (20 files)&lt;/td&gt;
&lt;td&gt;991ms (min 920, max 1096)&lt;/td&gt;
&lt;td&gt;139ms (min 124, max 146)&lt;/td&gt;
&lt;td&gt;7.1x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Process&lt;/td&gt;
&lt;td&gt;5ms&lt;/td&gt;
&lt;td&gt;3ms&lt;/td&gt;
&lt;td&gt;1.7x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write metadata (20 files)&lt;/td&gt;
&lt;td&gt;1788ms (min 1574, max 2054)&lt;/td&gt;
&lt;td&gt;256ms (min 249, max 266)&lt;/td&gt;
&lt;td&gt;7.0x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Move/rename (20 files)&lt;/td&gt;
&lt;td&gt;530ms (min 465, max 647)&lt;/td&gt;
&lt;td&gt;114ms (min 112, max 116)&lt;/td&gt;
&lt;td&gt;4.6x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lambda total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3394ms (min 3037, max 3878)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;664ms (min 518, max 948)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5.1x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Wall clock (including invoke)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3505ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1132ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.1x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The Lambda-internal win is about 5x. Wall clock narrows the gap because the after-Lambda pays a VPC cold start penalty on its first invocation in each run (2.2s wall on run 1, ~600ms on runs 2-3 once the ENI is warm). For batch workloads you'd amortize that across many files; for sporadic triggers you'd feel it every time.&lt;/p&gt;

&lt;p&gt;The single non-win - list time - is counterintuitive but worth calling out. &lt;code&gt;os.listdir&lt;/code&gt; over NFS had a cold-run outlier of 440ms (vs ~80ms for a warm &lt;code&gt;ListObjectsV2&lt;/code&gt; call). I didn't chase this down, but it looks like metadata that hasn't been touched recently isn't in the S3 Files cache yet and needs to be hydrated from S3 on first access. After warmup, &lt;code&gt;listdir&lt;/code&gt; settles at 8ms - 10x faster than the S3 API.&lt;/p&gt;

&lt;p&gt;The biggest wins are in small file reads (no per-object HTTP round trip), writes (no multipart setup for small files), and rename (a single inode operation vs &lt;code&gt;CopyObject&lt;/code&gt; + &lt;code&gt;DeleteObject&lt;/code&gt;).&lt;/p&gt;
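&lt;p&gt;To make the rename gap concrete, here's a minimal sketch (not from the demo repo - the function names are mine) contrasting the two paths. The NFS version is a single &lt;code&gt;os.rename()&lt;/code&gt;; the S3 API version has to copy the full object and then delete the original. A local temp directory stands in for the &lt;code&gt;/mnt/docs&lt;/code&gt; mount:&lt;/p&gt;

```python
import os
import tempfile

def rename_via_mount(src, dst):
    # On an S3 Files NFS mount this is a single atomic metadata operation.
    os.rename(src, dst)

def rename_via_s3_api(client, bucket, src_key, dst_key):
    # The S3 API has no rename: simulate it with a full copy plus a delete.
    # `client` is a boto3 S3 client; this path is shown, not executed here.
    client.copy_object(
        Bucket=bucket,
        Key=dst_key,
        CopySource={"Bucket": bucket, "Key": src_key},
    )
    client.delete_object(Bucket=bucket, Key=src_key)

# Demonstrate the mount-style rename against a local temp dir standing in
# for the /mnt/docs mount path from the Lambda config.
with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "inbox.txt")
    dst = os.path.join(root, "processed.txt")
    with open(src, "w") as f:
        f.write("hello")
    rename_via_mount(src, dst)
    print(os.path.exists(dst), os.path.exists(src))  # True False
```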

&lt;h2&gt;
  
  
  The Lambda Managed Instances Connection
&lt;/h2&gt;

&lt;p&gt;In my &lt;a href="https://darryl-ruggles.cloud/lambda-managed-instances-with-terraform-multi-concurrency-high-memory-and-compute-options" rel="noopener noreferrer"&gt;previous post on Lambda Managed Instances&lt;/a&gt;, I explored how high-memory Lambda functions unlock new workload patterns. S3 Files adds another dimension to this.&lt;/p&gt;

&lt;p&gt;When your Lambda function has &lt;strong&gt;512MB or more memory&lt;/strong&gt;, S3 Files enables direct S3 read routing: reads of &lt;strong&gt;1 MiB or larger&lt;/strong&gt; bypass the file system's high-performance storage entirely and stream directly from S3 at up to 3 GB/s per client (that's a throughput ceiling, not a typical number - actual throughput depends on file size, network, and concurrency). These direct reads don't incur S3 Files access charges - you only pay standard S3 GET pricing. (Your Lambda execution role needs &lt;code&gt;s3:GetObject&lt;/code&gt; and &lt;code&gt;s3:GetObjectVersion&lt;/code&gt; on the bucket for this to work - without them, reads silently fall back to the cached path.)&lt;/p&gt;

&lt;p&gt;There's a separate threshold at play too: files &lt;strong&gt;smaller than 128 KiB&lt;/strong&gt; are asynchronously imported into the high-performance storage on first access (a prefetch optimization, not a bypass). Files between 128 KiB and 1 MiB get metadata imported but data is fetched on demand. This creates a three-tier read architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tiny files (under 128 KiB)&lt;/strong&gt; (configs, metadata, indexes): prefetched into S3 Files cache, sub-millisecond on subsequent reads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mid-size files (128 KiB to 1 MiB)&lt;/strong&gt;: fetched on demand from the cache or S3, depending on access pattern&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Large files (1 MiB and above)&lt;/strong&gt; (datasets, models, media): streamed directly from S3 at 3 GB/s, skipping the cache entirely&lt;/li&gt;
&lt;/ul&gt;
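&lt;p&gt;A toy classifier (my own naming, using the thresholds described above) summarizes which path a read takes. The import threshold is the tunable one; the 1 MiB direct-read cutoff is fixed:&lt;/p&gt;

```python
# Illustrative classifier for the three read tiers described above.
# Thresholds: 128 KiB default import cutoff (tunable), 1 MiB direct-read
# cutoff (fixed).
KIB = 1024
MIB = 1024 * KIB

def read_path_for(size_bytes, import_threshold=128 * KIB):
    if size_bytes >= 1 * MIB:
        return "direct-from-s3"    # bypasses the cache, standard S3 GET pricing
    if size_bytes < import_threshold:
        return "prefetched-cache"  # async-imported on first access
    return "on-demand"             # metadata imported, data fetched lazily

for size in (4 * KIB, 512 * KIB, 100 * MIB):
    print(size, read_path_for(size))
```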

&lt;p&gt;The 128 KiB import threshold is tunable per file system via &lt;code&gt;aws_s3files_synchronization_configuration&lt;/code&gt; in Terraform (not shown in the demo). The 1 MiB direct-read bypass is not tunable.&lt;/p&gt;

&lt;p&gt;For data-intensive Lambda workloads, combining Managed Instances (multi-concurrency, high memory) with S3 Files (mounted file system, direct S3 read bypass) is a compelling alternative to containerized processing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three-Way EC2 Comparison: S3 API vs S3 Files vs Mountpoint
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpy8hz3e26upt09n4hv1e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpy8hz3e26upt09n4hv1e.png" alt="EC2 three-way comparison architecture" width="800" height="852"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Lambda benchmark above covers the serverless use case, but it doesn't include &lt;a href="https://github.com/awslabs/mountpoint-s3" rel="noopener noreferrer"&gt;Mountpoint for Amazon S3&lt;/a&gt; - AWS's FUSE-based file-system client. Mountpoint is widely used for analytics and ML workloads, so it's a natural comparison. There's just one problem: &lt;strong&gt;Mountpoint can't run on Lambda&lt;/strong&gt;. It's FUSE-based, and Lambda's Firecracker microVM doesn't expose &lt;code&gt;/dev/fuse&lt;/code&gt; or grant &lt;code&gt;CAP_SYS_ADMIN&lt;/code&gt; - both required for userspace file-system mounts. S3 Files sidesteps this entirely by using NFS, which Lambda natively supports through its existing EFS mount infrastructure.&lt;/p&gt;

&lt;p&gt;So for the three-way comparison, I added a Graviton EC2 instance (&lt;code&gt;c7g.large&lt;/code&gt;, arm64, in the same VPC) with both S3 Files and Mountpoint mounted, plus direct S3 API access via boto3. Same bucket, same data, three different interfaces.&lt;/p&gt;

&lt;h3&gt;
  
  
  Large-Directory Walk (10,000 Small Files)
&lt;/h3&gt;

&lt;p&gt;Seed 10,000 small text files under a single prefix, then enumerate every entry and stat each one:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Mean&lt;/th&gt;
&lt;th&gt;Min&lt;/th&gt;
&lt;th&gt;Max&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;S3 Files (NFS &lt;code&gt;os.listdir&lt;/code&gt; + &lt;code&gt;os.stat&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;905ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;891ms&lt;/td&gt;
&lt;td&gt;924ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3 API (&lt;code&gt;ListObjectsV2&lt;/code&gt;, paginated)&lt;/td&gt;
&lt;td&gt;1,666ms&lt;/td&gt;
&lt;td&gt;1,637ms&lt;/td&gt;
&lt;td&gt;1,698ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mountpoint (FUSE &lt;code&gt;os.listdir&lt;/code&gt; + &lt;code&gt;os.stat&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;175,847ms&lt;/td&gt;
&lt;td&gt;171,168ms&lt;/td&gt;
&lt;td&gt;179,002ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;S3 Files is 1.8x faster than the S3 API. Mountpoint is &lt;strong&gt;194x slower&lt;/strong&gt; - nearly three minutes for 10,000 entries. This is Mountpoint's known weakness: it makes a &lt;code&gt;ListObjectsV2&lt;/code&gt; call per directory and then individual &lt;code&gt;HeadObject&lt;/code&gt; calls for &lt;code&gt;stat()&lt;/code&gt;, with no prefetching or metadata caching. If your workload involves browsing or enumerating directories, Mountpoint is the wrong tool.&lt;/p&gt;
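&lt;p&gt;For reference, the walk itself is nothing exotic - one &lt;code&gt;listdir&lt;/code&gt; plus a &lt;code&gt;stat&lt;/code&gt; per entry. This stand-in sketch (a local temp directory instead of the mount, 5 files instead of 10,000) shows the access pattern that Mountpoint turns into a &lt;code&gt;HeadObject&lt;/code&gt; call per entry:&lt;/p&gt;

```python
import os
import tempfile

def walk_and_stat(path):
    # The benchmark loop: one listdir, then a stat per entry. Over S3 Files
    # both are served from cached metadata; over Mountpoint each stat becomes
    # an individual HeadObject API call, which is where the 194x gap comes from.
    entries = os.listdir(path)
    return {name: os.stat(os.path.join(path, name)).st_size for name in entries}

# Stand-in for the mounted prefix, with 5 files instead of 10,000.
with tempfile.TemporaryDirectory() as root:
    for i in range(5):
        with open(os.path.join(root, f"doc-{i}.txt"), "w") as f:
            f.write("x" * (i + 1))
    sizes = walk_and_stat(root)
    print(len(sizes), sorted(sizes.values()))  # 5 [1, 2, 3, 4, 5]
```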

&lt;h3&gt;
  
  
  Large-File Throughput (5 x 1 GiB Random Binary)
&lt;/h3&gt;

&lt;p&gt;Seed five 1 GiB random binary files, stream-read each one into a SHA-256 hash, write the digest back:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Read time (5 GiB)&lt;/th&gt;
&lt;th&gt;Read throughput&lt;/th&gt;
&lt;th&gt;Write time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Mountpoint (FUSE)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;11,158ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;459 MiB/s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1,469ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3 Files (NFS)&lt;/td&gt;
&lt;td&gt;32,356ms&lt;/td&gt;
&lt;td&gt;161 MiB/s&lt;/td&gt;
&lt;td&gt;71ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3 API (&lt;code&gt;GetObject&lt;/code&gt; stream)&lt;/td&gt;
&lt;td&gt;129,228ms&lt;/td&gt;
&lt;td&gt;43 MiB/s&lt;/td&gt;
&lt;td&gt;151ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For large sequential reads, Mountpoint dominates at 459 MiB/s - nearly 3x S3 Files and 10x the S3 API. This isn't an accident: Mountpoint splits each large read into &lt;strong&gt;parallel HTTP Range GET requests&lt;/strong&gt; across multiple TCP connections, with aggressive read-ahead prefetching. A 1 GiB file read becomes many concurrent range fetches that saturate the network link. It's a purpose-built parallel download accelerator for large, sequential, read-heavy workloads (ML training data, analytics datasets, media processing).&lt;/p&gt;

&lt;p&gt;S3 Files (161 MiB/s) goes through NFS 4.1/4.2 to a managed server that reads from its cache or S3 - the protocol framing and cache coherency tracking add overhead. The S3 API (43 MiB/s) is a single &lt;code&gt;GetObject&lt;/code&gt; stream over one HTTP connection with no parallelism.&lt;/p&gt;

&lt;p&gt;The same design that makes Mountpoint fast for large reads makes it very slow for directories: it has &lt;strong&gt;no metadata cache&lt;/strong&gt;, so every &lt;code&gt;stat()&lt;/code&gt; call becomes an individual &lt;code&gt;HeadObject&lt;/code&gt; API call to S3. That's why 10,000 files takes 176 seconds.&lt;/p&gt;

&lt;p&gt;Write time tells a similar story: Mountpoint takes 1,469ms to write five small digest files. S3 Files does it in 71ms. S3 API in 151ms. Mountpoint's FUSE-to-S3 translation adds high per-file overhead for small writes.&lt;/p&gt;
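&lt;p&gt;The read loop behind these numbers is a plain chunked stream into SHA-256 - the same code runs unchanged against all three interfaces' local paths, which is what keeps the comparison fair. A small stand-in version (a local temp file instead of a 1 GiB object on the mount):&lt;/p&gt;

```python
import hashlib
import os
import tempfile

def hash_file(path, chunk_size=1024 * 1024):
    # Stream the file in 1 MiB chunks so a 1 GiB input never sits in memory;
    # over the NFS mount this is a plain sequential read of the mounted path.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Tiny stand-in for the 1 GiB benchmark files.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello world")
    path = f.name
print(hash_file(path))
os.unlink(path)
```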

&lt;h3&gt;
  
  
  When to Use Which
&lt;/h3&gt;

&lt;p&gt;The benchmark reveals that no single approach wins everywhere:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;th&gt;Best tool&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Interactive file operations (rename, create, list)&lt;/td&gt;
&lt;td&gt;S3 Files&lt;/td&gt;
&lt;td&gt;File-system semantics, metadata caching, and renames that are instant from the NFS client's perspective (S3-side sync is async)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Large sequential reads (datasets, models, media)&lt;/td&gt;
&lt;td&gt;Mountpoint&lt;/td&gt;
&lt;td&gt;Highest throughput, zero software cost, no VPC needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Serverless (Lambda)&lt;/td&gt;
&lt;td&gt;S3 Files&lt;/td&gt;
&lt;td&gt;Mountpoint can't run on Lambda at all&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Simplest deployment (no VPC, no mounts)&lt;/td&gt;
&lt;td&gt;S3 API&lt;/td&gt;
&lt;td&gt;Slowest but zero infrastructure - works anywhere with IAM credentials&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Directory-heavy workloads&lt;/td&gt;
&lt;td&gt;S3 Files&lt;/td&gt;
&lt;td&gt;In this benchmark, Mountpoint's per-entry overhead made large directory walks much slower&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Things to Look Out For
&lt;/h2&gt;

&lt;p&gt;S3 Files is impressive, but it's not magic. Here are the real-world constraints you need to know:&lt;/p&gt;

&lt;h3&gt;
  
  
  60-Second Commit Delay
&lt;/h3&gt;

&lt;p&gt;S3 Files uses a "stage and commit" model. File-system writes are batched for approximately 60 seconds before committing to S3. Files you write are immediately visible through the NFS mount, but they won't appear in &lt;code&gt;aws s3 ls&lt;/code&gt; or &lt;code&gt;s3.list_objects_v2()&lt;/code&gt; for about a minute.&lt;/p&gt;

&lt;p&gt;For the document processing use case, this is fine - the Lambda reads and writes through the mount, so consistency is maintained within the NFS view. But if you have a downstream process polling S3 directly for new objects, it will see a delay.&lt;/p&gt;
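&lt;p&gt;If a downstream consumer does poll S3 directly, budget for the commit window. A hedged sketch (my own helper, with the listing function injected so it's testable - in practice it would wrap &lt;code&gt;list_objects_v2&lt;/code&gt;, and the timeout should sit comfortably above the ~60s window):&lt;/p&gt;

```python
import time

def wait_for_s3_visibility(list_keys, key, timeout_s=120, poll_s=5,
                           sleep=time.sleep):
    # Poll a listing function until a freshly NFS-written file shows up in S3.
    # With the ~60s stage-and-commit window, a 120s timeout avoids false
    # failures on the happy path.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if key in list_keys():
            return True
        sleep(poll_s)
    return False

# Fake listing that "commits" the object on the third poll.
calls = {"n": 0}
def fake_list():
    calls["n"] += 1
    return ["reports/out.txt"] if calls["n"] >= 3 else []

print(wait_for_s3_visibility(fake_list, "reports/out.txt", sleep=lambda s: None))
```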

&lt;h3&gt;
  
  
  VPC Cold Starts
&lt;/h3&gt;

&lt;p&gt;Putting Lambda in a VPC adds cold start latency. AWS has improved this significantly with Hyperplane ENI caching, but in this benchmark I observed roughly 1-2 seconds of additional cold start time compared to the non-VPC Lambda. For infrequently invoked functions, this matters. For functions that process batches (like our document processor), the cold start is amortized across many files.&lt;/p&gt;

&lt;h3&gt;
  
  
  50 Million Object Limit
&lt;/h3&gt;

&lt;p&gt;Each mounted file system supports up to 50 million objects. For most workloads this is generous, but if you're mounting a bucket with hundreds of millions of small objects, you'll need to scope the mount to a prefix. In Terraform, this is a creation-time argument on &lt;code&gt;aws_s3files_file_system&lt;/code&gt; (not shown in the demo, which mounts the entire bucket). Via the CLI, use the &lt;code&gt;--prefix&lt;/code&gt; flag on &lt;code&gt;create-file-system&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Name Restrictions
&lt;/h3&gt;

&lt;p&gt;S3 allows object keys that don't map cleanly to POSIX filenames. According to AWS documentation, keys with trailing slashes, path traversal patterns (&lt;code&gt;../&lt;/code&gt;), or components longer than 255 characters will not appear in the file-system view. The objects remain accessible via the S3 API, but the file system won't show them. AWS recommends monitoring the CloudWatch &lt;code&gt;ImportFailures&lt;/code&gt; metric to detect these cases, as there are no client-side errors.&lt;/p&gt;
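&lt;p&gt;A pre-flight check along these lines can flag unmappable keys before upload. This is my own illustrative approximation of the documented rules, not the authoritative check - the authoritative signal is the &lt;code&gt;ImportFailures&lt;/code&gt; metric:&lt;/p&gt;

```python
def is_posix_mappable(key):
    # Approximates the documented import restrictions: no trailing or leading
    # slash, no "." / ".." path components (traversal patterns), and each
    # component at most 255 characters. Illustrative only.
    if key.endswith("/") or key.startswith("/"):
        return False
    for part in key.split("/"):
        if part in ("", ".", "..") or len(part) > 255:
            return False
    return True

for key in ("docs/report.txt", "docs/", "a/../secret", "x" * 300):
    print(key[:20], is_posix_mappable(key))
```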

&lt;h3&gt;
  
  
  Delete and Update Propagation
&lt;/h3&gt;

&lt;p&gt;S3-side changes only propagate to the NFS mount for files whose data is currently in the high-performance storage (the "hot" cache). In testing, hot-file deletes via the S3 API remained readable on the mount for roughly 6-18 seconds before disappearing. Modifications followed the same pattern: the mount saw the stale version until the EventBridge notification arrived.&lt;/p&gt;

&lt;p&gt;For files whose data has been expired out of the cache (cold files), S3-side changes don't propagate at all until the next NFS read, at which point S3 Files fetches the latest version from S3. So the 6-18 second range observed above is a hot-path number; cold-path updates are lazy and unbounded. If you're designing a pipeline that writes via the S3 API and reads via the mount, test both cases.&lt;/p&gt;

&lt;h3&gt;
  
  
  Access Point Ownership
&lt;/h3&gt;

&lt;p&gt;This is the biggest surprise I hit, and it drove a design change in the demo.&lt;/p&gt;

&lt;p&gt;Objects written through the NFS mount &lt;em&gt;do&lt;/em&gt; carry POSIX ownership metadata - S3 Files stores it as user-defined S3 object metadata (&lt;code&gt;file-permissions&lt;/code&gt;, &lt;code&gt;file-owner&lt;/code&gt;, &lt;code&gt;file-group&lt;/code&gt;, &lt;code&gt;file-mtime&lt;/code&gt;) on every object it writes. But objects written via the S3 API - &lt;code&gt;s3.put_object()&lt;/code&gt;, &lt;code&gt;aws s3 cp&lt;/code&gt;, the &lt;code&gt;before&lt;/code&gt; Lambda's boto3 calls - don't have that metadata. When S3 Files imports those API-written objects into the NFS view, they get default permissions: &lt;code&gt;root:root&lt;/code&gt; (UID 0, GID 0) with mode &lt;code&gt;0644&lt;/code&gt; for files and &lt;code&gt;0755&lt;/code&gt; for directories.&lt;/p&gt;

&lt;p&gt;That asymmetry is the mechanism behind this issue: directories are traversable and readable by everyone (which is why the inbox reads worked), but only writable by root (which is why creating entries in &lt;code&gt;processed/&lt;/code&gt; failed). Those directories, incidentally, are just S3 prefixes materialized as zero-byte objects - which is why S3-API writes can create them as a side effect of &lt;code&gt;PutObject&lt;/code&gt; and why they end up root-owned when imported.&lt;/p&gt;
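&lt;p&gt;One idea that falls out of this: an API-side producer could attach the same metadata keys itself, so its objects import with sane ownership instead of &lt;code&gt;root:root&lt;/code&gt;. The key names come from the behavior described above, but the exact value formats here are my assumption - verify against an object S3 Files has actually written before relying on this:&lt;/p&gt;

```python
import time

def posix_file_metadata(uid=1000, gid=1000, mode=0o644, mtime=None):
    # Builds the user-defined metadata keys S3 Files writes on NFS-created
    # objects. The value formats (decimal IDs, octal mode string, epoch
    # seconds) are assumptions, not confirmed by documentation.
    return {
        "file-owner": str(uid),
        "file-group": str(gid),
        "file-permissions": format(mode, "04o"),
        "file-mtime": str(int(mtime if mtime is not None else time.time())),
    }

# Hypothetical usage on an API-side write (s3 is a boto3 client; not executed):
#   s3.put_object(Bucket=bucket, Key="processed/out.txt", Body=data,
#                 Metadata=posix_file_metadata())
print(posix_file_metadata(mtime=0))
```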

&lt;p&gt;The first time my &lt;code&gt;after&lt;/code&gt; Lambda ran with &lt;code&gt;posix_user { uid = 1000, gid = 1000 }&lt;/code&gt;, it failed with &lt;code&gt;PermissionError: [Errno 13] Permission denied: '/mnt/docs/processed/...&lt;/code&gt;. The Lambda could read the inbox just fine, but it couldn't &lt;em&gt;create&lt;/em&gt; anything under &lt;code&gt;/mnt/docs/processed/&lt;/code&gt; because S3 Files had reflected a previous &lt;code&gt;before&lt;/code&gt;-Lambda &lt;code&gt;PutObject&lt;/code&gt; into NFS as a root-owned directory.&lt;/p&gt;

&lt;p&gt;Four ways out, ordered from best (least privilege) to most expedient:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Use a scoped access point path&lt;/strong&gt; (recommended for production). Set &lt;code&gt;root_directory.path = "/lambda-workspace"&lt;/code&gt; with &lt;code&gt;creation_permissions { owner_uid = 1000, owner_gid = 1000, permissions = "755" }&lt;/code&gt; and &lt;code&gt;posix_user { uid = 1000, gid = 1000 }&lt;/code&gt;. S3 Files creates that path owned by your UID, and the Lambda only sees its owned subtree. The tradeoff: every S3 object the Lambda needs to see must be keyed under &lt;code&gt;lambda-workspace/...&lt;/code&gt;, and a raw &lt;code&gt;aws s3 cp&lt;/code&gt; into any other prefix is invisible to the mount. This enforces least privilege at the access-point level.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grant &lt;code&gt;s3files:ClientRootAccess&lt;/code&gt;&lt;/strong&gt; on the Lambda's IAM role. This lets a non-root UID (still &lt;code&gt;posix_user { uid = 1000 }&lt;/code&gt;) perform operations against root-owned entries - including creating files inside root-owned directories imported from S3 - without running the entire Lambda as UID 0. It's the middle ground: keep least-privilege POSIX identity, elevate only for cross-boundary operations with S3-origin content. This permission is included in the &lt;code&gt;AmazonS3FilesClientFullAccess&lt;/code&gt; managed policy, which is probably why I missed it - the demo's inline policy has only &lt;code&gt;ClientMount&lt;/code&gt; + &lt;code&gt;ClientWrite&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoid path collisions&lt;/strong&gt;: have each S3-API-side producer write to a prefix the NFS client never writes into. The demo does this - the &lt;code&gt;before&lt;/code&gt; Lambda writes to &lt;code&gt;processed-before/&lt;/code&gt; and the &lt;code&gt;after&lt;/code&gt; Lambda writes to &lt;code&gt;processed-after/&lt;/code&gt; - so their outputs never fight over directory ownership.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run as root&lt;/strong&gt; (access point &lt;code&gt;posix_user { uid = 0, gid = 0 }&lt;/code&gt;). The Lambda runs as "root" for NFS purposes and can write alongside S3-born files. This is what the demo uses because the side-by-side comparison needs both approaches to see the same bucket root. &lt;strong&gt;This is the opposite of least privilege&lt;/strong&gt; - any NFS client can read, write, and delete anything on the mount. Last resort only.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you're using S3 Files to replace an existing boto3 pipeline, plan this up front. Any prefix your NFS clients will &lt;em&gt;write into&lt;/em&gt; should be created from the mount side first, or left entirely unpopulated from the S3 side. Anything written via &lt;code&gt;PutObject&lt;/code&gt; will arrive in NFS as root-owned and block writes from non-root access points (unless you've granted &lt;code&gt;ClientRootAccess&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;Note: the demo pairs option 4 with an &lt;code&gt;aws_s3files_file_system_policy&lt;/code&gt; that restricts which IAM principals can mount at all (deny-by-default, allow only the Lambda and EC2 benchmark roles, enforce TLS). If you use uid=0, this resource-based policy is your primary access control.&lt;/p&gt;

&lt;p&gt;Related: don't pre-create "directory" marker objects (zero-byte &lt;code&gt;inbox/&lt;/code&gt;, &lt;code&gt;processed/&lt;/code&gt;, etc.) from Terraform. I had three &lt;code&gt;aws_s3_object&lt;/code&gt; resources doing this and they turned out to be the exact cause of the ownership collision. The Lambda's &lt;code&gt;os.makedirs(exist_ok=True)&lt;/code&gt; creates the directories over NFS with the correct access-point ownership - let it do its job.&lt;/p&gt;
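&lt;p&gt;In practice the mount-side fix is tiny. A minimal sketch (the helper name and prefix list are mine for illustration; &lt;code&gt;/mnt/docs&lt;/code&gt; is the demo's mount path):&lt;/p&gt;

```python
import os

def ensure_output_dirs(mount_root: str, prefixes=("inbox", "processed")) -> None:
    """Create working directories over NFS instead of via S3 marker objects.

    Directories created through the mount inherit the access point's
    owner UID/GID, so later writes by the same POSIX user succeed. Marker
    objects written via PutObject would import as root-owned instead.
    """
    for prefix in prefixes:
        os.makedirs(os.path.join(mount_root, prefix), exist_ok=True)

# e.g. at the top of the Lambda handler:
# ensure_output_dirs("/mnt/docs")
```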

&lt;h3&gt;
  
  
  S3-to-NFS Propagation Delay
&lt;/h3&gt;

&lt;p&gt;Also worth knowing: writes go in both directions, but they don't propagate symmetrically. NFS writes commit to S3 on the 60-second schedule described above. S3 writes appear in the NFS view asynchronously via EventBridge notifications, which typically takes a few seconds but can take longer under load. If your benchmark seeds files via &lt;code&gt;s3.put_object()&lt;/code&gt; and immediately invokes an NFS-mounted Lambda, the mount will see an empty inbox. The benchmark script in this project waits 60 seconds after S3-seeding to sidestep this.&lt;/p&gt;
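&lt;p&gt;If a fixed sleep feels too blunt, polling the mount works too. A minimal sketch (the helper name and defaults are mine):&lt;/p&gt;

```python
import os
import time

def wait_for_nfs_visibility(path: str, timeout: float = 90.0, interval: float = 2.0) -> bool:
    """Poll the mount until an S3-API-written object shows up in the NFS view.

    S3-to-NFS propagation is asynchronous (EventBridge-driven), so a file
    seeded with s3.put_object() may take seconds to appear on the mount.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```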

&lt;p&gt;&lt;strong&gt;Conflict resolution:&lt;/strong&gt; if the same file is modified through both the NFS mount and the S3 API before synchronization completes, the S3 bucket version wins. The file-system copy is not silently overwritten - it gets moved to a &lt;code&gt;.s3files-lost+found-&amp;lt;file-system-id&amp;gt;&lt;/code&gt; directory on the mount. Files in lost+found are not copied back to the S3 bucket and persist indefinitely on the file system, counting toward storage costs until explicitly deleted. This is important to understand for mixed API + file-system workflows: the S3 side is always authoritative, and your NFS edits may end up in lost+found if there's a race.&lt;/p&gt;
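&lt;p&gt;For mixed workflows it's worth sweeping for conflict casualties after a run. A sketch (the function name is mine; the directory pattern follows the naming above):&lt;/p&gt;

```python
import glob
import os

def find_lost_and_found(mount_root: str) -> list:
    """List files S3 Files parked after an NFS/S3 write conflict.

    Conflicted file-system copies land in a .s3files-lost+found-{fs-id}
    directory on the mount; they are never synced back to the bucket and
    keep accruing storage cost until deleted.
    """
    results = []
    for lf_dir in glob.glob(os.path.join(mount_root, ".s3files-lost+found-*")):
        for root, _dirs, files in os.walk(lf_dir):
            results.extend(os.path.join(root, name) for name in files)
    return results
```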

&lt;h2&gt;
  
  
  When NOT to Use S3 Files
&lt;/h2&gt;

&lt;p&gt;S3 Files isn't always the right choice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Read-only analytics at lowest cost&lt;/strong&gt;: Mountpoint for S3 adds zero software cost and is optimized for sequential reads of large files. If you're running Spark, Presto, or ML training jobs that only read data, Mountpoint is cheaper and simpler (no VPC required).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Non-AWS or S3-compatible storage&lt;/strong&gt;: s3fs-fuse works with MinIO, Ceph, and other S3-compatible object stores. S3 Files is AWS-only.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Existing EFS mounts&lt;/strong&gt;: Lambda supports one file-system mount - EFS or S3 Files, not both. For any new build where the backing data lives in S3, prefer S3 Files over EFS (you skip the EFS-to-S3 sync problem entirely). Only stick with EFS if the function needs a shared writable file system that multiple Lambda invocations coordinate through simultaneously.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency-critical writes that must appear in S3 immediately&lt;/strong&gt;: The 60-second commit delay means writes aren't visible to S3 API consumers right away. If you need sub-second S3 visibility, stick with direct S3 API calls.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;S3 Files eliminates an entire category of boilerplate from AWS applications. The download-process-upload pattern that we've all written hundreds of times is no longer necessary. Your code just reads and writes files. The underlying storage happens to be S3.&lt;/p&gt;

&lt;p&gt;The Terraform story is solid from day one - native provider resources shipped in v6.40.0, just one day after S3 Files went GA. Three resources (&lt;code&gt;aws_s3files_file_system&lt;/code&gt;, &lt;code&gt;aws_s3files_mount_target&lt;/code&gt;, &lt;code&gt;aws_s3files_access_point&lt;/code&gt;) cover the full setup, and the &lt;code&gt;file_system_config&lt;/code&gt; block on &lt;code&gt;aws_lambda_function&lt;/code&gt; works identically to the existing EFS mount pattern.&lt;/p&gt;

&lt;p&gt;All the code for this post - Terraform modules, Lambda handlers (with Powertools), the EC2 runner, benchmark scripts, and the Makefile - is available in the &lt;a href="https://github.com/RDarrylR/s3-files-initial" rel="noopener noreferrer"&gt;companion repository&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost Summary
&lt;/h3&gt;

&lt;p&gt;For a demo deployment: approximately $75/month if you leave everything running. The EC2 instance (about $53/month) and SSM VPC interface endpoints (about $21/month, needed because the EC2 is in a private subnet with no NAT) are the bulk. Lambda and S3 costs are negligible. Stop the EC2 and run &lt;code&gt;make destroy&lt;/code&gt; when done.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost note for production:&lt;/strong&gt; S3 Files meters data reads and writes with a &lt;strong&gt;minimum of 32 KiB per operation&lt;/strong&gt;, regardless of actual size. This benchmark's medium text files (500-2000 words) are above that threshold, so it didn't show up. But at scale with many tiny files - say 10,000 sub-1 KiB JSON configs read once each - you'd pay for 10,000 x 32 KiB (roughly 312 MiB) of reads instead of roughly 10 MiB. For small-file-heavy workloads, factor this into your cost model.&lt;/p&gt;
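&lt;p&gt;The arithmetic as a quick sanity check (a sketch; the 32 KiB floor is the only input taken from the pricing note above):&lt;/p&gt;

```python
KIB = 1024

def billed_bytes(object_size: int, floor: int = 32 * KIB) -> int:
    """Bytes metered for one S3 Files data operation, applying the 32 KiB minimum."""
    return max(object_size, floor)

# 10,000 sub-1-KiB JSON configs, read once each:
actual = 10_000 * 1 * KIB                  # raw data volume
metered = 10_000 * billed_bytes(1 * KIB)   # what you're billed for
assert metered == 32 * actual              # a 32x multiplier on tiny files
```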

&lt;h3&gt;
  
  
  Key Takeaways
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;S3 Files provides file-system semantics on S3 via NFS v4.1/4.2 - read, write, rename, advisory file locking&lt;/li&gt;
&lt;li&gt;Lambda functions with &amp;gt;=512MB memory get direct S3 read bypass for reads &amp;gt;=1 MiB (up to 3 GB/s ceiling) - but only if the execution role has &lt;code&gt;s3:GetObject&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;No NAT Gateway needed - use S3 Gateway VPC endpoints (but add SSM interface endpoints for EC2)&lt;/li&gt;
&lt;li&gt;Mountpoint can't run on Lambda (no FUSE) - S3 Files is the only file-system option for serverless&lt;/li&gt;
&lt;li&gt;For large sequential reads, Mountpoint still wins (459 MiB/s vs 161 MiB/s) - it was purpose-built for throughput&lt;/li&gt;
&lt;li&gt;For directory operations, Mountpoint is prohibitively slow (176s vs 0.9s for 10K entries) - use S3 Files&lt;/li&gt;
&lt;li&gt;The 60-second commit delay (NFS to S3) and the async EventBridge propagation (S3 to NFS) are the two consistency boundaries you have to design around&lt;/li&gt;
&lt;li&gt;Access point ownership interacts with S3-origin objects in ways that will surprise you - plan prefix ownership up front&lt;/li&gt;
&lt;li&gt;The trust policy service principal is &lt;code&gt;elasticfilesystem.amazonaws.com&lt;/code&gt;, not &lt;code&gt;s3files.amazonaws.com&lt;/code&gt; - S3 Files is built on EFS&lt;/li&gt;
&lt;li&gt;Native Terraform support shipped day one in AWS provider v6.40.0, but use a &lt;code&gt;time_sleep&lt;/code&gt; between mount targets and Lambda to avoid lifecycle state races&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/blogs/aws/launching-s3-files-making-s3-buckets-accessible-as-file-systems/" rel="noopener noreferrer"&gt;Launching S3 Files, Making S3 Buckets Accessible as File Systems&lt;/a&gt; - AWS News Blog announcement&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-files.html" rel="noopener noreferrer"&gt;Amazon S3 Files Documentation&lt;/a&gt; - Official user guide&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-filesystem-s3files.html" rel="noopener noreferrer"&gt;Configuring S3 Files Access for Lambda&lt;/a&gt; - Lambda-specific setup guide&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-files-getting-started.html" rel="noopener noreferrer"&gt;S3 Files Getting Started Tutorial&lt;/a&gt; - Step-by-step walkthrough&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3files_file_system" rel="noopener noreferrer"&gt;Terraform aws_s3files_file_system Resource&lt;/a&gt; - Terraform provider docs&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3files_mount_target" rel="noopener noreferrer"&gt;Terraform aws_s3files_mount_target Resource&lt;/a&gt; - Mount target configuration&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3files_access_point" rel="noopener noreferrer"&gt;Terraform aws_s3files_access_point Resource&lt;/a&gt; - Access point with POSIX user mapping&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.theregister.com/2026/04/09/aws_s3_files_stress_test_corey_quinn/" rel="noopener noreferrer"&gt;AWS S3 Files Stress Test&lt;/a&gt; - The Register's independent stress test with edge case findings&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://computingforgeeks.com/s3-files-vs-mountpoint-vs-s3fs/" rel="noopener noreferrer"&gt;S3 Files vs Mountpoint vs s3fs-fuse Comparison&lt;/a&gt; - Detailed feature and performance comparison&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/aws-builders/architecture-layers-that-s3-files-eliminates-and-creates-16ke"&gt;Architecture Layers That S3 Files Eliminates&lt;/a&gt; - Architectural patterns analysis&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/awslabs/mountpoint-s3" rel="noopener noreferrer"&gt;Mountpoint for Amazon S3&lt;/a&gt; - The read-heavy alternative for analytics workloads&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://darryl-ruggles.cloud/lambda-managed-instances-with-terraform-multi-concurrency-high-memory-and-compute-options" rel="noopener noreferrer"&gt;Lambda Managed Instances with Terraform&lt;/a&gt; - High-memory Lambda patterns that complement S3 Files&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://darryl-ruggles.cloud/powertools-for-aws-lambda-best-practices-by-default" rel="noopener noreferrer"&gt;Powertools for AWS Lambda - Best Practices By Default&lt;/a&gt; - Observability patterns for Lambda functions&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Connect with me on&lt;/em&gt; &lt;a href="https://x.com/RDarrylR" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://bsky.app/profile/darrylruggles.bsky.social" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://www.linkedin.com/in/darryl-ruggles/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://github.com/RDarrylR" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://medium.com/@RDarrylR" rel="noopener noreferrer"&gt;Medium&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://dev.to/rdarrylr"&gt;Dev.to&lt;/a&gt;&lt;em&gt;, or the&lt;/em&gt; &lt;a href="https://community.aws/@darrylr" rel="noopener noreferrer"&gt;AWS Community&lt;/a&gt;&lt;em&gt;. Check out more of my projects at&lt;/em&gt; &lt;a href="https://darryl-ruggles.cloud" rel="noopener noreferrer"&gt;darryl-ruggles.cloud&lt;/a&gt; &lt;em&gt;and join the&lt;/em&gt; &lt;a href="https://www.believeinserverless.com/" rel="noopener noreferrer"&gt;Believe In Serverless&lt;/a&gt; &lt;em&gt;community.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>s3</category>
      <category>serverless</category>
      <category>s3files</category>
    </item>
    <item>
      <title>LLM on EKS: Serving with vLLM</title>
      <dc:creator>Daniel Pepuho</dc:creator>
      <pubDate>Fri, 01 May 2026 14:49:45 +0000</pubDate>
      <link>https://forem.com/aws-builders/llm-on-eks-serving-with-vllm-2khg</link>
      <guid>https://forem.com/aws-builders/llm-on-eks-serving-with-vllm-2khg</guid>
      <description>&lt;p&gt;Last year, I mentioned that I'm interested in learning how to serve LLMs in production. At first it was just curiosity, but over time I wanted to actually try building something—not just reading about it.&lt;/p&gt;

&lt;p&gt;This post is a small step in that direction: serving an LLM using &lt;a href="https://github.com/vllm-project/vllm" rel="noopener noreferrer"&gt;vLLM&lt;/a&gt;, deploying it on &lt;a href="https://aws.amazon.com/eks" rel="noopener noreferrer"&gt;Amazon EKS&lt;/a&gt;, provisioning the infra with &lt;a href="https://github.com/aws/aws-cdk" rel="noopener noreferrer"&gt;AWS CDK&lt;/a&gt;, and wrapping it all in a simple chatbot built with &lt;a href="https://github.com/streamlit/streamlit" rel="noopener noreferrer"&gt;Streamlit&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Exploring LLM serving on a Kubernetes cluster (EKS)&lt;/li&gt;
&lt;li&gt;Using vLLM as the inference engine&lt;/li&gt;
&lt;li&gt;Provisioning the infrastructure with AWS CDK (IaC)&lt;/li&gt;
&lt;li&gt;Building a simple chatbot to interact with the LLM using Streamlit&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What We Tryna Build
&lt;/h2&gt;

&lt;p&gt;The idea is simple: build a small chatbot powered by an LLM and run the model on Kubernetes.&lt;/p&gt;

&lt;p&gt;I'm not focusing on training models here. I just want to understand how to serve an LLM properly.&lt;/p&gt;

&lt;p&gt;The flow looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User interacts with a chatbot (running locally)&lt;/li&gt;
&lt;li&gt;The chatbot sends a request to a vLLM API&lt;/li&gt;
&lt;li&gt;The model processes the request and returns a response&lt;/li&gt;
&lt;li&gt;The vLLM service runs on Amazon EKS&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9neqtclm9ib8piid2i74.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9neqtclm9ib8piid2i74.webp" alt="Project Architecture" width="800" height="387"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before we dive in, you'll need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS Account &amp;amp; IAM&lt;/strong&gt;: An AWS account ID and an IAM user with administrator access (admin rights are needed to manage EKS). We'll need the IAM username to map &lt;code&gt;kubectl&lt;/code&gt; (admin) permissions to the EKS cluster.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS CLI&lt;/strong&gt; installed and configured (&lt;code&gt;aws configure&lt;/code&gt;) using your IAM user credentials.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS CDK&lt;/strong&gt; installed (&lt;code&gt;npm install -g aws-cdk&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;AWS usually limits new accounts to 0 vCPUs for "Running On-Demand G and VT instances". You'll need to go to the AWS Service Quotas console and request an increase to at least 4 vCPUs to run the &lt;code&gt;g4dn.xlarge&lt;/code&gt; node.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;vLLM&lt;/strong&gt; — inference engine for the LLM. Fast, supports streaming, and exposes an OpenAI-compatible API out of the box.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon EKS&lt;/strong&gt; — The Kubernetes service on AWS to run the vLLM workload.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS CDK&lt;/strong&gt; — infrastructure as code to manage the AWS infra; this time I'll be using Python. One &lt;code&gt;cdk deploy&lt;/code&gt; and everything is provisioned.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streamlit&lt;/strong&gt; — simple chatbot UI that talks to the vLLM endpoint.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why vLLM?
&lt;/h2&gt;

&lt;p&gt;There are a few ways to serve an LLM — you could use TGI, Triton, or just raw HuggingFace &lt;code&gt;transformers&lt;/code&gt;. I went with vLLM for a few reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PagedAttention&lt;/strong&gt; — manages GPU memory more efficiently, which matters a lot on a single &lt;code&gt;g4dn.xlarge&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI-compatible API&lt;/strong&gt; — the chatbot can use the &lt;code&gt;openai&lt;/code&gt; Python SDK without any changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming support&lt;/strong&gt; — responses stream token by token, which makes the chatbot feel more responsive&lt;/li&gt;
&lt;/ul&gt;
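&lt;p&gt;Because the server speaks the OpenAI API, the client side stays boring. A sketch of the request the chatbot builds (the helper, the default URL, and the model name are placeholders; &lt;code&gt;VLLM_URL&lt;/code&gt; comes from the &lt;code&gt;.env&lt;/code&gt; file once the endpoint is live):&lt;/p&gt;

```python
import os

def build_chat_request(prompt: str, model: str = "my-served-model", stream: bool = True):
    """Assemble an OpenAI-compatible /v1/chat/completions request for vLLM."""
    base = os.environ.get("VLLM_URL", "http://localhost:8000").rstrip("/")
    url = base + "/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,  # token-by-token streaming for the chatbot UI
    }
    return url, payload
```

&lt;p&gt;The same endpoint also works with the &lt;code&gt;openai&lt;/code&gt; Python SDK by pointing its &lt;code&gt;base_url&lt;/code&gt; at &lt;code&gt;VLLM_URL&lt;/code&gt; plus &lt;code&gt;/v1&lt;/code&gt;.&lt;/p&gt;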

&lt;h2&gt;
  
  
  Why EKS?
&lt;/h2&gt;

&lt;p&gt;I could've just spun up an EC2 instance and SSH'd in. But that's not really building reliable infrastructure — that's just running a script on a server.&lt;/p&gt;

&lt;p&gt;EKS gives us a proper environment to run GPU workloads: node groups, taints and tolerations to make sure only the vLLM pod lands on the GPU node, and a LoadBalancer service to expose the endpoint.&lt;/p&gt;

&lt;h2&gt;
  
  
  Environment Setup
&lt;/h2&gt;

&lt;p&gt;Before getting into the code, let's set up a &lt;code&gt;.env&lt;/code&gt; file at the root of the project. We'll use this to manage our AWS configurations so we don't hardcode them into the repository.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# AWS Config&lt;/span&gt;
&lt;span class="nv"&gt;AWS_DEFAULT_ACCOUNT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;123456789012
&lt;span class="nv"&gt;AWS_DEFAULT_REGION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-east-1
&lt;span class="nv"&gt;AWS_ADMIN_USER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_aws_username
&lt;span class="nv"&gt;AWS_BUCKET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;eks-llm-model-bucket

&lt;span class="c"&gt;# EKS Config&lt;/span&gt;
&lt;span class="nv"&gt;CLUSTER_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;eks-llm

&lt;span class="c"&gt;# VLLM Config&lt;/span&gt;
&lt;span class="c"&gt;# VLLM_URL will be added later after the deployment is live&lt;/span&gt;
&lt;span class="c"&gt;# VLLM_URL=http://&amp;lt;nlb-endpoint&amp;gt;.elb.us-east-1.amazonaws.com&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
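&lt;p&gt;The CDK app reads these values through &lt;code&gt;os.environ&lt;/code&gt;, so the file has to be loaded before synth. A minimal hand-rolled loader (a sketch; &lt;code&gt;python-dotenv&lt;/code&gt;'s &lt;code&gt;load_dotenv()&lt;/code&gt; does the same job):&lt;/p&gt;

```python
import os

def load_env(path: str = ".env") -> None:
    """Read KEY=VALUE pairs into os.environ, skipping comments and blanks.

    Variables already set in the environment win (setdefault), matching
    the usual dotenv convention.
    """
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```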



&lt;h2&gt;
  
  
  The Code
&lt;/h2&gt;

&lt;h3&gt;
  
  
  EKS Stack
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;EksStack&lt;/code&gt; provisions everything at the infrastructure level: VPC, EKS cluster, node groups, and an S3 bucket for model storage.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;vpc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ec2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Vpc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EksVpc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_azs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;eks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Cluster&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EksCluster&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;eks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;KubernetesVersion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;V1_34&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;vpc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;default_capacity&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;kubectl_layer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;kubectl_layer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;default_capacity=0&lt;/code&gt; means no default node group — we define our own below.&lt;/p&gt;

&lt;p&gt;We have two node groups:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# 1. CPU, runs system pods (CoreDNS, kube-proxy, etc.)
&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_nodegroup_capacity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ManagedNodeGroup&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;desired_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;min_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instance_types&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ec2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;InstanceType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;t3.medium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
    &lt;span class="n"&gt;ami_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;eks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NodegroupAmiType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AL2023_X86_64_STANDARD&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. GPU, for running vLLM
&lt;/span&gt;&lt;span class="n"&gt;gpu_node_role&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Role&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GpuNodeRole&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;assumed_by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ServicePrincipal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ec2.amazonaws.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;managed_policies&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ManagedPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_aws_managed_policy_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AmazonEKSWorkerNodePolicy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ManagedPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_aws_managed_policy_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AmazonEC2ContainerRegistryReadOnly&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ManagedPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_aws_managed_policy_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AmazonEKS_CNI_Policy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_nodegroup_capacity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GpuNodeGroup&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;desired_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;min_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;disk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instance_types&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ec2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;InstanceType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;g4dn.xlarge&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
    &lt;span class="n"&gt;node_role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;gpu_node_role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ami_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;eks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NodegroupAmiType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AL2023_X86_64_NVIDIA&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;workload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;taints&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;eks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TaintSpec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nvidia.com/gpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;effect&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;eks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TaintEffect&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NO_SCHEDULE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;aws_auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_user_mapping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_user_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AdminUser&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AWS_ADMIN_USER&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
    &lt;span class="n"&gt;groups&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system:masters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Allow GPU nodes to read from the model bucket
&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;grant_read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gpu_node_role&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Setting &lt;code&gt;disk_size=100&lt;/code&gt; avoids disk-pressure pod evictions: the default 20GB root volume is too small for the vLLM container image plus the model cache. The taint &lt;code&gt;nvidia.com/gpu=true:NoSchedule&lt;/code&gt; on the GPU node group means no pod is scheduled there unless it explicitly tolerates the taint, which keeps system pods off the GPU node.&lt;/p&gt;
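&lt;p&gt;The scheduling contract can be sketched in a few lines of Python (an illustrative simplification of Kubernetes' real taint-matching rules, not EKS API code):&lt;/p&gt;

```python
# Illustrative sketch: how the scheduler's taint/toleration check
# keeps pods without a matching toleration off the GPU node.

def tolerates(taint: dict, tolerations: list) -> bool:
    """Return True if any toleration matches the taint (simplified)."""
    for t in tolerations:
        if t.get("key") != taint["key"]:
            continue
        op = t.get("operator", "Equal")
        if op == "Exists" or t.get("value") == taint["value"]:
            if t.get("effect") in (None, taint["effect"]):
                return True
    return False

# The taint set on the GPU node group above: nvidia.com/gpu=true:NoSchedule
gpu_taint = {"key": "nvidia.com/gpu", "value": "true", "effect": "NoSchedule"}

system_pod_tolerations = []  # typical system pod: no GPU toleration
vllm_pod_tolerations = [
    {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}
]

print(tolerates(gpu_taint, system_pod_tolerations))  # False
print(tolerates(gpu_taint, vllm_pod_tolerations))    # True
```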

&lt;p&gt;The S3 bucket is for model weights, and the GPU node role gets read access to it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# S3 bucket for model weights
&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_bucket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Bucket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ModelBucket&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;bucket_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AWS_BUCKET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;removal_policy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;RemovalPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RETAIN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;block_public_access&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BlockPublicAccess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BLOCK_ALL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives us two node groups with the following instance types:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Node&lt;/th&gt;
&lt;th&gt;vCPU&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;t3.medium&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;4Gi&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;g4dn.xlarge&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;16Gi&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  vLLM Stack
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;VllmStack&lt;/code&gt; takes the cluster from &lt;code&gt;EksStack&lt;/code&gt; and deploys vLLM on top of it.&lt;/p&gt;

&lt;p&gt;First, we install the NVIDIA device plugin via Helm. This is what makes EKS aware of the GPU on the node — without it, you can't request &lt;code&gt;nvidia.com/gpu&lt;/code&gt; as a resource in your pod spec.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;# Our LLM
&lt;/span&gt;
&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_helm_chart&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NvidiaDevicePlugin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;chart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nvidia-device-plugin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;repository&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://nvidia.github.io/k8s-device-plugin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;namespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kube-system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nodeSelector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;workload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tolerations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nvidia.com/gpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;operator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Exists&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;effect&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NoSchedule&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note the toleration on the plugin itself — it needs to run on the GPU node to expose the GPU, so it has to tolerate the taint we set earlier.&lt;/p&gt;

&lt;p&gt;Then the vLLM Deployment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        &lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_manifest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VllmDeployment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;apiVersion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;apps/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kind&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Deployment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;metadata&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vllm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;namespace&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;spec&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;replicas&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;selector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;matchLabels&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;app&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vllm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;template&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;metadata&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;labels&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;app&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vllm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;spec&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tolerations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nvidia.com/gpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;operator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Exists&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;effect&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NoSchedule&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nodeSelector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;workload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;containers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vllm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vllm/vllm-openai:latest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;args&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--download-dir&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/model-cache&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--dtype&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;half&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--quantization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;awq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--max-model-len&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;4096&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="p"&gt;],&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;env&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AWS_DEFAULT_REGION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MODEL_BUCKET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;model_bucket_name&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VLLM_PORT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                            &lt;span class="p"&gt;],&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ports&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;containerPort&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8000&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;resources&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;limits&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nvidia.com/gpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;requests&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;12Gi&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                            &lt;span class="p"&gt;},&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;volumeMounts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model-cache&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mountPath&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/model-cache&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;readinessProbe&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;httpGet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/health&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8000&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;periodSeconds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="p"&gt;},&lt;/span&gt;
                        &lt;span class="p"&gt;}],&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;volumes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model-cache&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;emptyDir&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{}}],&lt;/span&gt;
                    &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things worth noting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;nodeSelector: workload=gpu&lt;/code&gt; pins the pod to the GPU node group&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;nvidia.com/gpu: 1&lt;/code&gt; requests exactly one GPU&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dtype: half&lt;/code&gt; and &lt;code&gt;quantization: awq&lt;/code&gt; drop the model size to ~5.7GB, so it comfortably fits in the 16GB of VRAM on a &lt;code&gt;g4dn.xlarge&lt;/code&gt; without OOM&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;max-model-len: 4096&lt;/code&gt; caps the context window to avoid OOM&lt;/li&gt;
&lt;/ul&gt;
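&lt;p&gt;A rough back-of-the-envelope shows why this fits (approximate figures, assuming Llama 3.1 8B's 32 layers, 8 KV heads, and head dimension of 128; the gap up to the ~5.7GB on disk comes from quantization scales and layers kept in fp16):&lt;/p&gt;

```python
# Rough VRAM estimate for the AWQ INT4 Llama 3.1 8B config above.
# All figures are approximations, not vLLM's actual memory accounting.

params = 8e9
weight_bytes = params * 0.5          # INT4 ~ 0.5 bytes per parameter

# KV cache per token: layers * kv_heads * head_dim * 2 (K and V) * 2 bytes (fp16)
layers, kv_heads, head_dim = 32, 8, 128
kv_per_token = layers * kv_heads * head_dim * 2 * 2
kv_cache_bytes = kv_per_token * 4096  # --max-model-len 4096

gib = 1024 ** 3
print(round(weight_bytes / gib, 1))    # ~3.7 GiB of weights
print(round(kv_cache_bytes / gib, 1))  # ~0.5 GiB of KV cache per full-length sequence
```

Both together leave headroom in the T4's 16GB, which is why the 4096-token cap matters: a longer context window grows the KV cache linearly.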

&lt;p&gt;Finally, a LoadBalancer service to expose the endpoint publicly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        &lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_manifest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VllmService&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;apiVersion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kind&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Service&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;metadata&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vllm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;namespace&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;annotations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;service.beta.kubernetes.io/aws-load-balancer-type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nlb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;spec&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LoadBalancer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;selector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;app&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vllm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ports&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;targetPort&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;protocol&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TCP&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="c1"&gt;# Internal cluster URL for the vLLM service
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vllm_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VLLM_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://vllm.default.svc.cluster.local:80&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="nc"&gt;CfnOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VllmUrl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vllm_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Internal vLLM service URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
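&lt;p&gt;Since vLLM exposes an OpenAI-compatible API, a client call against that URL is just an HTTP POST to &lt;code&gt;/v1/chat/completions&lt;/code&gt;. A minimal sketch (the URL and model ID come from the stack above; actually sending the request needs network access to the cluster):&lt;/p&gt;

```python
import json

# Sketch of a chat-completion request against vLLM's OpenAI-compatible API.
# VLLM_URL is the internal service URL from the stack output; from outside
# the cluster you would use the NLB hostname instead.
VLLM_URL = "http://vllm.default.svc.cluster.local:80"
MODEL_ID = "hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4"

def build_chat_request(prompt: str, max_tokens: int = 256):
    """Return (url, json_body) for a chat completion call."""
    body = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return f"{VLLM_URL}/v1/chat/completions", json.dumps(body)

url, body = build_chat_request("What is Amazon EKS?")
print(url)  # http://vllm.default.svc.cluster.local:80/v1/chat/completions
```

From outside the cluster, substitute the load balancer hostname that &lt;code&gt;kubectl get svc vllm&lt;/code&gt; reports.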



&lt;h2&gt;
  
  
  Deploy
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cdk bootstrap   &lt;span class="c"&gt;# first time only&lt;/span&gt;
cdk deploy &lt;span class="nt"&gt;--all&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After the deployment succeeds, you'll see the node in the cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get nodes
NAME                          STATUS   ROLES    AGE     VERSION
ip-10-0-xx-yy.ec2.internal   Ready    &amp;lt;none&amp;gt;   9m18s   v1.34.7-eks-40737a8
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;kubectl get nodes &lt;span class="nt"&gt;--show-labels&lt;/span&gt; | &lt;span class="nb"&gt;grep &lt;/span&gt;gpu

ip-10-0-xx-yy.ec2.internal   Ready    &amp;lt;none&amp;gt;   4m23s   v1.34.7-eks-40737a8   beta.kubernetes.io/arch&lt;span class="o"&gt;=&lt;/span&gt;amd64,
...
&lt;span class="nv"&gt;workload&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gpu
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wait for the vLLM pod to be ready (~5-10 minutes; the model is downloaded from Hugging Face on first start):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get pods &lt;span class="nt"&gt;-w&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;NAME                   READY   STATUS    RESTARTS   AGE
vllm-64c858884-pz4gz   0/1     Running   0          2m24s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl logs &lt;span class="nt"&gt;-f&lt;/span&gt; deployment/vllm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;WARNING 05-01 12:48:00 &lt;span class="o"&gt;[&lt;/span&gt;argparse_utils.py:257] With &lt;span class="sb"&gt;`&lt;/span&gt;vllm serve&lt;span class="sb"&gt;`&lt;/span&gt;, you should provide the model as a positional argument or &lt;span class="k"&gt;in &lt;/span&gt;a config file instead of via the &lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="nt"&gt;--model&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt; o
ption. The &lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="nt"&gt;--model&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt; option will be removed &lt;span class="k"&gt;in &lt;/span&gt;a future version.
&lt;span class="o"&gt;(&lt;/span&gt;APIServer &lt;span class="nv"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt; INFO 05-01 12:48:00 &lt;span class="o"&gt;[&lt;/span&gt;utils.py:299]
&lt;span class="o"&gt;(&lt;/span&gt;APIServer &lt;span class="nv"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt; INFO 05-01 12:48:00 &lt;span class="o"&gt;[&lt;/span&gt;utils.py:299]        █     █     █▄   ▄█
&lt;span class="o"&gt;(&lt;/span&gt;APIServer &lt;span class="nv"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt; INFO 05-01 12:48:00 &lt;span class="o"&gt;[&lt;/span&gt;utils.py:299]  ▄▄ ▄█ █     █     █ ▀▄▀ █  version 0.20.0
&lt;span class="o"&gt;(&lt;/span&gt;APIServer &lt;span class="nv"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt; INFO 05-01 12:48:00 &lt;span class="o"&gt;[&lt;/span&gt;utils.py:299]   █▄█▀ █     █     █     █  model   hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4
&lt;span class="o"&gt;(&lt;/span&gt;APIServer &lt;span class="nv"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt; INFO 05-01 12:48:00 &lt;span class="o"&gt;[&lt;/span&gt;utils.py:299]    ▀▀  ▀▀▀▀▀ ▀▀▀▀▀ ▀     ▀
&lt;span class="o"&gt;(&lt;/span&gt;APIServer &lt;span class="nv"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt; INFO 05-01 12:48:00 &lt;span class="o"&gt;[&lt;/span&gt;utils.py:299]
&lt;span class="o"&gt;(&lt;/span&gt;APIServer &lt;span class="nv"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt; INFO 05-01 12:48:00 &lt;span class="o"&gt;[&lt;/span&gt;utils.py:233] non-default args: &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s1"&gt;'model_tag'&lt;/span&gt;: &lt;span class="s1"&gt;'hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4'&lt;/span&gt;, &lt;span class="s1"&gt;'model'&lt;/span&gt;: &lt;span class="s1"&gt;'hugging-quants/Meta-L
lama-3.1-8B-Instruct-AWQ-INT4'&lt;/span&gt;, &lt;span class="s1"&gt;'dtype'&lt;/span&gt;: &lt;span class="s1"&gt;'half'&lt;/span&gt;, &lt;span class="s1"&gt;'max_model_len'&lt;/span&gt;: 4096, &lt;span class="s1"&gt;'quantization'&lt;/span&gt;: &lt;span class="s1"&gt;'awq'&lt;/span&gt;, &lt;span class="s1"&gt;'download_dir'&lt;/span&gt;: &lt;span class="s1"&gt;'/model-cache'&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Inference
&lt;/h2&gt;

&lt;p&gt;Once the pod is running, grab the NLB endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get svc vllm &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.status.loadBalancer.ingress[0].hostname}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, check that the model is loaded:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://&amp;lt;nlb-endpoint&amp;gt;/v1/models
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"list"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"owned_by"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"vllm"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, send it a prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://&amp;lt;nlb-endpoint&amp;gt;/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4",
    "messages": [
      {"role": "system", "content": "You are a helpful AI assistant."},
      {"role": "user", "content": "What is CAP theorem?"}
    ],
    "max_tokens": 150
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"chatcmpl-8609921b347e2718"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"chat.completion"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"created"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1777640350&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"choices"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"index"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The CAP theorem, also known as the Brewer's CAP theorem, is a fundamental concept in distributed systems. It was first proposed by Eric Brewer in 2000.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;**CAP stands for:**&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;1. **Consistency**: This refers to the ability of a system to ensure that all nodes in the system have the same view of the data. In other words, all nodes see the same data, and any updates are reflected uniformly across the system.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;2. **Availability**: This refers to the ability of a system to ensure that every request receives a (non-error) response, without guarantee that it contains the most recent version of the information. In other words, the system is always available, even if some nodes are down or"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The vLLM logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;(&lt;/span&gt;APIServer &lt;span class="nv"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt; INFO:     10.0.253.68:42022 - &lt;span class="s2"&gt;"GET /health HTTP/1.1"&lt;/span&gt; 200 OK
&lt;span class="o"&gt;(&lt;/span&gt;APIServer &lt;span class="nv"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt; INFO 05-01 12:59:19 &lt;span class="o"&gt;[&lt;/span&gt;loggers.py:271] Engine 000: Avg prompt throughput: 1.4 tokens/s, Avg generation throughput: 4.0 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.2%, Prefix cache hit rate: 28.8%
&lt;span class="o"&gt;(&lt;/span&gt;APIServer &lt;span class="nv"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt; INFO:     10.0.253.68:33400 - &lt;span class="s2"&gt;"GET /health HTTP/1.1"&lt;/span&gt; 200 OK
&lt;span class="o"&gt;(&lt;/span&gt;APIServer &lt;span class="nv"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt; INFO 05-01 12:59:29 &lt;span class="o"&gt;[&lt;/span&gt;loggers.py:271] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 4.5 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.3%, Prefix cache hit rate: 28.8%
&lt;span class="o"&gt;(&lt;/span&gt;APIServer &lt;span class="nv"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt; INFO 05-01 12:59:39 &lt;span class="o"&gt;[&lt;/span&gt;loggers.py:271] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 4.5 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.4%, Prefix cache hit rate: 28.8%
&lt;span class="o"&gt;(&lt;/span&gt;APIServer &lt;span class="nv"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt; INFO:     10.0.253.68:34814 - &lt;span class="s2"&gt;"GET /health HTTP/1.1"&lt;/span&gt; 200 OK
&lt;span class="o"&gt;(&lt;/span&gt;APIServer &lt;span class="nv"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt; INFO:     10.0.228.56:20496 - &lt;span class="s2"&gt;"POST /v1/chat/completions HTTP/1.1"&lt;/span&gt; 200 OK
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, if you get a response back, congrats: the model is live. 🎉&lt;/p&gt;
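&lt;p&gt;If you'd rather not shell out to &lt;code&gt;curl&lt;/code&gt;, the same request can be sent from Python with only the standard library. This is a quick sketch: the &lt;code&gt;VLLM_URL&lt;/code&gt; default below is a placeholder, so point it at your own NLB hostname:&lt;/p&gt;

```python
import json
import os
import urllib.request

# Placeholder default: set VLLM_URL to the NLB hostname printed by
# `kubectl get svc vllm` earlier.
BASE_URL = os.environ.get("VLLM_URL", "http://localhost:8000")
MODEL_ID = "hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4"


def build_chat_request(user_msg, max_tokens=150):
    """Build the same JSON body the curl example sends."""
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": "You are a helpful AI assistant."},
            {"role": "user", "content": user_msg},
        ],
        "max_tokens": max_tokens,
    }


def chat(user_msg):
    """POST to /v1/chat/completions and return the assistant's reply."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(build_chat_request(user_msg)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]
```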

&lt;p&gt;Working with the API endpoint directly is fine, but typing &lt;code&gt;curl&lt;/code&gt; commands is not exactly a great user experience. Let's build a chatbot UI on top of this.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F88mldyuew6wogp3adskf.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F88mldyuew6wogp3adskf.gif" alt="That's not enough" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Chatbot with Streamlit
&lt;/h2&gt;

&lt;p&gt;So let's build a simple chatbot using &lt;a href="https://github.com/streamlit/streamlit" rel="noopener noreferrer"&gt;Streamlit&lt;/a&gt; that talks directly to vLLM.&lt;/p&gt;

&lt;p&gt;The nice part? Since vLLM exposes an OpenAI-compatible API, we can just use the &lt;code&gt;openai&lt;/code&gt; Python SDK without any extra effort.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;

&lt;p&gt;Install the dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;streamlit openai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's create a simple UI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;src
&lt;span class="nb"&gt;touch &lt;/span&gt;src/app.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;streamlit&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;

&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Using the URL from AWS load balancer
&lt;/span&gt;&lt;span class="n"&gt;VLLM_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VLLM_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://xx-yy.elb.us-east-1.amazonaws.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;MODEL_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;VLLM_URL&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;none&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_page_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Llama 3 Chatbot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;page_icon&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🦙&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🦙 Llama 3 Chatbot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;caption&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Powered by vLLM on EKS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat_input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How is you day? Say something...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
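&lt;p&gt;One detail worth calling out: the generator expression handed to &lt;code&gt;st.write_stream&lt;/code&gt; pulls the text delta out of each streamed chunk (mapping &lt;code&gt;None&lt;/code&gt; deltas to empty strings), and &lt;code&gt;write_stream&lt;/code&gt; returns the fully concatenated reply, which is what gets appended back into the session state. The snippet below reproduces that logic in plain Python with stand-in chunk objects, purely for illustration:&lt;/p&gt;

```python
from types import SimpleNamespace


def make_chunk(text):
    """Stand-in for an OpenAI streaming chunk: chunk.choices[0].delta.content."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def collect_stream(stream):
    """Mirror what st.write_stream does here: join per-chunk deltas,
    treating a None delta (e.g. the final chunk) as an empty string."""
    return "".join(chunk.choices[0].delta.content or "" for chunk in stream)
```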



&lt;p&gt;Run the UI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;streamlit run app.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  You can now view your Streamlit app &lt;span class="k"&gt;in &lt;/span&gt;your browser.

  Local URL: http://localhost:8501
  Network URL: http://ww.xx.yy.zz:8501
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open the URL in your browser, and you should see a simple chatbot interface. Type in a message, and watch the response stream back token by token.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ask1uuoeb4ph27o1zcb.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ask1uuoeb4ph27o1zcb.webp" alt="Chatbot response 1" width="800" height="692"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F64g7uy5w06vavn3fezfl.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F64g7uy5w06vavn3fezfl.webp" alt="Chatbot response 2" width="800" height="592"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Serving an LLM is a bit different from deploying a typical web app. Memory constraints are real: we had to use an AWQ-quantized model just to make it fit on a single &lt;code&gt;g4dn.xlarge&lt;/code&gt; instance without hitting OOM. But combining vLLM for inference with AWS CDK to spin up the EKS infrastructure makes the whole setup pretty straightforward.&lt;/p&gt;

&lt;p&gt;Don't forget to run &lt;code&gt;cdk destroy --all&lt;/code&gt; when you're done! Leaving an EKS cluster and a &lt;code&gt;g4dn.xlarge&lt;/code&gt; node running 24/7 will result in a very hefty AWS bill.&lt;/p&gt;

&lt;p&gt;Aight. Thanks for reading this post, hope you found something useful 🚀&lt;/p&gt;

&lt;p&gt;References:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/machine-learning/serving-llms-using-vllm-and-amazon-ec2-instances-with-aws-ai-chips" rel="noopener noreferrer"&gt;AWS Blog: Serving LLMs using vLLM and Amazon EC2 instances with AWS AI chips&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/machine-learning/deploy-llms-on-amazon-eks-using-vllm-deep-learning-containers" rel="noopener noreferrer"&gt;AWS Blog: Deploy LLMs on Amazon EKS using vLLM Deep Learning Containers&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>kubernetes</category>
      <category>llm</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Enterprise AWS CDK: Architecting a Secure and Scalable Serverless API</title>
      <dc:creator>Dickson</dc:creator>
      <pubDate>Fri, 01 May 2026 07:14:37 +0000</pubDate>
      <link>https://forem.com/aws-builders/enterprise-aws-cdk-architecting-a-secure-and-scalable-serverless-api-8mf</link>
      <guid>https://forem.com/aws-builders/enterprise-aws-cdk-architecting-a-secure-and-scalable-serverless-api-8mf</guid>
      <description>&lt;p&gt;If you have spent any time deploying resources in AWS, you know that clicking through the AWS Management Console is fine for experimenting, but terrible for repeatable, production-grade systems. Historically, the answer to this was AWS CloudFormation — writing extensive JSON or YAML templates to declare your infrastructure. While CloudFormation is robust, writing thousands of lines of YAML isn't exactly a developer's dream.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiwdspapjsxu6ifew65l3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiwdspapjsxu6ifew65l3.png" alt="CDK"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The AWS Cloud Development Kit (CDK) is an open-source software development framework that lets you define your cloud application infrastructure using familiar programming languages like TypeScript, Python, Java, or C#. It acts as a powerful abstraction layer over CloudFormation. Instead of writing declarative YAML, you write imperative code to generate those templates. This means you get to use loops, conditionals, object-oriented principles, and your IDE's auto-completion to build your cloud architecture.&lt;/p&gt;

&lt;p&gt;However, deploying a simple API Gateway connected to a Lambda function and a database is easy in a CDK tutorial, but difficult to scale across an enterprise. In large organizations, we face strict compliance requirements, security reviews, and the need for high developer velocity. A single monolithic CDK stack simply won't survive contact with multiple engineering teams.&lt;/p&gt;

&lt;p&gt;In this article, we will walk through setting up a CDK project and explore the architectural decisions necessary to make a Serverless API (backed by Amazon Aurora PostgreSQL) secure, maintainable, and enterprise-ready.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started: Setting Up the CDK Project
&lt;/h2&gt;

&lt;p&gt;Before diving into enterprise patterns, let's look at how to initialize a fresh CDK project. You will need Node.js and the AWS CLI installed and configured.&lt;/p&gt;

&lt;p&gt;First, install the CDK toolkit globally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; aws-cdk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, create a new directory for your project and initialize a TypeScript CDK app:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;backend
&lt;span class="nb"&gt;cd &lt;/span&gt;backend
cdk init app &lt;span class="nt"&gt;--language&lt;/span&gt; typescript
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, if this is your first time using CDK in your AWS account/region, you need to "bootstrap" it. This provisions the necessary S3 buckets and IAM roles CDK needs to deploy your apps:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cdk bootstrap aws://ACCOUNT-NUMBER/REGION
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After initialization, your core project structure will look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;backend/
├── bin/
│   └── backend.ts          # The entry point of your CDK application
├── lib/
│   └── backend-stack.ts    # Where your infrastructure stack is defined
├── cdk.json                # Configuration file telling CDK how to execute your app
├── package.json
└── tsconfig.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While this structure is great for a starter project, we need to evolve it to support an enterprise architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture Overview
&lt;/h2&gt;

&lt;p&gt;We are building a typical modern backend:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Amazon API Gateway&lt;/strong&gt; (REST API) as the front door.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Lambda&lt;/strong&gt; functions (Node.js) to process business logic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon Aurora Serverless v2&lt;/strong&gt; (PostgreSQL) for resilient storage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon RDS Proxy&lt;/strong&gt; to manage database connections.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Secrets Manager&lt;/strong&gt; to handle credentials securely.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let’s break down the CDK decisions that elevate this from a weekend project to an enterprise architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decision 1: Modularity through L3 Domain Constructs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Enterprise Problem&lt;/strong&gt;: If network engineers, database administrators, and application developers all commit to the same &lt;code&gt;backend-stack.ts&lt;/code&gt; file, you will suffer from merge conflicts, accidental blast-radius damage, and slow deployments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Architectural Solution&lt;/strong&gt;: We must decouple our infrastructure into Level 3 (L3) Domain Constructs. Instead of one massive file, we define logical boundaries within a new &lt;code&gt;constructs&lt;/code&gt; folder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;├── lib/
│   ├── backend-stack.ts    # Now acts only as the Orchestrator
│   └── constructs/         # Domain-Driven L3 Constructs
│       ├── api.ts          # API Gateway &amp;amp; Compute
│       ├── network.ts      # VPC &amp;amp; Routing
│       ├── secrets.ts      # Secrets Manager Integration
│       └── storage.ts      # Databases &amp;amp; Proxies
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We define explicit TypeScript contracts (Interfaces) to pass dependencies between these domains. The &lt;code&gt;Api&lt;/code&gt; construct doesn't need to know &lt;em&gt;how&lt;/em&gt; the database was built; it only needs the Proxy Endpoint and the Secret to connect.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// lib/constructs/api.ts&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;ApiProps&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ec2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;IVpc&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;databaseProxy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;rds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DatabaseProxy&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;databaseSecret&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;secretsmanager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ISecret&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;authSecrets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;secretsmanager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ISecret&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Api&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Construct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Construct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ApiProps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="c1"&gt;// ... API Logic ...&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This inversion of control allows the Platform team to update the database configuration without ever touching the API construct code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decision 2: A Secure-by-Default Network Topology
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Enterprise Problem&lt;/strong&gt;: Security cannot be an afterthought. Leaving a database in a public subnet or manually managing security group IP addresses is a critical audit failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Architectural Solution&lt;/strong&gt;: We enforce a strict, 3-tier VPC architecture and utilize IAM for all internal authentication.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Isolated Storage&lt;/strong&gt;: The Aurora cluster is deployed exclusively into &lt;code&gt;SubnetType.PRIVATE_ISOLATED&lt;/code&gt;. It has absolutely no route to the internet.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Private Compute&lt;/strong&gt;: Lambda functions are deployed into &lt;code&gt;SubnetType.PRIVATE_WITH_EGRESS&lt;/code&gt; so they can reach external APIs if needed, but cannot be invoked directly from the internet (only via API Gateway).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connection Pooling &amp;amp; IAM Auth&lt;/strong&gt;: We deploy an RDS Proxy. Instead of each Lambda function opening its own direct connection to the database with hardcoded credentials, it connects through the proxy, which pools and reuses connections across invocations.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We codify this security posture by granting access through CDK's least-privilege grant methods:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Inside the API Construct wiring&lt;/span&gt;

&lt;span class="c1"&gt;// 1. Allow Network Traffic from Lambda to Proxy&lt;/span&gt;
&lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;databaseProxy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;connections&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;allowDefaultPortFrom&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// 2. Grant IAM permission to read the DB credentials&lt;/span&gt;
&lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;databaseSecret&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;grantRead&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Security teams can easily review these explicit grants rather than untangling complex Security Group rules.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decision 3: "Convention over Configuration" for API Routing
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Enterprise Problem&lt;/strong&gt;: Platform engineers become a bottleneck if application developers have to ask them to update IaC every time a new API endpoint (e.g., &lt;code&gt;POST /v1/users&lt;/code&gt;) is created.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Architectural Solution&lt;/strong&gt;: We build dynamic provisioning into the CDK code. Instead of manually instantiating every &lt;code&gt;NodejsFunction&lt;/code&gt; and &lt;code&gt;LambdaIntegration&lt;/code&gt;, we program the CDK to read the application folder structure during synthesis.&lt;/p&gt;

&lt;p&gt;Imagine a project structure like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;├── src/
│   └── lambda/
│       └── api/
│           ├── v1/
│           │   ├── users/
│           │   │   ├── get.ts     # GET /v1/users
│           │   │   └── post.ts    # POST /v1/users
│           │   └── status/
│           │       └── get.ts     # GET /v1/status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The CDK &lt;code&gt;Api&lt;/code&gt; construct can read this directory during synthesis. It maps each directory path (&lt;code&gt;v1/users&lt;/code&gt;) to an API resource and each file name (&lt;code&gt;get.ts&lt;/code&gt;) to an HTTP method, then automatically provisions the corresponding Lambda function and wires it into API Gateway.&lt;/p&gt;

&lt;p&gt;This pattern dramatically accelerates developer velocity. Application developers can build and deploy new features using standard Node.js practices without ever needing to learn CDK or touch the infrastructure repository.&lt;/p&gt;
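&lt;p&gt;The directory scan itself needs nothing beyond Node's standard library. Here is a minimal sketch of that convention-over-configuration logic (the &lt;code&gt;discoverRoutes&lt;/code&gt; name and the &lt;code&gt;RouteSpec&lt;/code&gt; shape are illustrative, not from the article's codebase):&lt;br&gt;
&lt;/p&gt;

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

export interface RouteSpec {
  httpMethod: string;   // e.g. "GET", derived from the file name
  resourcePath: string; // e.g. "v1/users", derived from the directory path
  entry: string;        // path to the handler source file
}

// Walk the api directory recursively; each method-named .ts file
// (get.ts, post.ts, ...) becomes one API Gateway route.
export function discoverRoutes(apiRoot: string): RouteSpec[] {
  const routes: RouteSpec[] = [];
  const walk = (dir: string): void => {
    for (const name of fs.readdirSync(dir)) {
      const full = path.join(dir, name);
      if (fs.statSync(full).isDirectory()) {
        walk(full);
      } else if (name.endsWith(".ts")) {
        routes.push({
          httpMethod: path.basename(name, ".ts").toUpperCase(),
          resourcePath: path.relative(apiRoot, dir).split(path.sep).join("/"),
          entry: full,
        });
      }
    }
  };
  walk(apiRoot);
  return routes;
}
```

&lt;p&gt;The construct then iterates over these specs, creating one &lt;code&gt;NodejsFunction&lt;/code&gt; per entry and calling &lt;code&gt;addResource&lt;/code&gt;/&lt;code&gt;addMethod&lt;/code&gt; on the REST API accordingly.&lt;/p&gt;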

&lt;h2&gt;
  
  
  Decision 4: Infrastructure-Aware Database Migrations
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Enterprise Problem&lt;/strong&gt;: You deployed the new API and the database, but the application crashes because the SQL tables haven't been created yet. Relying on manual scripts or separate CI/CD steps for database migrations leads to configuration drift and failed deployments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Architectural Solution&lt;/strong&gt;: We integrate the schema migration (using tools like Drizzle or Prisma) directly into the CDK lifecycle using AWS Custom Resources.&lt;/p&gt;

&lt;p&gt;We define a specific Lambda function (&lt;code&gt;DatabaseMigrator&lt;/code&gt;) that holds our SQL schema files. We then use a &lt;code&gt;custom-resources.Provider&lt;/code&gt; to trigger this Lambda during the CloudFormation deployment process.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Inside the main Stack Orchestrator (lib/backend-stack.ts)&lt;/span&gt;

&lt;span class="c1"&gt;// 1. The Migration Trigger&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;databaseMigrationTrigger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CustomResource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;MigrationTrigger&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;serviceToken&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;databaseMigratorProvider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;serviceToken&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;forceUpdate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="c1"&gt;// Ensure it runs every deploy&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// 2. The Dependency Lock&lt;/span&gt;
&lt;span class="nx"&gt;databaseMigrationTrigger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addDependency&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;databaseCluster&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By enforcing the dependency (&lt;code&gt;addDependency&lt;/code&gt;), we guarantee the database is fully available before the migration runs. The deployment becomes atomic: if the infrastructure deploys but the migration fails, CloudFormation marks the deployment as failed and rolls it back. Your infrastructure state and your database schema state therefore stay in sync.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decision 5: Secure Secrets Management
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Enterprise Problem&lt;/strong&gt;: Developers frequently make the mistake of hardcoding API keys, JWT secrets, or third-party tokens as plain-text environment variables in CDK code. When synthesized, these secrets become plainly visible in the generated CloudFormation templates under &lt;code&gt;cdk.out&lt;/code&gt;, a serious security vulnerability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Architectural Solution&lt;/strong&gt;: Never pass plaintext secrets into your CDK code. Instead, manually provision your secrets in AWS Secrets Manager (or use automated pipelines to create them), and then have your CDK code &lt;em&gt;reference&lt;/em&gt; them by Name or ARN.&lt;/p&gt;

&lt;p&gt;In our &lt;code&gt;Secrets&lt;/code&gt; construct, we load an existing secret:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// lib/constructs/secrets.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;secretsmanager&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;aws-cdk-lib/aws-secretsmanager&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Secrets&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Construct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;authSecrets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;secretsmanager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ISecret&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Construct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// We only reference the secret name, not the value!&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;authSecrets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;secretsmanager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Secret&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromSecretNameV2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AuthSecrets&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Backend/AuthSecrets&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of injecting the actual secret values into our Lambda environment variables, we pass the ARN (Amazon Resource Name) of the secret:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Inside the API construct provisioning the Lambda&lt;/span&gt;
&lt;span class="nx"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;AUTH_SECRETS_ARN&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;authSecrets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;secretArn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inside the Lambda function execution environment (at runtime), the application uses the AWS SDK to fetch the secret using the ARN. This guarantees that sensitive values are never logged, never stored in Git, and never exposed in the generated CloudFormation JSON files.&lt;/p&gt;
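&lt;p&gt;Because Lambda execution environments are reused between invocations, the handler should resolve the ARN once and cache the result. A sketch of that caching pattern, with the fetch injected as a callback so the logic stands alone (in a real handler the callback would call Secrets Manager's &lt;code&gt;GetSecretValue&lt;/code&gt; via the AWS SDK; the function names are illustrative):&lt;br&gt;
&lt;/p&gt;

```typescript
// Cache the decrypted secret string for the lifetime of the Lambda
// execution environment: cold starts pay one Secrets Manager round
// trip, warm invocations read from memory.
export function makeSecretLoader(
  secretArn: string,
  fetchSecretString: (arn: string) => string,
) {
  let cached: string | undefined;
  return (): string => {
    if (cached === undefined) {
      cached = fetchSecretString(secretArn);
    }
    return cached;
  };
}
```

&lt;p&gt;The handler module creates the loader once at module scope, reading the ARN from &lt;code&gt;process.env.AUTH_SECRETS_ARN&lt;/code&gt;, and calls it inside the handler body.&lt;/p&gt;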

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Building serverless applications on AWS is relatively straightforward, but scaling that process across an enterprise requires intent.&lt;/p&gt;

&lt;p&gt;By abandoning monolithic stacks in favor of domain constructs, enforcing strict network topologies, automating developer workflows via dynamic routing, integrating custom resource migrations, and utilizing dynamic secret referencing, you transform CDK from a simple scripting tool into a robust, enterprise-grade platform engineering capability.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/cdk/v2/guide/home.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/cdk/v2/guide/home.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>cdk</category>
      <category>architecture</category>
    </item>
    <item>
      <title>CloudWatch RUM vs. Ad blockers: How to fix possible missing telemetry</title>
      <dc:creator>Jérôme GUYON</dc:creator>
      <pubDate>Thu, 30 Apr 2026 16:56:50 +0000</pubDate>
      <link>https://forem.com/aws-builders/cloudwatch-rum-vs-ad-blockers-how-to-fix-possible-missing-telemetry-54j5</link>
      <guid>https://forem.com/aws-builders/cloudwatch-rum-vs-ad-blockers-how-to-fix-possible-missing-telemetry-54j5</guid>
      <description>&lt;p&gt;A few weeks ago, I was reviewing the &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-RUM.html" rel="noopener noreferrer"&gt;Amazon CloudWatch RUM&lt;/a&gt; dashboard for a web application I maintain. Page views were suspiciously low. After some digging, I opened the browser's DevTools on my machine and there it was: uBlock Origin was quietly blocking every request to &lt;code&gt;dataplane.rum.eu-west-1.amazonaws.com&lt;/code&gt;. &lt;strong&gt;Our real user monitoring was blind to a non-negligible portion of our actual traffic.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;CloudWatch RUM is one of those AWS services that doesn't get the attention it deserves. But if you care about understanding how &lt;em&gt;real&lt;/em&gt; users experience your application — page load times, JavaScript errors, HTTP failures, Web Vitals — it's genuinely valuable. Here's what the dashboard looks like out of the box:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxytuf0selcjd0f1h6i5i.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxytuf0selcjd0f1h6i5i.gif" alt="RUM Dashboard" width="800" height="578"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The problem is that ad blockers treat its data plane endpoint the same way they treat any third-party tracking domain: a request flying off to &lt;code&gt;dataplane.rum.*.amazonaws.com&lt;/code&gt; looks exactly like telemetry that users might want to block.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The architecture fix is simple&lt;/strong&gt;: your CloudFront distribution already serves your frontend. Add one behavior — &lt;code&gt;/rum/*&lt;/code&gt; — that proxies to the RUM data plane. On the client side, point the &lt;a href="https://github.com/aws-observability/aws-rum-web" rel="noopener noreferrer"&gt;aws-rum-web&lt;/a&gt; SDK to &lt;code&gt;https://yourdomain.com/rum/&lt;/code&gt; instead of the default AWS endpoint. I use AWS CDK here, but the same approach works with CloudFormation, Terraform, or the console.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8tbii90og763tadbae6k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8tbii90og763tadbae6k.png" alt="RUM + Cloudfront architecture" width="800" height="444"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Create the CloudWatch RUM app monitor and its Cognito identity pool.&lt;/strong&gt; RUM needs a Cognito identity pool with unauthenticated access to authorize browsers to send telemetry.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;cognito&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-cognito&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;iam&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-iam&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;rum&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-rum&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Create an identity pool for RUM (unauthenticated access)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rumIdentityPool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;cognito&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CfnIdentityPool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;RumIdentityPool&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;allowUnauthenticatedIdentities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Create the IAM role for unauthenticated users&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;guestRole&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Role&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;RumGuestRole&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;assumedBy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;WebIdentityPrincipal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cognito-identity.amazonaws.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;StringEquals&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cognito-identity.amazonaws.com:aud&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;rumIdentityPool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ForAnyValue:StringLike&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cognito-identity.amazonaws.com:amr&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;unauthenticated&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Attach the identity pool to the role&lt;/span&gt;
&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;cognito&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CfnIdentityPoolRoleAttachment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;RumRoleAttachment&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;identityPoolId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;rumIdentityPool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;roles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;unauthenticated&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;guestRole&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;roleArn&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Create the RUM app monitor&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rumAppMonitor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;rum&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CfnAppMonitor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;RumAppMonitor&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;myapp.example.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;myapp-rum&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;appMonitorConfiguration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;allowCookies&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;// Allow X-Ray tracing&lt;/span&gt;
    &lt;span class="na"&gt;enableXRay&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;// Track 100% of sessions&lt;/span&gt;
    &lt;span class="na"&gt;sessionSampleRate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;telemetries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;performance&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;errors&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;identityPoolId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;rumIdentityPool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Grant the guest role permission to send RUM events&lt;/span&gt;
&lt;span class="nx"&gt;guestRole&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addToPolicy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PolicyStatement&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;rum:PutRumEvents&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="s2"&gt;`arn:aws:rum:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;region&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;account&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:appmonitor/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;rumAppMonitor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2: Add the &lt;code&gt;/rum/*&lt;/code&gt; behavior to your CloudFront distribution.&lt;/strong&gt; This is the key part. I create an additional behavior that forwards requests matching &lt;code&gt;/rum/*&lt;/code&gt; to the RUM data plane origin.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;cf&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-cloudfront&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;origins&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-cloudfront-origins&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Build the additional behaviors map&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;additionalBehaviors&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;cf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;BehaviorOptions&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;

&lt;span class="c1"&gt;// Proxy RUM traffic through CloudFront&lt;/span&gt;
&lt;span class="nx"&gt;additionalBehaviors&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/rum/*&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;origin&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;origins&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;HttpOrigin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;`dataplane.rum.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;region&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.amazonaws.com`&lt;/span&gt;
  &lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;viewerProtocolPolicy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ViewerProtocolPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;HTTPS_ONLY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;cachePolicy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CachePolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CACHING_DISABLED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;allowedMethods&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;AllowedMethods&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ALLOW_ALL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;originRequestPolicy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OriginRequestPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ALL_VIEWER_EXCEPT_HOST_HEADER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Create the distribution (your existing one — just add the behavior)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;distribution&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;cf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Distribution&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Distribution&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;defaultBehavior&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;origin&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;origins&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;S3BucketOrigin&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withOriginAccessControl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;websiteBucket&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;viewerProtocolPolicy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ViewerProtocolPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;REDIRECT_TO_HTTPS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="nx"&gt;additionalBehaviors&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;domainNames&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;myapp.example.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;certificate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;myCertificate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I choose &lt;code&gt;ALL_VIEWER_EXCEPT_HOST_HEADER&lt;/code&gt; because the RUM data plane expects the &lt;code&gt;Host&lt;/code&gt; header to match its own domain (&lt;code&gt;dataplane.rum.eu-west-1.amazonaws.com&lt;/code&gt;), not yours. If you forward the original &lt;code&gt;Host&lt;/code&gt;, the request will fail with a 403.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Point the RUM web client to your proxied endpoint.&lt;/strong&gt; Install the &lt;a href="https://www.npmjs.com/package/aws-rum-web" rel="noopener noreferrer"&gt;aws-rum-web&lt;/a&gt; package and configure the &lt;code&gt;endpoint&lt;/code&gt; to use your domain instead of the default AWS URL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install the RUM web client&lt;/span&gt;
npm &lt;span class="nb"&gt;install &lt;/span&gt;aws-rum-web
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;AwsRum&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-rum-web&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rumClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AwsRum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;your-app-monitor-id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;// from the CfnAppMonitor&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;1.0.0&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                     &lt;span class="c1"&gt;// your app version&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;eu-west-1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                 &lt;span class="c1"&gt;// region&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;sessionSampleRate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;identityPoolId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;eu-west-1:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;// This is the magic line — point to your own domain&lt;/span&gt;
    &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://myapp.example.com/rum/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;telemetries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;performance&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;errors&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;allowCookies&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;enableXRay&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Et voilà! The browser now sends RUM telemetry to &lt;code&gt;https://myapp.example.com/rum/&lt;/code&gt;, which CloudFront proxies to the actual RUM data plane. Ad blockers see a first-party request and leave it alone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Things to know&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ad blocker filter lists&lt;/strong&gt; — Popular lists like EasyPrivacy and uBlock filters include patterns matching &lt;code&gt;dataplane.rum.*.amazonaws.com&lt;/code&gt; and the RUM CDN script URL (&lt;code&gt;client.rum.*.amazonaws.com&lt;/code&gt;). By proxying through your own domain, you bypass both. If you use the NPM installation method (recommended), the script itself is bundled in your app — only the data plane calls need proxying.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pricing&lt;/strong&gt; — $1 per 100,000 RUM events. A typical visit generates ~20 events. For 500K monthly visits: ~$100/month. CloudFront proxy overhead is negligible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session sample rate&lt;/strong&gt; — In production, consider setting &lt;code&gt;sessionSampleRate&lt;/code&gt; to something lower than &lt;code&gt;1&lt;/code&gt; (e.g., &lt;code&gt;0.1&lt;/code&gt; for 10% sampling) to control costs while still getting statistically meaningful data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;X-Ray integration&lt;/strong&gt; — With &lt;code&gt;enableXRay: true&lt;/code&gt;, RUM traces connect to your backend X-Ray traces, giving you end-to-end visibility from the browser click to the database query. &lt;/li&gt;
&lt;/ul&gt;
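&lt;p&gt;To put the pricing bullet in concrete terms, here is a tiny back-of-the-envelope helper (illustrative only — the $1 per 100,000 events rate and the ~20 events per visit figure come from above; the function itself is mine, not an AWS API):&lt;/p&gt;

```typescript
// Rough monthly CloudWatch RUM cost estimate (illustrative helper):
// events = visits * eventsPerVisit * sampleRate, billed at $1 per 100,000 events.
function estimateRumCost(
  monthlyVisits: number,
  eventsPerVisit: number,
  sampleRate: number,
): number {
  const events = monthlyVisits * eventsPerVisit * sampleRate;
  return events / 100_000;
}

// 500K visits at ~20 events each with 100% sampling: about $100/month.
console.log(estimateRumCost(500_000, 20, 1.0));
// Dropping sessionSampleRate to 0.1 cuts that to about $10/month.
console.log(estimateRumCost(500_000, 20, 0.1));
```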

&lt;p&gt;CloudWatch RUM is one of those "set it and forget it" services that quietly delivers real value — but only if it actually receives data. If you're already using it, proxy it through your own domain or you're likely missing a significant chunk of your user base. And if you're not using it yet, I'd strongly suggest you &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-RUM.html" rel="noopener noreferrer"&gt;have a look&lt;/a&gt; — understanding how real users experience your app is worth the small setup effort.&lt;/p&gt;

&lt;p&gt;— Jerome&lt;/p&gt;

</description>
      <category>analytics</category>
      <category>aws</category>
      <category>monitoring</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Cost-Efficient Serverless Workflows with Express Step Functions</title>
      <dc:creator>Matt Morgan</dc:creator>
      <pubDate>Thu, 30 Apr 2026 12:01:42 +0000</pubDate>
      <link>https://forem.com/aws-builders/cost-efficient-serverless-workflows-with-express-step-functions-e54</link>
      <guid>https://forem.com/aws-builders/cost-efficient-serverless-workflows-with-express-step-functions-e54</guid>
      <description>&lt;p&gt;Lambda and API Gateway are the bread and butter of the AWS serverless ecosystem. Lambda offers a compelling programming model of inputs and outputs. Lambda's name is taken from the concept of simple anonymous functions and implies simplicity and ease of use. Lambda delivers on this promise beautifully when our requirements are simple: "enqueue this message" or "fetch the item with this key from the database".&lt;/p&gt;

&lt;p&gt;In the real world, our requirements aren't always so simple. Sometimes we must do multiple things in the scope of a synchronous request. Sometimes branching logic is necessary. When we drift from the "just a function" model of programming in Lambda, we can start to see challenges with cost, performance, and observability.&lt;/p&gt;

&lt;p&gt;Synchronous Express Workflows for AWS Step Functions was &lt;a href="https://aws.amazon.com/blogs/compute/new-synchronous-express-workflows-for-aws-step-functions/" rel="noopener noreferrer"&gt;announced&lt;/a&gt; back in 2020. I was instantly intrigued by the solution, but it wasn't until last year that I had a chance to really try them at scale. Based on my experience over the past year, this is a great way to build &lt;a href="https://docs.aws.amazon.com/wellarchitected/latest/framework/welcome.html" rel="noopener noreferrer"&gt;Well-Architected&lt;/a&gt; microservices.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This article includes &lt;a href="https://github.com/elthrasher/cost-efficient-microservices" rel="noopener noreferrer"&gt;source code for a sample project&lt;/a&gt; that demonstrates how we can use Express Workflows to create performant and economical microservices.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let's consider a workflow for receiving orders on an e-commerce website. We are going to receive a web request and then do the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Transform and validate the incoming request&lt;/li&gt;
&lt;li&gt;Validate the order (validate products, compute totals, etc.)&lt;/li&gt;
&lt;li&gt;Reserve each inventory item&lt;/li&gt;
&lt;li&gt;Process the payment with the chosen processor (Stripe, PayPal, or Apple Pay in the example)&lt;/li&gt;
&lt;li&gt;Save the order to the database&lt;/li&gt;
&lt;li&gt;Kick off post-order processing (notification, logging, metrics)&lt;/li&gt;
&lt;li&gt;Send a response&lt;/li&gt;
&lt;/ol&gt;
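&lt;p&gt;To see why this drifts away from the "just a function" model, here is roughly what the happy path of the steps above looks like when collapsed into a single handler (an illustrative sketch only — every helper below is a hypothetical stand-in for a real integration, not code from the sample project):&lt;/p&gt;

```typescript
// Illustrative sketch: the orchestration a single Lambda handler would have
// to carry without Step Functions. All helper functions are stand-ins.
type Order = { items: string[]; payment: string };

async function validateOrder(o: Order): Promise<void> {
  if (o.items.length === 0) throw new Error("EmptyOrder");
}
async function reserveItem(_item: string): Promise<void> { /* reserve inventory */ }
async function releaseAll(_items: string[]): Promise<void> { /* compensation */ }
async function processPayment(_o: Order): Promise<void> { /* Stripe, PayPal, or Apple Pay */ }
async function saveOrder(_o: Order): Promise<void> { /* write order to DynamoDB */ }
async function postOrderTasks(_o: Order): Promise<void> { /* notify, log, emit metrics */ }

async function handler(order: Order): Promise<{ status: string }> {
  await validateOrder(order);                       // steps 1-2
  await Promise.all(order.items.map(reserveItem));  // step 3, in parallel
  try {
    await processPayment(order);                    // step 4
  } catch (err) {
    await releaseAll(order.items);                  // compensate on failure
    throw err;
  }
  await saveOrder(order);                           // step 5
  await postOrderTasks(order);                      // step 6
  return { status: "accepted" };                    // step 7
}
```

&lt;p&gt;Every branch, retry, and compensation path in a handler like this is code we must write, test, and observe ourselves; the state machine expresses the same flow declaratively, with retries and catches as configuration.&lt;/p&gt;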

&lt;p&gt;We also need to handle errors, guarantee consistency, and respond promptly. Here is how that looks in the Step Functions console.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5nv4ukyu6dqzgnnqowr7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5nv4ukyu6dqzgnnqowr7.png" alt="Express Workflow" width="800" height="483"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We've gained some efficiency here by running a couple of steps in parallel, and we've also handled a variety of error states.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you are new to Step Functions, I recommend building in the console. &lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/workflow-studio.html" rel="noopener noreferrer"&gt;Step Functions Workflow Studio&lt;/a&gt; gives you a drag-and-drop interface and the ability to export the result of your work to an IaC solution.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Project Layout
&lt;/h2&gt;

&lt;p&gt;Our sample project manages infrastructure with &lt;a href="https://aws.amazon.com/cdk/" rel="noopener noreferrer"&gt;AWS CDK&lt;/a&gt;. Starting with a basic CDK project, we build out some functions and our stack like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;serverless-order-processor/
├── bin/
│   └── app.ts                          # CDK app entry
├── lib/
│   ├── order-processor-stack.ts        # CDK stack (infra + state machine)
│   └── order-workflow.ts               # Step Functions definition (CDK constructs)
├── functions/
│   ├── validate-order.ts               # Check products exist, prices match
│   ├── reserve-inventory.ts            # Reserve single item (used by Map state)
│   ├── release-inventory.ts            # Compensation: undo all reservations
│   ├── process-payment.ts              # Route to processor by config
│   ├── save-order.ts                   # Write final order to DynamoDB
│   ├── get-order.ts                    # GET /orders/{id}
│   └── list-products.ts                # GET /products
├── scripts/
│   └── seed.ts                         # Seed products + inventory
├── cdk.json
├── package.json
├── tsconfig.json
└── README.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Our stack creates a DynamoDB table to store our products, inventory, and orders. It bundles and provisions our Lambda functions. It describes our state machine, synthesizes an ASL (Amazon States Language) definition, binds the Lambda functions to the state machine, and binds the state machine to an API Gateway that we use to synchronously invoke the workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Workflow Construct
&lt;/h2&gt;

&lt;p&gt;There are different patterns for writing CDK code. I prefer to create &lt;a href="https://docs.aws.amazon.com/prescriptive-guidance/latest/aws-cdk-layers/layer-3.html" rel="noopener noreferrer"&gt;L3 constructs&lt;/a&gt; to contain complex business patterns. That looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OrderWorkflow&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Construct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;stateMachine&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;StateMachine&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Construct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;OrderWorkflowProps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// steps&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;definition&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;transformRequest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;validateRequiredFields&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// --- State Machine ---&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stateMachine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;StateMachine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;OrderStateMachine&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;definitionBody&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;DefinitionBody&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromChainable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;definition&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="na"&gt;stateMachineName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;order-workflow&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;stateMachineType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;StateMachineType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;EXPRESS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="c1"&gt;// more props&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This keeps my &lt;a href="https://github.com/elthrasher/cost-efficient-microservices/blob/main/lib/order-processor-stack.ts#L115" rel="noopener noreferrer"&gt;main stack&lt;/a&gt; much cleaner than putting all that code inline.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;    &lt;span class="c1"&gt;// --- Step Functions workflow ---&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;workflow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OrderWorkflow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;OrderWorkflow&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;validateOrderFn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;reserveInventoryFn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;releaseInventoryFn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;processPaymentFn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;saveOrderFn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Most steps are implemented with Lambda functions, though error handling and data transformations can be done with Pass states and JSONata. For more on JSONata, check out &lt;a href="https://dev.to/aws-builders/create-stateful-serverless-workflows-with-aws-step-functions-and-jsonata-2oe3"&gt;Create Stateful Serverless Workflows with AWS Step Functions and JSONata&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Each step needs a catch block that handles specific errors that step may throw. We need to decide whether to retry the step or fail, based on the error it threw. If the step indicates we can't proceed with the order due to insufficient inventory, it doesn't make sense to retry. But if a step fails due to a service error, throttling, or a partner's technical issues, a retry may be very helpful.&lt;/p&gt;
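&lt;p&gt;The retry-or-fail decision can be sketched as a simple classifier (illustrative only — in the real workflow this logic lives in each step's Retry and Catch configuration, not in application code, and the non-AWS error names below are hypothetical):&lt;/p&gt;

```typescript
// Illustrative: which error names are worth retrying. In the actual state
// machine this is expressed as ASL Retry/Catch rules per step.
const RETRYABLE_ERRORS = new Set([
  "Lambda.ServiceException",          // transient service error
  "Lambda.TooManyRequestsException",  // throttling
  "PaymentProviderUnavailable",       // hypothetical partner outage
]);

function shouldRetry(errorName: string): boolean {
  return RETRYABLE_ERRORS.has(errorName);
}
```

&lt;p&gt;A business outcome like insufficient inventory is deliberately not in the set, so the workflow fails fast instead of retrying a request that can never succeed.&lt;/p&gt;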

&lt;h2&gt;
  
  
  API Gateway Integration
&lt;/h2&gt;

&lt;p&gt;API Gateway provides a direct integration pattern that invokes our state machine. This is simplified and abstracted using the CDK construct &lt;code&gt;StepFunctionsIntegration&lt;/code&gt;. This construct is useful, but I prefer to modify it slightly. Let's look at the sample project code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sfnIntegration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;StepFunctionsIntegration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startExecution&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="nx"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stateMachine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;integrationResponses&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;requestTemplates&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;requestTemplate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="na"&gt;useDefaultMethodResponses&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nx"&gt;ordersResource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addMethod&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sfnIntegration&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;methodResponses&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;200&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;400&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;500&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We create the integration and then bind it to the POST method of the orders resource. I changed a couple of things to better suit our use case: I set &lt;code&gt;useDefaultMethodResponses&lt;/code&gt; to false and supplied my own response templates. The reason is that the default mapping returns a 500 if the state machine execution throws an error and a 200 if it doesn't, whereas I wanted validation errors to return a 400. To do this, we use a &lt;a href="https://velocity.apache.org/engine/1.7/user-guide.html" rel="noopener noreferrer"&gt;Velocity Template&lt;/a&gt; to detect an error key in the response and override the status to 400 when it's present.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;successResponseTemplate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="s2"&gt;`#set($sfnOutput = $input.path('$.output'))`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;`#if($sfnOutput.toString().contains('"status":"error"'))`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;`#set($context.responseOverride.status = 400)`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;`#end`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;`$sfnOutput`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Named Executions
&lt;/h3&gt;

&lt;p&gt;I also wanted to take advantage of the ability to name an execution, so I provided a custom request template.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;requestTemplate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="s2"&gt;`#set($customerId = $util.parseJson($input.body).get('customerId'))`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;`{`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;`  "input": "$util.escapeJavaScript($input.body).replaceAll("&lt;/span&gt;&lt;span class="se"&gt;\\\\&lt;/span&gt;&lt;span class="s2"&gt;'", "'")",`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;`  "name": "$util.escapeJavaScript($customerId)",`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;`  "stateMachineArn": "&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stateMachine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stateMachineArn&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;`}`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a modified and simplified version of the &lt;a href="https://github.com/aws/aws-cdk/blob/main/packages/aws-cdk-lib/aws-apigateway/lib/integrations/stepfunctions.vtl" rel="noopener noreferrer"&gt;request template that ships with AWS CDK&lt;/a&gt;. If you plan to do something like this, I suggest going back to the source to make sure you don't miss anything.&lt;/p&gt;

&lt;p&gt;The way the named execution works is that we know any valid request will include a customerId, so we pull it from the JSON and set it to the &lt;code&gt;name&lt;/code&gt; attribute in the request payload. Step Functions automatically appends a hash, so we don't need to worry about uniqueness. As a result, we can easily find our customer transactions in the Step Functions console!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4u49aa6s725olqaz2p0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4u49aa6s725olqaz2p0.png" alt="Named executions in the Step Functions console" width="733" height="438"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Observability
&lt;/h2&gt;

&lt;p&gt;It's essentially a must to enable &lt;a href="https://github.com/elthrasher/cost-efficient-microservices/blob/main/lib/order-workflow.ts#L434" rel="noopener noreferrer"&gt;CloudWatch logging&lt;/a&gt;. The named execution lets us find an exact execution that may have gone awry or be of interest. Then we can see exactly what happened, inspect logs, and improve our flow. The state machine execution will show JSONata variables at every step, as well as all inputs and outputs for each step. It's hard to imagine getting this level of fidelity in a trace without using Step Functions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost
&lt;/h2&gt;

&lt;p&gt;We need to consider trade-offs. Sure, this might help manage complexity, but doesn't it cost more? What about adding latency with state transitions or cold starts?&lt;/p&gt;

&lt;p&gt;Let's start with cost. These services are very, very cheap if used correctly. It's often observed that the most expensive part of a serverless stack is the logging, and I can attest to that. These prices are in USD and for us-east-1&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API Gateway Rest API charges $3.50 for 1 million requests.&lt;/li&gt;
&lt;li&gt;1 million Express workflow executions with an average duration of 3 seconds and 64MB memory bills slightly higher at $4.13&lt;/li&gt;
&lt;li&gt;5 million Lambda executions averaging 500ms duration with 128 MB is just $5.17 (excluding free tier).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The total for this part of the stack is $12.80/month (plus charges for the database and other services, which are beyond our scope here). That is an incredible price for 1 million executions. Cost scaling is mostly linear. If we scale this to 10 million requests, our bill is $127.93. We start to see better pricing tiers as we move to 100 million requests, with a monthly bill of $ 1,150.05. 100 million requests would indicate an average of 38 checkout conversions every second for the entire month, quite a brisk business! I'm not kidding that logging will be the big expense at that volume. I'm not attempting the math here because it's highly dependent on your use case, but suffice it to say you'll want to keep an eye on it, make sure you're only logging what is necessary, and set reasonable data retention on your log groups.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance
&lt;/h2&gt;

&lt;p&gt;Now we've demonstrated that this architecture is great for managing complexity and that it's cost-effective. What about performance? Doesn't it stand to reason that passing state between multiple functions would be slower than encapsulating all the logic within a single function?&lt;/p&gt;

&lt;p&gt;Not necessarily. First, Express Step Functions are extremely fast. In my testing, I'm through the first pass step and into my ValidateOrder Lambda function within milliseconds. Small functions with minimal dependencies will load and execute very quickly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Parallel and Map States
&lt;/h3&gt;

&lt;p&gt;The real value prop here is the ability to execute multiple functions in parallel. Imagine I'm checking out a cart with four different items. Most implementations will reserve the inventory sequentially. We could use Node.js or Go to wait on multiple requests at once, but there are downsides to doing that in a single function. We might need extra memory in anticipation of a large order. We have to add logic to handle the case where the order can only be partially fulfilled, which now mixes concerns. Our Express Workflow can run the same simple function in a &lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/state-map.html" rel="noopener noreferrer"&gt;Map state&lt;/a&gt;, then handle the combined results. We can even limit downstream impacts by setting a &lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/state-map.html" rel="noopener noreferrer"&gt;max concurrency&lt;/a&gt; limit on the map state, so a very large order doesn't attempt to adjust inventory for 100+ items in parallel.&lt;/p&gt;

&lt;p&gt;What about executing unlike things in parallel? Express workflows can handle that as well. Zooming in here, we can see that we're persisting the order to the database while we send the confirmation and update our metrics. Most order-processing systems will handle those sequentially.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fof41ox6qsad1bv1lqa37.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fof41ox6qsad1bv1lqa37.png" alt="Express Workflow Parallel Step" width="800" height="215"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Cold Starts
&lt;/h3&gt;

&lt;p&gt;What about all the cold starts? Won't the extra functions cause extra startup latency? Well, first of all, &lt;a href="https://dev.to/aws/cold-starts-are-dead-5fod"&gt;read this&lt;/a&gt;. If that doesn't convince you, my experience working in serverless for the better part of a decade now is that I don't worry about them at all. Yes, sometimes a function will start, costing you 200ms. At scale, this isn't much, because that function container may be invoked tens or even hundreds of thousands of times during its lifecycle, and that 200ms tax is paid only once across all those invocations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scaling and Service Limits
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step Functions
&lt;/h3&gt;

&lt;p&gt;A casual reading of the &lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/service-quotas.html#service-limits-api-action-throttling-general" rel="noopener noreferrer"&gt;docs&lt;/a&gt; suggests that express workflows may begin to throttle at 6000 RPS. This isn't the case - or rather, this is only the case for asynchronous invocations. For synchronous invocations, there's no rate limit at all! This is very rare for an AWS service, and it comes with an advisory.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Synchronous Express execution API calls don't contribute to existing account capacity limits. Step Functions provides capacity on demand and automatically scales with sustained workload. Surges in workload may be throttled until capacity is available.&lt;br&gt;
If you experience throttling, try again after some time. For information about Synchronous Express workflows, see &lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/choosing-workflow-type.html#concepts-express-synchronous" rel="noopener noreferrer"&gt;Synchronous and Asynchronous Express Workflows in Step Functions&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I have never seen a synchronous execution throttle, but I have seen services backed by Express Workflows scale very quickly. If you plan to operate this service on a very large scale, it's a good idea to speak with your account team.&lt;/p&gt;

&lt;h3&gt;
  
  
  API Gateway
&lt;/h3&gt;

&lt;p&gt;With no hard limit on Step Functions executions, we need to look at API Gateway quotas. API Gateway can handle 10,000 RPS (sustained) per region. This limit can be increased and applies at the account and region levels.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lambda
&lt;/h3&gt;

&lt;p&gt;Lambda has a maximum concurrency rate of 1,000 concurrent executions at the account level. This can also be increased. This is your most likely source of throttles in this kind of architecture. Since we're invoking multiple functions per workflow, sometimes in parallel, we can quickly reach that 1,000 limit. Fortunately, this limit can be adjusted in the AWS Console. It's a very good idea to project your concurrency needs and set your quota appropriately. If you set it too high, a bug could result in an expensive bill. If you set it too low, you may experience throttling.&lt;/p&gt;

&lt;p&gt;It's also a good idea to set &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-concurrency.html" rel="noopener noreferrer"&gt;reserved concurrency&lt;/a&gt; on every Lambda function. This is different than provisioned concurrency (prepay to keep a function "warm"). Instead, reserved concurrency protects your function from a "noisy neighbor" eating up all of your concurrent executions.&lt;/p&gt;

&lt;p&gt;Finally, you can handle throttles with retries in Step Functions. Even synchronous executions should implement retries in case of throttle or service failure. If your service is operating at extremely high throughput, you might throttle, wait a few milliseconds, then receive a successful invocation. This is a much better outcome than failing with a 429 error!&lt;/p&gt;

&lt;p&gt;To see this in action, I created a &lt;a href="https://github.com/elthrasher/cost-efficient-microservices/blob/main/scripts/load-test.sh" rel="noopener noreferrer"&gt;simple load test script&lt;/a&gt; using &lt;a href="https://k6.io/" rel="noopener noreferrer"&gt;k6&lt;/a&gt;. Here's a sample run from my laptop.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
         /&lt;span class="se"&gt;\ &lt;/span&gt;     Grafana   /‾‾/
    /&lt;span class="se"&gt;\ &lt;/span&gt; /  &lt;span class="se"&gt;\ &lt;/span&gt;    |&lt;span class="se"&gt;\ &lt;/span&gt; __   /  /
   /  &lt;span class="se"&gt;\/&lt;/span&gt;    &lt;span class="se"&gt;\ &lt;/span&gt;   | |/ /  /   ‾‾&lt;span class="se"&gt;\&lt;/span&gt;
  /          &lt;span class="se"&gt;\ &lt;/span&gt;  |   &lt;span class="o"&gt;(&lt;/span&gt;  |  &lt;span class="o"&gt;(&lt;/span&gt;‾&lt;span class="o"&gt;)&lt;/span&gt;  |
 / __________ &lt;span class="se"&gt;\ &lt;/span&gt; |_|&lt;span class="se"&gt;\_\ &lt;/span&gt; &lt;span class="se"&gt;\_&lt;/span&gt;____/


     execution: &lt;span class="nb"&gt;local
        &lt;/span&gt;script: scripts/load-test.js
        output: -

     scenarios: &lt;span class="o"&gt;(&lt;/span&gt;100.00%&lt;span class="o"&gt;)&lt;/span&gt; 1 scenario, 200 max VUs, 1m5s max duration &lt;span class="o"&gt;(&lt;/span&gt;incl. graceful stop&lt;span class="o"&gt;)&lt;/span&gt;:
              &lt;span class="k"&gt;*&lt;/span&gt; default: Up to 200 looping VUs &lt;span class="k"&gt;for &lt;/span&gt;35s over 4 stages &lt;span class="o"&gt;(&lt;/span&gt;gracefulRampDown: 30s, gracefulStop: 30s&lt;span class="o"&gt;)&lt;/span&gt;


━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📊 Load Test Summary
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Total requests:  9442
  Failure rate:    0.0%
  Latency avg:     419ms
  Latency med:     408ms
  Latency p90:     499ms
  Latency p95:     555ms
  Latency max:     3673ms
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

running &lt;span class="o"&gt;(&lt;/span&gt;0m35.5s&lt;span class="o"&gt;)&lt;/span&gt;, 000/200 VUs, 9442 &lt;span class="nb"&gt;complete &lt;/span&gt;and 0 interrupted iterations
default ✓ &lt;span class="o"&gt;[======================================]&lt;/span&gt; 000/200 VUs  35s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I'm not able to generate much more load than that on a MacBook Pro, but this does illustrate how easily this architecture handles traffic spikes. "Scalability" here is a bit of a misnomer, as my service isn't scaling up to handle the load. Instead, there is available capacity to meet my needs at all times!&lt;/p&gt;

&lt;h2&gt;
  
  
  More on Step Functions
&lt;/h2&gt;

&lt;p&gt;If you found this article helpful, check out my other writing on Step Functions!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/aws-builders/create-stateful-serverless-workflows-with-aws-step-functions-and-jsonata-2oe3"&gt;https://dev.to/aws-builders/create-stateful-serverless-workflows-with-aws-step-functions-and-jsonata-2oe3&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/aws-builders/avoiding-the-serverless-workflow-antipattern-2ba1"&gt;https://dev.to/aws-builders/avoiding-the-serverless-workflow-antipattern-2ba1&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/elthrasher/exploring-aws-cdk-step-functions-1d1e"&gt;https://dev.to/elthrasher/exploring-aws-cdk-step-functions-1d1e&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Cover by Glynis Morgan&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>microservices</category>
      <category>stepfunctions</category>
    </item>
    <item>
      <title>Lambda Multi-tenanted Isolation</title>
      <dc:creator>Gary Mclean</dc:creator>
      <pubDate>Thu, 30 Apr 2026 10:36:48 +0000</pubDate>
      <link>https://forem.com/aws-builders/lambda-multi-tenanted-isolation-1ban</link>
      <guid>https://forem.com/aws-builders/lambda-multi-tenanted-isolation-1ban</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In any application or system, we must have safeguards in place to prevent cross-customer data exposure. Our software is developed using a range of approaches, from human-written code to AI-assisted generation and regardless of how code is produced, the risk of unintended data exposure remains a critical concern.&lt;/p&gt;

&lt;p&gt;Developers, Engineering Managers, and Security teams should be aware of potential data exposures and the additional controls which can be put in place as preventive measures.&lt;/p&gt;

&lt;p&gt;Once data is exposed or lost, it cannot be undone. Consequently, a data breach may result in serious impacts, including financial loss and reputational harm.&lt;/p&gt;

&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;Ensuring data security is a fundamental requirement for applications, both internal and externally exposed. A traditional three-tier architecture is composed of a presentation layer (web tier), an application layer (application tier), and a database layer (database tier).&lt;/p&gt;

&lt;p&gt;Any code execution within the application layer more often than not retrieves data from the database layer and returns it to the consumer. The database layer will be the most secure area, with data access tightly controlled and very limited in scope.&lt;/p&gt;

&lt;p&gt;Understanding this architecture is important context before exploring how serverless compute and AWS Lambda specifically introduces a unique set of considerations around how application code is executed and how memory is managed between invocations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multi-tenant
&lt;/h2&gt;

&lt;p&gt;A multi-tenant application is a software architecture commmonly used in Software as a Service (SaaS) where a single applcation instance services multiple customers while keeping each customers data logically isolated.&lt;/p&gt;

&lt;p&gt;Multiple tenants use the same application layer and access to data is controlled using identifiable information obtained during authentication. A unique identifier such as such as a Company ID, Tenant ID or another identifier would be used to aid in only retrieving data in scope for that user.&lt;/p&gt;

&lt;p&gt;Even though logically, data is isolated at source, the same application code in the same instance can be executed repeatedly. APIs generally do not restart or spin up independent environments as this would become expense. While using the same execution environment, generally the same same memory address space will be reused to store STATIC and variable data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Challange
&lt;/h2&gt;

&lt;p&gt;Many SaaS companies host their offerings across Cloud providers such as AWS, utilising serverless compute like as Lambda. There are many articles and documentation that deep dive into Lambda, though at a high level, Lambda is a service which allows code to run without the need to manage servers.&lt;/p&gt;

&lt;p&gt;A Lambda execution environment lifecycle can be grouped into 3 phases&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqt15hftm51rthwcy1djq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqt15hftm51rthwcy1djq.png" alt="Lambda Lifecycle phases" width="798" height="93"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Invoke phase is where the core business logic executes, code queries data from the database, performs actions against it and returns data to the consumer. &lt;br&gt;
Post invoke phase, the Lambda execution environment may or may not shutdown, or remain running waiting for the next invocation.&lt;/p&gt;

&lt;p&gt;When a Lambda function is initialised for the first time, its execution environment is fresh; variables are empty and no prior state exists. However, AWS Lambda reuses warm execution environments for subsequent invocations as a performance optimisation. This means that residual data from a previous invocation; such as values held in temporary variables or files written to the ephemeral file system (/tmp); may still be present when the next invocation begins. Without proper hygiene practices in place, this leftover data carries a significant risk of being inadvertently exposed to the next request or customer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multi-tenant function?
&lt;/h2&gt;

&lt;p&gt;Lambda runs your code inside execution environments.&lt;br&gt;
Small, secure Firecracker microVMs that handle an invocation and then sit warm, waiting for the next one. That's efficient until you realise those environments get reused across invocations. Your function serves a request from Tenant A, caches some config or credentials in memory, and then the next request comes in from Tenant B, potentially landing in the same environment, with access to whatever Tenant A left behind. &lt;/p&gt;

&lt;p&gt;If your code is perfect, that's fine. In practice, it isn't. One oversight in your data handling and you have a cross-tenant data exposure incident.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mitigation
&lt;/h2&gt;

&lt;p&gt;It is worth noting that under certain conditions, residual data may never actually reach another invocation. Factors such as environment load, the rate at which execution environments are initialised and torn down, and whether Provisioned Concurrency is configured can all influence how long a warm environment persists. In high-churn scenarios where environments are frequently recycled, leftover data may be naturally cleared before it has the opportunity to be exposed. However, this should never be relied upon as a security control; it is an unpredictable side effect of infrastructure behaviour, not a guarantee.&lt;/p&gt;

&lt;h2&gt;
  
  
  Approaches
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Variable initialisation or more specifically explicit initialisation
&lt;/h3&gt;

&lt;p&gt;The first approach is to run a Lambda function per business process, such as an APi Resource which returns data per customer based of their identity supplied during authentication. Ensure code correctly cleanses the environment at the start of invocation where the practice of deliberately setting variables to a known, clean state before any logic executes, rather than assuming they are empty.&lt;/p&gt;

&lt;h3&gt;
  
  
  Per-tenant Lambda function 1-2-1
&lt;/h3&gt;

&lt;p&gt;The highest degree of isolation would be to create a Lambda function per tenant. Each tenant would have its own dedicated function assigned exclusively to them for code execution. While this approach maximises data cleanliness, it is difficult to maintain at scale; API limits when updating many functions simultaneously, complex CI/CD pipelines, monitoring and alerting sprawl across a large number of Log Groups, and considerably longer deployment times all become significant operational burdens. For most organisations, the overhead of managing this model outweighs the isolation benefits it provides.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lambda tenant isolation mode
&lt;/h3&gt;

&lt;p&gt;Tenant isolation mode exists for a specific scenario: you're running a single Lambda function that serves multiple end-users or tenants, and you need hard guarantees that their execution environments never bleed into one another.&lt;/p&gt;

&lt;p&gt;Two situations make this non-negotiable.&lt;br&gt;
First, if your tenants execute their own code. Isolated environments limit the blast radius when that code misbehaves, whether through bugs or something more deliberate. Second, if you're processing sensitive, tenant-specific data. Shared environments create exposure risk; isolation removes it.&lt;/p&gt;

&lt;p&gt;With tenant isolation mode enabled, you pass a tenant identifier with each function invocation. Lambda uses that identifier to route requests to underlying execution environments, ensuring that an environment associated with one tenant is never used to serve requests from another.&lt;/p&gt;

&lt;h4&gt;
  
  
  Limitations
&lt;/h4&gt;

&lt;p&gt;Tenant isolation mode is not supported with functions that use function URLs, provisioned concurrency, or SnapStart. You can send requests to a tenant-isolated function using synchronous invocations, asynchronous invocations, or by using Amazon API Gateway as an event-trigger.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottom line
&lt;/h2&gt;

&lt;p&gt;Tenant isolation mode eliminates the need for custom isolation logic or separate per-tenant functions, letting you focus on business logic while AWS handles the complexities of tenant-aware compute environment isolation. For SaaS builders running sensitive workloads or executing user-supplied code, that's a significant operational and security improvement, it was a long time coming.&lt;/p&gt;

</description>
      <category>lambda</category>
      <category>security</category>
      <category>aws</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Building a Serverless DynamoDB MCP: Making Your AI Talk to Your Database</title>
      <dc:creator>Yeshwanth L M</dc:creator>
      <pubDate>Thu, 30 Apr 2026 10:16:53 +0000</pubDate>
      <link>https://forem.com/aws-builders/building-a-serverless-dynamodb-mcp-making-your-ai-talk-to-your-database-3jne</link>
      <guid>https://forem.com/aws-builders/building-a-serverless-dynamodb-mcp-making-your-ai-talk-to-your-database-3jne</guid>
      <description>&lt;h1&gt;
  
  
  Building a Serverless DynamoDB MCP: Making Your AI Talk to Your Database
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fim9m2eji32nl45ls38hg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fim9m2eji32nl45ls38hg.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Have you ever wished you could just &lt;em&gt;ask&lt;/em&gt; your AI assistant to query your database? Something like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Hey Kiro, show me all active users from my DynamoDB table"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;or&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Add a new user named Alice with email &lt;a href="mailto:alice@example.com"&gt;alice@example.com&lt;/a&gt; to the Users table"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Well, that's exactly what we're building today! 🚀&lt;/p&gt;

&lt;h2&gt;
  
  
  The Big Picture: What Are We Building?
&lt;/h2&gt;

&lt;p&gt;We're creating a &lt;strong&gt;serverless MCP (Model Context Protocol) backend&lt;/strong&gt; on AWS that enables AI assistants like Kiro to interact with DynamoDB tables conversationally. Think of it as giving Kiro a direct, secure phone line to your DynamoDB database.&lt;/p&gt;

&lt;p&gt;Here's what makes this special:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;10 DynamoDB operations&lt;/strong&gt; exposed as natural language tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Completely serverless&lt;/strong&gt; - runs on AWS Lambda&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secure by default&lt;/strong&gt; - AWS IAM authentication with SigV4 signing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero local dependencies&lt;/strong&gt; - all the heavy lifting happens in the cloud&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-configuring&lt;/strong&gt; - tools are discovered dynamically&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Wait, What's MCP?
&lt;/h2&gt;

&lt;p&gt;Before we dive in, let's talk about MCP (Model Context Protocol). &lt;/p&gt;

&lt;p&gt;Think of MCP as a standardized way for AI assistants to use external tools. It's like giving your AI a toolbox where each tool does something specific - query a database, fetch weather data, send emails, etc.&lt;/p&gt;

&lt;p&gt;The protocol works like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;AI assistant connects to an MCP server&lt;/li&gt;
&lt;li&gt;Server tells AI what tools are available&lt;/li&gt;
&lt;li&gt;AI can call these tools when needed&lt;/li&gt;
&lt;li&gt;Server executes the tool and returns results&lt;/li&gt;
&lt;li&gt;AI uses the results to help the user&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The beauty?&lt;/strong&gt; The AI doesn't need to know &lt;em&gt;how&lt;/em&gt; the tools work internally. It just needs to know &lt;em&gt;what&lt;/em&gt; they do and &lt;em&gt;how&lt;/em&gt; to call them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Build This Serverless?
&lt;/h2&gt;

&lt;p&gt;You might ask: "Why not just run a local server on my machine?"&lt;/p&gt;

&lt;p&gt;Great question! Here's why serverless wins:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;Centralized Management&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;One deployment serves all your team members. Update once, everyone benefits. No "it works on my machine" problems.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;Security at Scale&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;IAM-based authentication (no API keys to rotate)&lt;/li&gt;
&lt;li&gt;Each Lambda has scoped permissions&lt;/li&gt;
&lt;li&gt;Audit logs for every database operation&lt;/li&gt;
&lt;li&gt;Secrets managed by AWS Secrets Manager&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;Cost Efficiency&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Pay only when you use it. Lambda charges per request, not per hour. Most hobby projects? Practically free under AWS free tier.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. &lt;strong&gt;Automatic Scaling&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Whether it's you at 2 AM or your whole team during peak hours, it just works.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. &lt;strong&gt;No Infrastructure Headaches&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;No servers to patch, no runtime versions to manage, no "why is Python 3.8 broken on my Mac?"&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture: How It All Fits Together
&lt;/h2&gt;

&lt;p&gt;Let me paint you a picture of how this works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────┐
│  You: "Show me all  │
│  users from Users   │
│  table"             │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Claude Desktop     │ ← Your AI assistant
│  (MCP Client)       │
└──────────┬──────────┘
           │ stdio / JSON-RPC
           ▼
┌─────────────────────┐
│  Local Proxy        │ ← Signs requests with your AWS credentials
│  (proxy.sh)         │
└──────────┬──────────┘
           │ HTTPS + AWS IAM Auth
           ▼
┌─────────────────────┐
│  API Gateway        │ ← Entry point to AWS
│  (HTTP API)         │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Lambda Functions   │ ← 11 functions (one per operation, plus /tools)
│  - get-item         │
│  - put-item         │
│  - query            │
│  - scan             │
│  - etc...           │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  DynamoDB Tables    │ ← Your actual data
└─────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Flow, Step by Step:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;You ask Kiro&lt;/strong&gt; something about your database&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kiro recognizes&lt;/strong&gt; it needs to use a DynamoDB tool&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local proxy intercepts&lt;/strong&gt; the request and signs it with AWS SigV4&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Gateway validates&lt;/strong&gt; the signature (IAM authentication)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lambda function executes&lt;/strong&gt; the DynamoDB operation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result comes back&lt;/strong&gt; as human-readable text&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kiro uses the result&lt;/strong&gt; to answer your question&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The genius here? &lt;strong&gt;Kiro has no idea&lt;/strong&gt; it's talking to AWS. It thinks it's using a local tool. All the cloud complexity is hidden.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Key Design Decisions
&lt;/h2&gt;

&lt;p&gt;Let me walk you through the "why" behind each major decision:&lt;/p&gt;

&lt;h3&gt;
  
  
  Decision 1: Why Plain-Text Responses?
&lt;/h3&gt;

&lt;p&gt;DynamoDB returns data in this format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Item"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"S"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user001"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"S"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Alice Johnson"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"age"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"N"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"28"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ugly, right? Those &lt;code&gt;{"S": ...}&lt;/code&gt; and &lt;code&gt;{"N": ...}&lt;/code&gt; wrappers are DynamoDB's type system.&lt;/p&gt;

&lt;p&gt;Our Lambda functions convert this to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;Item from table 'Users'&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;userId&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;user001&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Alice Johnson&lt;/span&gt;
  &lt;span class="na"&gt;age&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;28&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why?&lt;/strong&gt; Because Kiro can narrate this naturally to you. No JSON parsing needed. It's optimized for conversation, not computation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Decision 2: Why One Lambda Per Operation?
&lt;/h3&gt;

&lt;p&gt;We could've built one mega-Lambda that handles everything. But we didn't. Here's why:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Principle of Least Privilege&lt;/strong&gt;: Each Lambda gets &lt;em&gt;only&lt;/em&gt; the permissions it needs.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;get-item&lt;/code&gt; Lambda → &lt;code&gt;dynamodb:GetItem&lt;/code&gt; permission only&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;put-item&lt;/code&gt; Lambda → &lt;code&gt;dynamodb:PutItem&lt;/code&gt; permission only&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;delete-item&lt;/code&gt; Lambda → &lt;code&gt;dynamodb:DeleteItem&lt;/code&gt; permission only&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If one Lambda gets compromised? Damage is limited.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Clear Separation&lt;/strong&gt;: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each Terraform file = One Lambda&lt;/li&gt;
&lt;li&gt;Easy to understand, easy to modify&lt;/li&gt;
&lt;li&gt;Want to remove scan operation? Delete one file.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cost Optimization&lt;/strong&gt;:&lt;br&gt;
Lambda bills per request and per millisecond of execution. Small, single-purpose functions load less code, so cold starts are faster and invocations finish sooner.&lt;/p&gt;

&lt;h3&gt;
  
  
  Decision 3: Why Self-Configuring Tools?
&lt;/h3&gt;

&lt;p&gt;The proxy script doesn't have any hardcoded tool definitions. On startup, it calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;GET /tools
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And receives:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dynamodb_get_item"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Retrieve a single item from DynamoDB..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"inputSchema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"route"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/dynamodb/get-item"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The magic?&lt;/strong&gt; Add a new tool to &lt;code&gt;dynamodb_ops.py&lt;/code&gt;, deploy, and the proxy &lt;em&gt;automatically&lt;/em&gt; discovers it. No client-side updates needed.&lt;/p&gt;

&lt;p&gt;This follows the Unix philosophy: &lt;strong&gt;"mechanism, not policy."&lt;/strong&gt; The proxy provides the mechanism (SigV4 signing, JSON-RPC), but the backend defines the policy (what tools exist).&lt;/p&gt;
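&lt;p&gt;A minimal sketch of that discovery step in Python (the real proxy is a shell script; the function names and the injectable &lt;code&gt;fetch&lt;/code&gt; parameter here are illustrative):&lt;/p&gt;

```python
import json
import urllib.request


def discover_tools(base_url, fetch=None):
    """Ask the backend which tools exist - nothing is hardcoded client-side."""
    fetch = fetch or (lambda url: urllib.request.urlopen(url).read())
    return json.loads(fetch(f"{base_url}/tools"))


def to_mcp_tool(defn):
    """Keep only the fields an MCP tools/list reply expects.

    The 'route' field stays proxy-side: it tells the proxy which API
    Gateway path to POST to, but the AI client never sees it.
    """
    return {k: defn[k] for k in ("name", "description", "inputSchema")}
```

Deploy a new Lambda with a new entry in the `/tools` response, and on its next startup the proxy picks it up with zero local changes.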

&lt;h3&gt;
  
  
  Decision 4: Why AWS IAM Instead of API Keys?
&lt;/h3&gt;

&lt;p&gt;Traditional approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"super-secret-key-123"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Our approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Uses your AWS credentials&lt;/span&gt;
&lt;span class="c"&gt;# Same ones you use for AWS CLI&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ No keys to rotate every 90 days&lt;/li&gt;
&lt;li&gt;✅ Integrates with your existing AWS setup&lt;/li&gt;
&lt;li&gt;✅ CloudTrail logs every request&lt;/li&gt;
&lt;li&gt;✅ Can revoke access instantly via IAM&lt;/li&gt;
&lt;li&gt;✅ Supports MFA, temporary credentials, SSO&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The proxy signs every request&lt;/strong&gt; with AWS Signature Version 4. API Gateway validates the signature before Lambda even runs. It's the same SigV4 scheme the AWS CLI and SDKs use for every API call.&lt;/p&gt;
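&lt;p&gt;If you're curious what the signing actually involves, here's a stripped-down SigV4 implementation in Python (the real proxy does this in Bash; signing only the &lt;code&gt;host&lt;/code&gt; and &lt;code&gt;x-amz-date&lt;/code&gt; headers is a simplification of the full spec):&lt;/p&gt;

```python
import datetime
import hashlib
import hmac


def sigv4_headers(method, host, path, body, access_key, secret_key,
                  region="us-east-1", service="execute-api"):
    """Build a minimal SigV4 Authorization header for an API Gateway call."""
    now = datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date = now.strftime("%Y%m%d")
    payload_hash = hashlib.sha256(body).hexdigest()
    # 1. Canonical request: method, path, query, headers, signed list, payload hash
    canonical = "\n".join([method, path, "",
                           f"host:{host}", f"x-amz-date:{amz_date}", "",
                           "host;x-amz-date", payload_hash])
    # 2. String to sign, scoped to date/region/service
    scope = f"{date}/{region}/{service}/aws4_request"
    to_sign = "\n".join(["AWS4-HMAC-SHA256", amz_date, scope,
                         hashlib.sha256(canonical.encode()).hexdigest()])
    # 3. Derive the signing key by chained HMACs, then sign
    key = ("AWS4" + secret_key).encode()
    for part in (date, region, service, "aws4_request"):
        key = hmac.new(key, part.encode(), hashlib.sha256).digest()
    signature = hmac.new(key, to_sign.encode(), hashlib.sha256).hexdigest()
    return {
        "Host": host,
        "X-Amz-Date": amz_date,
        "Authorization": (f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
                          f"SignedHeaders=host;x-amz-date, Signature={signature}"),
    }
```

API Gateway recomputes the same signature server-side; any mismatch (wrong key, tampered body, stale timestamp) is rejected before your Lambda is invoked.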

&lt;h2&gt;
  
  
  The Code: Let's Break It Down
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Lambda Handler (Simplified)
&lt;/h3&gt;

&lt;p&gt;Here's what a Lambda function looks like (simplified for clarity):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_item_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Retrieve a single item from DynamoDB by primary key.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# Parse the request
&lt;/span&gt;    &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;table_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;table_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Convert simple format to DynamoDB format
&lt;/span&gt;    &lt;span class="n"&gt;dynamodb_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;dynamodb_key&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;S&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
            &lt;span class="n"&gt;dynamodb_key&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;N&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;

    &lt;span class="c1"&gt;# Call DynamoDB
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dynamodb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_item&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;TableName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dynamodb_key&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Format response as human-readable text
&lt;/span&gt;    &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Item&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
    &lt;span class="n"&gt;formatted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;format_item&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;statusCode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;headers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text/plain&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Item from table &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;formatted&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Three key parts:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Parse input&lt;/strong&gt; - Extract table name and key&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Convert formats&lt;/strong&gt; - Simple JSON → DynamoDB types&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Return readable text&lt;/strong&gt; - Not raw JSON&lt;/li&gt;
&lt;/ol&gt;
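&lt;p&gt;The &lt;code&gt;format_item&lt;/code&gt; helper isn't shown above. A plausible sketch, assuming we only need the common attribute types (&lt;code&gt;unwrap&lt;/code&gt; is an illustrative name, not the actual code):&lt;/p&gt;

```python
def unwrap(av):
    """Convert one DynamoDB AttributeValue into a plain Python value."""
    if "S" in av:
        return av["S"]
    if "N" in av:
        n = av["N"]  # DynamoDB sends numbers as strings
        return int(n) if n.lstrip("-").isdigit() else float(n)
    if "BOOL" in av:
        return av["BOOL"]
    if "L" in av:
        return [unwrap(v) for v in av["L"]]
    if "M" in av:
        return {k: unwrap(v) for k, v in av["M"].items()}
    return av  # NULL, binary, sets, etc. left as-is in this sketch


def format_item(item):
    """Render a DynamoDB item as indented 'key: value' lines."""
    return "\n".join(f"  {k}: {unwrap(v)}" for k, v in item.items())


item = {"userId": {"S": "user001"}, "name": {"S": "Alice Johnson"},
        "age": {"N": "28"}}
print(format_item(item))
```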

&lt;h3&gt;
  
  
  The Proxy Script (The Secret Sauce)
&lt;/h3&gt;

&lt;p&gt;The proxy does three critical things:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Tool Discovery:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# On startup&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; GET https://api.execute-api.us-east-1.amazonaws.com/tools
&lt;span class="c"&gt;# Saves tool definitions locally&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. SigV4 Signing:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# For each request&lt;/span&gt;
&lt;span class="nv"&gt;signature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;calculate_aws_signature &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
curl &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: AWS4-HMAC-SHA256 Credential=..."&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
     https://api.execute-api.us-east-1.amazonaws.com/dynamodb/get-item
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. JSON-RPC Translation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Receives from Kiro:&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"jsonrpc"&lt;/span&gt;: &lt;span class="s2"&gt;"2.0"&lt;/span&gt;, &lt;span class="s2"&gt;"method"&lt;/span&gt;: &lt;span class="s2"&gt;"tools/call"&lt;/span&gt;, &lt;span class="s2"&gt;"params"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;...&lt;span class="o"&gt;}}&lt;/span&gt;

&lt;span class="c"&gt;# Translates to HTTP:&lt;/span&gt;
POST /dynamodb/get-item
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"table_name"&lt;/span&gt;: &lt;span class="s2"&gt;"Users"&lt;/span&gt;, &lt;span class="s2"&gt;"key"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"userId"&lt;/span&gt;: &lt;span class="s2"&gt;"123"&lt;/span&gt;&lt;span class="o"&gt;}}&lt;/span&gt;

&lt;span class="c"&gt;# Returns to Kiro:&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"jsonrpc"&lt;/span&gt;: &lt;span class="s2"&gt;"2.0"&lt;/span&gt;, &lt;span class="s2"&gt;"result"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"content"&lt;/span&gt;: &lt;span class="o"&gt;[{&lt;/span&gt;&lt;span class="s2"&gt;"type"&lt;/span&gt;: &lt;span class="s2"&gt;"text"&lt;/span&gt;, &lt;span class="s2"&gt;"text"&lt;/span&gt;: &lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="o"&gt;}]}}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's a &lt;strong&gt;protocol adapter&lt;/strong&gt; - speaks MCP to Kiro, speaks HTTP to AWS.&lt;/p&gt;
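&lt;p&gt;In Python terms, that translation step might look like this (the real proxy is a Bash script; function and parameter names here are illustrative):&lt;/p&gt;

```python
import json


def handle_tools_call(line, routes, post):
    """Translate one MCP 'tools/call' request into an HTTP POST (sketch).

    `routes` maps tool name to API Gateway path; `post(path, body)` is
    assumed to perform the SigV4-signed request and return the Lambda's
    plain-text response.
    """
    req = json.loads(line)
    params = req["params"]
    text = post(routes[params["name"]], json.dumps(params.get("arguments", {})))
    # Wrap the plain text back into the JSON-RPC envelope the client expects
    return json.dumps({
        "jsonrpc": "2.0",
        "id": req.get("id"),
        "result": {"content": [{"type": "text", "text": text}]},
    })
```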

&lt;h3&gt;
  
  
  The Infrastructure (Terraform)
&lt;/h3&gt;

&lt;p&gt;Each Lambda gets its own Terraform file. Here's the pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# IAM Role&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role"&lt;/span&gt; &lt;span class="s2"&gt;"lambda_get_item_role"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dynamodb-get-item-role"&lt;/span&gt;
  &lt;span class="c1"&gt;# Trust policy allows Lambda service to assume this role&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Scoped Permission&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy"&lt;/span&gt; &lt;span class="s2"&gt;"lambda_get_item_dynamodb"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dynamodb-get-item-policy"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lambda_get_item_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"dynamodb:GetItem"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Only this action!&lt;/span&gt;
      &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Lambda Function&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_lambda_function"&lt;/span&gt; &lt;span class="s2"&gt;"lambda_get_item"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;function_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dynamodb-get-item"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lambda_get_item_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
  &lt;span class="nx"&gt;runtime&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"python3.13"&lt;/span&gt;
  &lt;span class="nx"&gt;handler&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dynamodb_ops.get_item_handler"&lt;/span&gt;
  &lt;span class="c1"&gt;# ... more config&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Rinse and repeat&lt;/strong&gt; for each operation. Total: 11 Lambda functions, covering the 10 DynamoDB operations plus the &lt;code&gt;/tools&lt;/code&gt; discovery endpoint.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 10 DynamoDB Operations
&lt;/h2&gt;

&lt;p&gt;Here's what you can do:&lt;/p&gt;

&lt;h3&gt;
  
  
  Read Operations
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Get Item&lt;/strong&gt; - Fetch a single item by key&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Get user user001 from the Users table"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Query&lt;/strong&gt; - Find items matching a condition&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Show me all orders for user123 from the Orders table"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Scan&lt;/strong&gt; - Read the entire table (with optional filters)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Scan the Products table and show me 10 items"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;4. Batch Get&lt;/strong&gt; - Fetch multiple items at once&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Get users user001, user002, and user003 from Users table"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;5. List Tables&lt;/strong&gt; - See all DynamoDB tables&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"What DynamoDB tables do I have?"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;6. Describe Table&lt;/strong&gt; - Get table metadata&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Describe the Users table structure"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;7. Count Items&lt;/strong&gt; - Get approximate table size&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"How many items are in the Users table?"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Write Operations
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;8. Put Item&lt;/strong&gt; - Add or replace an item&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Add a user with userId user011, name Kate Brown to Users table"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;9. Update Item&lt;/strong&gt; - Modify specific attributes&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Update the role to Senior Engineer for user001"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;10. Delete Item&lt;/strong&gt; - Remove an item&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Delete user user005 from the Users table"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Bonus: The Sample Table
&lt;/h2&gt;

&lt;p&gt;We include an optional &lt;code&gt;sample-table.tf&lt;/code&gt; that creates a "Users" table with 10 realistic user records:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_dynamodb_table"&lt;/span&gt; &lt;span class="s2"&gt;"users_sample"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Users"&lt;/span&gt;
  &lt;span class="nx"&gt;billing_mode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"PAY_PER_REQUEST"&lt;/span&gt;  &lt;span class="c1"&gt;# No fixed costs!&lt;/span&gt;
  &lt;span class="nx"&gt;hash_key&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"userId"&lt;/span&gt;

  &lt;span class="c1"&gt;# ... schema definition&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_dynamodb_table_item"&lt;/span&gt; &lt;span class="s2"&gt;"user_1"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;table_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_dynamodb_table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;users_sample&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;

  &lt;span class="nx"&gt;item&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;userId&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;S&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"user001"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;name&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;S&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Alice Johnson"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;email&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;S&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"alice.johnson@example.com"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;role&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;S&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Software Engineer"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;department&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;S&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Engineering"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;active&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;BOOL&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="c1"&gt;# ... more fields&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Perfect for testing!&lt;/strong&gt; Deploy once, start asking questions immediately.&lt;/p&gt;

&lt;p&gt;Don't need it? Just delete the file or rename it to &lt;code&gt;sample-table.tf.disabled&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Deploy This
&lt;/h2&gt;

&lt;p&gt;Ready to try it? Here's the journey:&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# You need these installed&lt;/span&gt;
aws &lt;span class="nt"&gt;--version&lt;/span&gt;          &lt;span class="c"&gt;# AWS CLI&lt;/span&gt;
terraform &lt;span class="nt"&gt;--version&lt;/span&gt;    &lt;span class="c"&gt;# Terraform&lt;/span&gt;
jq &lt;span class="nt"&gt;--version&lt;/span&gt;          &lt;span class="c"&gt;# JSON processor&lt;/span&gt;
bash &lt;span class="nt"&gt;--version&lt;/span&gt;        &lt;span class="c"&gt;# Bash 4+&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Make sure your AWS credentials are configured:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws sts get-caller-identity
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 1: Clone and Deploy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone the repo (or create from the code)&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;AWSServerlessMCP

&lt;span class="c"&gt;# Run the magic script&lt;/span&gt;
./apply.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This script:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;✅ Validates your environment&lt;/li&gt;
&lt;li&gt;✅ Deploys all 11 Lambda functions via Terraform&lt;/li&gt;
&lt;li&gt;✅ Creates API Gateway routes&lt;/li&gt;
&lt;li&gt;✅ Generates IAM user for the proxy&lt;/li&gt;
&lt;li&gt;✅ Stores credentials in Secrets Manager&lt;/li&gt;
&lt;li&gt;✅ Generates Claude Desktop config&lt;/li&gt;
&lt;li&gt;✅ Runs validation tests&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Total deployment time:&lt;/strong&gt; ~2-3 minutes&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Configure Claude Desktop
&lt;/h3&gt;

&lt;p&gt;The script generates &lt;code&gt;02-proxy/claude_desktop_config_sh.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"dynamodb"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bash"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/path/to/proxy.sh"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"MCP_ACCESS_KEY_ID"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AKIA..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"MCP_SECRET_ACCESS_KEY"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"MCP_API_ENDPOINT"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://....execute-api.us-east-1.amazonaws.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"MCP_REGION"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Copy this&lt;/strong&gt; to your Claude Desktop config:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;macOS&lt;/strong&gt;: &lt;code&gt;~/Library/Application Support/Claude/claude_desktop_config.json&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Linux&lt;/strong&gt;: &lt;code&gt;~/.config/Claude/claude_desktop_config.json&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 3: Restart Claude Desktop
&lt;/h3&gt;

&lt;p&gt;Close and reopen Claude Desktop. You should see DynamoDB tools appear!&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Start Asking Questions!
&lt;/h3&gt;

&lt;p&gt;Try these:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"List all my DynamoDB tables"

"Describe the Users table"

"Show me all users from the Users table"

"Get user user001 from Users table"

"Add a new user with userId user011, name John Doe, 
 email john@example.com to the Users table"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Security Deep Dive
&lt;/h2&gt;

&lt;p&gt;Let's talk about how we keep this secure:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. IAM Authentication
&lt;/h3&gt;

&lt;p&gt;Every request goes through this flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Request → Proxy signs with AWS SigV4 → API Gateway validates signature → Lambda executes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;No signature = No access.&lt;/strong&gt; Period.&lt;/p&gt;
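
&lt;p&gt;SigV4 never sends the secret key over the wire; it derives a scoped signing key from it through four chained HMAC-SHA256 steps. A minimal Python sketch of that derivation (request canonicalization and the final signature are omitted):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import hashlib
import hmac

def _hmac(key, msg):
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()

def signing_key(secret_key, date, region, service):
    """Derive the SigV4 signing key: four chained HMAC-SHA256 steps."""
    k_date = _hmac(("AWS4" + secret_key).encode(), date)  # scoped to the day
    k_region = _hmac(k_date, region)                      # ...then the region
    k_service = _hmac(k_region, service)                  # ...then the service
    return _hmac(k_service, "aws4_request")

key = signing_key("your-secret-access-key", "20260501", "us-east-1", "execute-api")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the key is scoped to a single day, region, and service, a leaked signature is useless anywhere else.&lt;/p&gt;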

&lt;h3&gt;
  
  
  2. Scoped Permissions
&lt;/h3&gt;

&lt;p&gt;The proxy IAM user has exactly ONE permission:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"execute-api:Invoke"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:execute-api:us-east-1:ACCOUNT:API_ID/*/*"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It can call the API. &lt;strong&gt;Nothing else.&lt;/strong&gt; Can't create EC2 instances, can't delete S3 buckets, can't read secrets.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Lambda Isolation
&lt;/h3&gt;

&lt;p&gt;Each Lambda has scoped DynamoDB permissions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;get-item Lambda    → Can only read
put-item Lambda    → Can only write
delete-item Lambda → Can only delete
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even if API Gateway were somehow bypassed, each Lambda would still be limited to its single scoped operation.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Audit Trail
&lt;/h3&gt;

&lt;p&gt;Every action is logged:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_audit_log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;headers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x-mcp-user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AUDIT tool=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; user=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;CloudWatch Logs capture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Who made the request (your username)&lt;/li&gt;
&lt;li&gt;What tool was called&lt;/li&gt;
&lt;li&gt;When it happened&lt;/li&gt;
&lt;li&gt;What the result was&lt;/li&gt;
&lt;/ul&gt;
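
&lt;p&gt;Because the AUDIT lines follow a fixed key=value format, they are easy to summarize once pulled from CloudWatch; for example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import Counter

def parse_audit(line):
    """Parse a line like 'AUDIT tool=dynamodb_get_item user=alice'."""
    return dict(part.split("=", 1) for part in line.split()[1:])

lines = [
    "AUDIT tool=dynamodb_get_item user=alice",
    "AUDIT tool=dynamodb_put_item user=alice",
    "AUDIT tool=dynamodb_get_item user=bob",
]
by_tool = Counter(parse_audit(line)["tool"] for line in lines)
print(by_tool.most_common(1))  # [('dynamodb_get_item', 2)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;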

&lt;h3&gt;
  
  
  5. No Secrets in Code
&lt;/h3&gt;

&lt;p&gt;Credentials live in AWS Secrets Manager:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws secretsmanager get-secret-value &lt;span class="nt"&gt;--secret-id&lt;/span&gt; dynamodb-mcp-proxy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Never in your codebase. Never in environment variables you might accidentally commit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Analysis
&lt;/h2&gt;

&lt;p&gt;"How much does this cost to run?"&lt;/p&gt;

&lt;p&gt;Let's break it down:&lt;/p&gt;

&lt;h3&gt;
  
  
  AWS Free Tier:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lambda&lt;/strong&gt;: 1M requests/month + 400,000 GB-seconds of compute (always free)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Gateway&lt;/strong&gt;: 1M API calls/month (free for the first 12 months)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DynamoDB&lt;/strong&gt;: 25 GB storage + 25 provisioned read and 25 write capacity units (always free)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  After Free Tier:
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Lambda&lt;/strong&gt;: $0.20 per 1M requests + $0.0000166667 per GB-second&lt;/p&gt;

&lt;p&gt;Example calculation for 10,000 queries/month:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Requests: 10,000 × $0.20/1M = &lt;strong&gt;$0.002&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Compute (128MB, 200ms avg): 10,000 × 0.2s × 0.125GB × $0.0000166667 = &lt;strong&gt;$0.004&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Total Lambda: ~$0.01/month&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;API Gateway&lt;/strong&gt;: $1.00 per 1M requests&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10,000 requests = &lt;strong&gt;$0.01/month&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;DynamoDB&lt;/strong&gt;: Pay-per-request pricing&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;$1.25 per 1M write requests&lt;/li&gt;
&lt;li&gt;$0.25 per 1M read requests&lt;/li&gt;
&lt;li&gt;10,000 reads = &lt;strong&gt;$0.003/month&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Secrets Manager&lt;/strong&gt;: $0.40/month per secret&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;$0.40/month&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Total for 10,000 queries/month: &lt;strong&gt;~$0.42&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;For a hobby project? Basically free. For production? Scales linearly with usage.&lt;/p&gt;
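
&lt;p&gt;A quick sanity check of that arithmetic, with the per-unit rates above hard-coded:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def monthly_cost(queries, avg_seconds=0.2, memory_gb=0.125):
    """Rough monthly cost in USD for a given query volume."""
    lambda_requests = queries * 0.20 / 1_000_000
    lambda_compute = queries * avg_seconds * memory_gb * 0.0000166667
    api_gateway = queries * 1.00 / 1_000_000
    dynamodb_reads = queries * 0.25 / 1_000_000
    secrets_manager = 0.40  # one secret, flat monthly charge
    return lambda_requests + lambda_compute + api_gateway + dynamodb_reads + secrets_manager

print(round(monthly_cost(10_000), 2))  # 0.42
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;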

&lt;h2&gt;
  
  
  Common Patterns and Best Practices
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pattern 1: Query with Filters
&lt;/h3&gt;

&lt;p&gt;Instead of scanning, use query when possible:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Efficient - uses partition key
&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Query Orders table where userId equals user123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Less efficient - full table scan
&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Scan Orders table and filter by userId user123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
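
&lt;p&gt;In boto3 terms, the two prompts map to roughly the following parameters (table and attribute names illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Parameters for boto3.client("dynamodb") - names are illustrative
query_params = {
    "TableName": "Orders",
    "KeyConditionExpression": "userId = :uid",  # reads one partition
    "ExpressionAttributeValues": {":uid": {"S": "user123"}},
}

scan_params = {
    "TableName": "Orders",
    "FilterExpression": "userId = :uid",  # reads every item, filters afterwards
    "ExpressionAttributeValues": {":uid": {"S": "user123"}},
}
# client.query(**query_params) touches only the matching partition;
# client.scan(**scan_params) still consumes read capacity for the whole table.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;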



&lt;h3&gt;
  
  
  Pattern 2: Batch Operations
&lt;/h3&gt;

&lt;p&gt;Fetch multiple items in one call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# One request for three items
&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Get users user001, user002, user003 using batch get&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Better than three separate requests
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
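
&lt;p&gt;Under the hood this becomes a single &lt;code&gt;BatchGetItem&lt;/code&gt; request; a sketch of its shape (table and key names illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Request shape for boto3 batch_get_item - names are illustrative
batch_params = {
    "RequestItems": {
        "Users": {
            "Keys": [
                {"userId": {"S": "user001"}},
                {"userId": {"S": "user002"}},
                {"userId": {"S": "user003"}},
            ]
        }
    }
}
# client.batch_get_item(**batch_params) fetches up to 100 items per call.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;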



&lt;h3&gt;
  
  
  Pattern 3: Conditional Updates
&lt;/h3&gt;

&lt;p&gt;Use update expressions for atomic operations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Update the counter by incrementing it by 1 for item user001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This translates to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;UpdateExpression&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SET #counter = #counter + :inc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Atomic, no race conditions.&lt;/p&gt;
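
&lt;p&gt;The actual &lt;code&gt;UpdateItem&lt;/code&gt; call also has to declare the &lt;code&gt;#counter&lt;/code&gt; placeholder; a sketch of the full parameter set (table and key names illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Full parameter set behind the UpdateExpression above (names illustrative)
update_params = {
    "TableName": "Users",
    "Key": {"userId": {"S": "user001"}},
    "UpdateExpression": "SET #counter = #counter + :inc",
    "ExpressionAttributeNames": {"#counter": "counter"},  # resolves the placeholder
    "ExpressionAttributeValues": {":inc": {"N": "1"}},
    "ReturnValues": "UPDATED_NEW",
}
# client.update_item(**update_params) applies the increment server-side,
# so concurrent callers never read-modify-write a stale value.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;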

&lt;h2&gt;
  
  
  Extending the System
&lt;/h2&gt;

&lt;p&gt;Want to add a new operation? Here's how:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Add Handler to Python
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# In dynamodb_ops.py
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;batch_write_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Bulk write multiple items.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_parse_json_body&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# ... implementation
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Successfully wrote N items&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Add to TOOL_REGISTRY
&lt;/span&gt;&lt;span class="n"&gt;TOOL_REGISTRY&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dynamodb_batch_write&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write multiple items in one request&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inputSchema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{...},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;route&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/dynamodb/batch-write&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
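
&lt;p&gt;For reference, a minimal sketch of what the elided handler body might look like, using plain dicts in place of the article's &lt;code&gt;_parse_json_body&lt;/code&gt; and &lt;code&gt;_response&lt;/code&gt; helpers; &lt;code&gt;build_batch_writes&lt;/code&gt; is a hypothetical helper, and the boto3 call is left as a comment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json

def build_batch_writes(table, items, batch_size=25):
    """Split items into BatchWriteItem payloads (the API caps a call at 25 writes)."""
    batches = []
    for start in range(0, len(items), batch_size):
        chunk = items[start:start + batch_size]
        batches.append({table: [{"PutRequest": {"Item": item}} for item in chunk]})
    return batches

def batch_write_handler(event, context):
    params = json.loads(event.get("body") or "{}")
    batches = build_batch_writes(params["table"], params.get("items", []))
    # for batch in batches:
    #     boto3.client("dynamodb").batch_write_item(RequestItems=batch)
    return {"statusCode": 200, "body": json.dumps({"batches": len(batches)})}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;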



&lt;h3&gt;
  
  
  2. Create Terraform File
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# lambda-batch-write.tf&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role"&lt;/span&gt; &lt;span class="s2"&gt;"lambda_batch_write_role"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dynamodb-batch-write-role"&lt;/span&gt;
  &lt;span class="c1"&gt;# ... role definition&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy"&lt;/span&gt; &lt;span class="s2"&gt;"lambda_batch_write_dynamodb"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"dynamodb:BatchWriteItem"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_lambda_function"&lt;/span&gt; &lt;span class="s2"&gt;"lambda_batch_write"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;function_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dynamodb-batch-write"&lt;/span&gt;
  &lt;span class="nx"&gt;handler&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dynamodb_ops.batch_write_handler"&lt;/span&gt;
  &lt;span class="c1"&gt;# ... function config&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Update API Gateway
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# In api.tf&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_apigatewayv2_integration"&lt;/span&gt; &lt;span class="s2"&gt;"batch_write_integration"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;api_id&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_apigatewayv2_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dynamodb_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;integration_uri&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_lambda_function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lambda_batch_write&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;invoke_arn&lt;/span&gt;
  &lt;span class="c1"&gt;# ... integration config&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_apigatewayv2_route"&lt;/span&gt; &lt;span class="s2"&gt;"batch_write_route"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;api_id&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_apigatewayv2_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dynamodb_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;route_key&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"POST /dynamodb/batch-write"&lt;/span&gt;
  &lt;span class="nx"&gt;target&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"integrations/${aws_apigatewayv2_integration.batch_write_integration.id}"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Deploy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./apply.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;That's it!&lt;/strong&gt; The proxy auto-discovers the new tool on next startup.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Use Cases
&lt;/h2&gt;

&lt;p&gt;Where does this shine?&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Data Exploration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Show me all users who joined in 2023"
"How many active subscriptions do we have?"
"What's the average age of users in the Engineering department?"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Natural language beats writing DynamoDB queries.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Quick CRUD Operations
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Add a test user for QA testing"
"Update the status to active for order order123"
"Delete all test data with prefix test-"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No need to open AWS Console.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Database Migrations
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Scan the Users table and show me all items missing the email field"
"Update all users in the Premium tier to add a credits field with value 100"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Kiro can help you identify and fix data inconsistencies.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Monitoring and Alerts
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"How many failed login attempts in the last hour?"
"Show me all orders with status pending older than 24 hours"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Quick operational queries without building dashboards.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Developer Productivity
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Create a sample order for testing the checkout flow"
"Copy user user001 to user001-backup"
"Show me the schema of the Products table"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Faster than clicking through the console.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;p&gt;Building this taught me some valuable lessons:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Start with Security
&lt;/h3&gt;

&lt;p&gt;We didn't bolt on IAM later - it was there from day one. That made all subsequent decisions easier.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Simplicity Scales
&lt;/h3&gt;

&lt;p&gt;One Python file. Simple Terraform. No fancy frameworks. Yet it handles thousands of requests/day without breaking a sweat.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Developer Experience Matters
&lt;/h3&gt;

&lt;p&gt;The fact that you can ask questions in plain English? That's not a gimmick. It genuinely changes how you interact with your data.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Observability is Free (Almost)
&lt;/h3&gt;

&lt;p&gt;CloudWatch Logs, CloudTrail, X-Ray tracing - all built into Lambda. We didn't build a monitoring system; we just used what AWS gives us.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. The Proxy Pattern Works
&lt;/h3&gt;

&lt;p&gt;Keeping the proxy thin and stateless was the right call. All complexity lives in Lambda where we can update it independently.&lt;/p&gt;

&lt;h2&gt;
  
  
  Troubleshooting Tips
&lt;/h2&gt;

&lt;p&gt;Hit a snag? Here's how to debug:&lt;/p&gt;

&lt;h3&gt;
  
  
  Problem: Proxy won't connect
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check AWS credentials&lt;/span&gt;
aws sts get-caller-identity

&lt;span class="c"&gt;# Test API Gateway directly&lt;/span&gt;
aws lambda invoke &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--function-name&lt;/span&gt; dynamodb-list-tables &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--payload&lt;/span&gt; &lt;span class="s1"&gt;'{}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  /tmp/out.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Problem: Permission denied
&lt;/h3&gt;

&lt;p&gt;Check IAM user has execute-api permission:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws iam get-user-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--user-name&lt;/span&gt; dynamodb-mcp-proxy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-name&lt;/span&gt; dynamodb-mcp-proxy-invoke
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Problem: Lambda timeout
&lt;/h3&gt;

&lt;p&gt;Increase timeout in Terraform:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_lambda_function"&lt;/span&gt; &lt;span class="s2"&gt;"lambda_scan"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;timeout&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;  &lt;span class="c1"&gt;# Increase from 15 to 30 seconds&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Problem: Can't find table
&lt;/h3&gt;

&lt;p&gt;Verify table exists:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws dynamodb list-tables
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then confirm that the Lambda's IAM role has permission to access it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Future Enhancements
&lt;/h2&gt;

&lt;p&gt;Where could this go?&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Multi-Region Support
&lt;/h3&gt;

&lt;p&gt;Deploy to multiple regions, let Kiro route to the nearest one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"dynamodb_mcp_us_east"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"./modules/dynamodb-mcp"&lt;/span&gt;
  &lt;span class="nx"&gt;region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"dynamodb_mcp_eu_west"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"./modules/dynamodb-mcp"&lt;/span&gt;
  &lt;span class="nx"&gt;region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"eu-west-1"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Advanced Query Support
&lt;/h3&gt;

&lt;p&gt;Add support for complex queries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Find all users where age &amp;gt; 25 AND department = Engineering 
 AND active = true, sorted by joinDate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Transaction Support
&lt;/h3&gt;

&lt;p&gt;DynamoDB supports transactions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;transaction_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Execute multiple operations atomically.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;dynamodb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transact_write_items&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;TransactItems&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Put&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{...}},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Update&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{...}},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Delete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{...}}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Stream Processing
&lt;/h3&gt;

&lt;p&gt;React to DynamoDB changes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Alert me when a new order is created&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Update the analytics table whenever a user signs up&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use DynamoDB Streams + Lambda triggers.&lt;/p&gt;
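
&lt;p&gt;A stream-triggered Lambda receives batches of change records; a minimal sketch of such a handler (a real one would publish to SNS instead of printing):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def stream_handler(event, context):
    """Invoked by a DynamoDB Streams trigger with a batch of change records."""
    inserts = 0
    for record in event["Records"]:
        if record["eventName"] == "INSERT":
            new_image = record["dynamodb"]["NewImage"]
            print(f"new order: {new_image}")  # e.g. publish to SNS here instead
            inserts += 1
    return inserts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;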

&lt;h3&gt;
  
  
  5. Cost Optimization
&lt;/h3&gt;

&lt;p&gt;Switch tables with predictable traffic to provisioned capacity (and optionally purchase reserved capacity on top):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_dynamodb_table"&lt;/span&gt; &lt;span class="s2"&gt;"users"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;billing_mode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"PROVISIONED"&lt;/span&gt;
  &lt;span class="nx"&gt;read_capacity&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
  &lt;span class="nx"&gt;write_capacity&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
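&lt;p&gt;For a rough sense of when provisioned beats on-demand, here is a back-of-envelope comparison. The prices are illustrative placeholders, not current AWS rates; check the DynamoDB pricing page before deciding:&lt;/p&gt;

```python
# Rough cost comparison: provisioned vs. on-demand DynamoDB billing.
# All prices below are ILLUSTRATIVE placeholders, not current AWS rates.
HOURS_PER_MONTH = 730

def provisioned_monthly(rcu, wcu, rcu_hr_price=0.00013, wcu_hr_price=0.00065):
    """Monthly cost of always-on provisioned capacity."""
    return (rcu * rcu_hr_price + wcu * wcu_hr_price) * HOURS_PER_MONTH

def on_demand_monthly(reads_millions, writes_millions,
                      rru_price=0.125, wru_price=0.625):
    """Monthly cost of pay-per-request read/write units."""
    return reads_millions * rru_price + writes_millions * wru_price

# A steady workload: 5 reads/s and 5 writes/s, all month long.
reads_m = 5 * 3600 * HOURS_PER_MONTH / 1e6   # requests in millions
writes_m = reads_m
print(f"provisioned: ${provisioned_monthly(5, 5):.2f}/mo")
print(f"on-demand:   ${on_demand_monthly(reads_m, writes_m):.2f}/mo")
```

&lt;p&gt;The crossover point depends entirely on how steady the traffic is: spiky workloads waste provisioned capacity, steady ones waste on-demand's premium.&lt;/p&gt;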



&lt;h3&gt;
  
  
  6. Multi-Table Operations
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Join Users table with Orders table on userId 
 and show me total order value per user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Execute multiple queries and aggregate in Lambda.&lt;/p&gt;
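&lt;p&gt;The aggregation half of that is ordinary Python. A sketch of what the Lambda does after scanning both tables (table and attribute names are assumptions):&lt;/p&gt;

```python
from collections import defaultdict

def total_order_value_per_user(users, orders):
    """Aggregate order totals by userId.

    The 'join' happens in Lambda, since DynamoDB has no join operation:
    two scans, then an in-memory merge keyed on userId.
    """
    totals = defaultdict(float)
    for order in orders:
        totals[order["userId"]] += float(order["amount"])
    # Attach display names from the Users table; fall back to the raw id.
    names = {u["userId"]: u.get("name", "?") for u in users}
    return {names.get(uid, uid): amount for uid, amount in totals.items()}

# In the real tool, users/orders would come from two table.scan() calls.
```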

&lt;h2&gt;
  
  
  Comparison with Alternatives
&lt;/h2&gt;

&lt;p&gt;How does this stack up?&lt;/p&gt;

&lt;h3&gt;
  
  
  vs. Local MCP Server
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Local Server:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Lower latency&lt;/li&gt;
&lt;li&gt;✅ No AWS costs&lt;/li&gt;
&lt;li&gt;❌ Runs only on your machine&lt;/li&gt;
&lt;li&gt;❌ Need to manage runtime dependencies&lt;/li&gt;
&lt;li&gt;❌ No centralized updates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Serverless (Ours):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Works for your whole team&lt;/li&gt;
&lt;li&gt;✅ No runtime to manage&lt;/li&gt;
&lt;li&gt;✅ Built-in scaling&lt;/li&gt;
&lt;li&gt;✅ AWS-level security&lt;/li&gt;
&lt;li&gt;❌ Small latency overhead (~100-200ms)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  vs. Direct DynamoDB Access
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Direct Access (boto3):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Maximum control&lt;/li&gt;
&lt;li&gt;✅ Lowest latency&lt;/li&gt;
&lt;li&gt;❌ Requires coding for every query&lt;/li&gt;
&lt;li&gt;❌ No natural language interface&lt;/li&gt;
&lt;li&gt;❌ Harder to audit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;MCP (Ours):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Natural language queries&lt;/li&gt;
&lt;li&gt;✅ Audit trail built-in&lt;/li&gt;
&lt;li&gt;✅ Non-technical users can query&lt;/li&gt;
&lt;li&gt;❌ Limited to predefined operations&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  vs. AWS Data API
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;AWS Data API:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Aurora only (originally Aurora Serverless; now Data API-enabled clusters)&lt;/li&gt;
&lt;li&gt;HTTP-based queries&lt;/li&gt;
&lt;li&gt;SQL interface&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Ours:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Works with DynamoDB&lt;/li&gt;
&lt;li&gt;✅ NoSQL operations&lt;/li&gt;
&lt;li&gt;✅ Natural language interface&lt;/li&gt;
&lt;li&gt;✅ MCP integration&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;If you remember nothing else, remember this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MCP is powerful&lt;/strong&gt; - It's not hype. It genuinely changes how we interact with data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Serverless fits MCP perfectly&lt;/strong&gt; - Centralized, scalable, secure. All the things MCP needs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Security first, always&lt;/strong&gt; - IAM, scoped permissions, audit logs. Build it in from day one.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Plain-text responses win&lt;/strong&gt; - Optimize for conversation, not computation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Keep it simple&lt;/strong&gt; - One Python file, clear Terraform, no magic. Simplicity scales.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The proxy pattern works&lt;/strong&gt; - Thin client, fat backend. Update independently.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Try It Yourself!
&lt;/h2&gt;

&lt;p&gt;Ready to build your own? Here's the complete source:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: [Link to your repo]&lt;/p&gt;

&lt;p&gt;Deploy in 3 commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone &lt;span class="o"&gt;[&lt;/span&gt;your-repo]
&lt;span class="nb"&gt;cd &lt;/span&gt;AWSServerlessMCP
./apply.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Questions? Hit me up in the comments! I'd love to hear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What other AWS services would you want MCP tools for?&lt;/li&gt;
&lt;li&gt;What improvements would you make?&lt;/li&gt;
&lt;li&gt;What challenges did you face deploying it?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;We started with a simple question: "Can I ask Kiro to query my database?"&lt;/p&gt;

&lt;p&gt;We ended with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ A production-ready serverless MCP backend&lt;/li&gt;
&lt;li&gt;✅ 10 DynamoDB operations as natural language tools&lt;/li&gt;
&lt;li&gt;✅ Secure, scalable, and cost-effective&lt;/li&gt;
&lt;li&gt;✅ Deployable in under 5 minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is just the beginning. MCP is going to change how we build AI-powered tools. The future isn't about building smarter AI - it's about giving AI better tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What will you build with MCP?&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Found this helpful? Give it a ❤️ and follow for more serverless + AI content!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Have questions or improvements? Drop them in the comments - I read every one!&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://spec.modelcontextprotocol.io/" rel="noopener noreferrer"&gt;Model Context Protocol Specification&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html" rel="noopener noreferrer"&gt;AWS Lambda Best Practices&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/dynamodb/" rel="noopener noreferrer"&gt;DynamoDB Developer Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html" rel="noopener noreferrer"&gt;AWS IAM Best Practices&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Connect with me:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/yeshwanthlm" rel="noopener noreferrer"&gt;https://github.com/yeshwanthlm&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;LinkedIn: &lt;a href="https://www.linkedin.com/in/yeshwanth-l-m/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/yeshwanth-l-m/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;YouTube: &lt;a href="https://www.youtube.com/@TechWithYeshwanth" rel="noopener noreferrer"&gt;https://www.youtube.com/@TechWithYeshwanth&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Tags&lt;/strong&gt;: #aws #serverless #lambda #dynamodb #ai #claude #mcp #terraform #python #devops&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>ai</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Enable HTTPS with CloudFront for an S3 Static Website</title>
      <dc:creator>Esther Ninyo</dc:creator>
      <pubDate>Wed, 29 Apr 2026 20:53:27 +0000</pubDate>
      <link>https://forem.com/aws-builders/enable-https-with-cloudfront-for-an-s3-static-website-2687</link>
      <guid>https://forem.com/aws-builders/enable-https-with-cloudfront-for-an-s3-static-website-2687</guid>
      <description>&lt;p&gt;Amazon CloudFront accelerates the delivery of static and dynamic web content to end users. To read more on what CloudFront does, check the official page &lt;a href="https://aws.amazon.com/cloudfront/getting-started/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In this article, we will enable HTTPS on a static website hosted on Amazon S3. &lt;br&gt;
Note: the default S3 static website endpoint supports HTTP only, which is why CloudFront is needed for HTTPS.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An AWS account&lt;/li&gt;
&lt;li&gt;Static website already hosted on S3 (see my &lt;a href="https://dev.to/aws-builders/hosting-a-static-website-on-amazon-s3-3fei"&gt;previous article&lt;/a&gt; on how to host a static website on Amazon S3)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Steps to follow&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Open the &lt;a href="https://aws.amazon.com/console/" rel="noopener noreferrer"&gt;AWS console&lt;/a&gt; and log in.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Search for CloudFront in the search bar and click Create distribution.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdrctow5nfru29auywthm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdrctow5nfru29auywthm.png" alt="Amazon console dashboard" width="675" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fby4i08igcchxdjlc3ylk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fby4i08igcchxdjlc3ylk.png" alt="create distribution" width="800" height="147"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The free tier is sufficient for this learning purpose.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkempgbhzatiltuuyg3rd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkempgbhzatiltuuyg3rd.png" alt="Free tier" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Name your distribution and click Next.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzqzy4xkno89qnaajavgx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzqzy4xkno89qnaajavgx.png" alt="name distribution" width="800" height="335"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use the S3 static website endpoint as the origin.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa1q6iv2l8s3v6u0k3lhd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa1q6iv2l8s3v6u0k3lhd.png" alt="static website endpoint" width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Review and create distribution.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fytewvgcfluud39tulkqr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fytewvgcfluud39tulkqr.png" alt="create distribution" width="800" height="328"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Voila! Your CloudFront distribution is ready for use.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcjykbori29gpa7mfpjs4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcjykbori29gpa7mfpjs4.png" alt="distribution created" width="800" height="289"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can access your secured website using the CloudFront distribution domain name.&lt;/p&gt;
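&lt;p&gt;If you prefer scripting the same setup, here is a boto3 sketch of the distribution the console steps create. The bucket endpoint is a placeholder, and the cache policy ID is AWS's managed CachingOptimized policy:&lt;/p&gt;

```python
import time

# Placeholder: substitute your bucket's website endpoint and region.
WEBSITE_ENDPOINT = "my-bucket.s3-website-us-east-1.amazonaws.com"

distribution_config = {
    "CallerReference": str(time.time()),  # must be unique per request
    "Comment": "HTTPS in front of an S3 static website",
    "Enabled": True,
    "Origins": {
        "Quantity": 1,
        "Items": [{
            "Id": "s3-website",
            "DomainName": WEBSITE_ENDPOINT,
            # Website endpoints speak HTTP only, so CloudFront must treat
            # this as a custom origin and fetch over plain HTTP.
            "CustomOriginConfig": {
                "HTTPPort": 80,
                "HTTPSPort": 443,
                "OriginProtocolPolicy": "http-only",
            },
        }],
    },
    "DefaultCacheBehavior": {
        "TargetOriginId": "s3-website",
        "ViewerProtocolPolicy": "redirect-to-https",  # the HTTPS part
        # AWS managed "CachingOptimized" cache policy.
        "CachePolicyId": "658327ea-f89d-4fab-a63d-7e88639e58f6",
    },
}

def create_distribution(client=None):
    """Submit the config to CloudFront (requires AWS credentials)."""
    import boto3  # local import: building the config needs no AWS deps
    client = client or boto3.client("cloudfront")
    return client.create_distribution(DistributionConfig=distribution_config)
```

&lt;p&gt;&lt;code&gt;redirect-to-https&lt;/code&gt; on the cache behavior is what enforces HTTPS for viewers; the origin leg stays HTTP because that is all the S3 website endpoint offers.&lt;/p&gt;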

&lt;p&gt;You can also take this further by adding a custom domain.&lt;/p&gt;

&lt;p&gt;Thank you for reading to the end. Kindly reach out to me in the comment section if you have any questions, or on &lt;a href="https://www.linkedin.com/in/esther-ninyo/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Till next time, cheers.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloudfront</category>
      <category>s3</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Demystifying the AWS Advanced JDBC Driver: Pools, Plugins, and the Traps I Hit</title>
      <dc:creator>Kris Iyer</dc:creator>
      <pubDate>Wed, 29 Apr 2026 19:17:33 +0000</pubDate>
      <link>https://forem.com/aws-builders/demystifying-the-aws-advanced-jdbc-driver-pools-plugins-and-the-traps-i-hit-190</link>
      <guid>https://forem.com/aws-builders/demystifying-the-aws-advanced-jdbc-driver-pools-plugins-and-the-traps-i-hit-190</guid>
      <description>&lt;h1&gt;
  
  
  Demystifying the AWS Advanced JDBC Driver: Pools, Plugins, and the Traps I Hit
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Date:&lt;/strong&gt; 2026-04-29&lt;br&gt;
&lt;strong&gt;Status:&lt;/strong&gt; Published&lt;/p&gt;


&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;The AWS Advanced JDBC Driver wraps your database driver with a plugin chain that handles failover, read/write splitting, and connection monitoring. The critical gotcha: &lt;strong&gt;it can create internal connection pools separate from your application's HikariCP&lt;/strong&gt;. If you're on v2.x with the F0 profile, you're hitting a hardcoded 30-connection ceiling regardless of your external pool config. The fix: upgrade to v3.x and use &lt;code&gt;connectionPoolType=hikari&lt;/code&gt; with &lt;code&gt;cp-MaximumPoolSize&lt;/code&gt; properties, or drop profiles entirely and configure plugins manually.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key invariant:&lt;/strong&gt; &lt;code&gt;cp-MaximumPoolSize &amp;gt;= external maximumPoolSize&lt;/code&gt; to avoid the internal pool becoming your bottleneck.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quick wins:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check your driver version: v3.3.0+ recommended&lt;/li&gt;
&lt;li&gt;If using F0 profile on v2.x, upgrade immediately&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;exception-override-class-name: software.amazon.jdbc.util.HikariCPSQLException&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Keep &lt;code&gt;socketTimeout=0&lt;/code&gt; and let &lt;code&gt;efm2&lt;/code&gt; handle liveness detection&lt;/li&gt;
&lt;li&gt;Mark read-only transactions with &lt;code&gt;@Transactional(readOnly=true)&lt;/code&gt; to benefit from read/write splitting&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Why I'm writing this
&lt;/h2&gt;

&lt;p&gt;I spent a few hours chasing a performance regression that had no business existing. The service had a HikariCP pool configured for 50 connections per pod. I'd checked the Spring Boot YAML. The property names were right. The values were right. The configuration was loading at startup — I'd watched Hikari log it.&lt;/p&gt;

&lt;p&gt;And yet, under load, the pool count plateaued at exactly 30. Not 50. Not 45. Thirty. Every time. Across every pod. Tomcat threads piled up behind a 10-second wait, connection creation time sat at 10,000 ms, and our p99 latency went vertical.&lt;/p&gt;

&lt;p&gt;The answer, when I found it, was about two layers below where I'd been looking — inside a hardcoded lambda in a specific version of the AWS JDBC driver. I'd been tuning the wrong pool.&lt;/p&gt;

&lt;p&gt;This post is what I wish I'd had at the start of that investigation. If you're running Spring Boot against Aurora PostgreSQL or MySQL through &lt;code&gt;software.amazon.jdbc.Driver&lt;/code&gt;, there are a handful of things about how this driver actually works that aren't obvious from the README. Get them wrong and you get slow requests, or failed failovers, or both. Let me save you the trouble.&lt;/p&gt;


&lt;h2&gt;
  
  
  What the AWS Advanced JDBC Driver actually is
&lt;/h2&gt;

&lt;p&gt;The docs call it a "wrapper," and that's literal — it's a thin &lt;code&gt;java.sql.Driver&lt;/code&gt; that sits between your app and the underlying &lt;code&gt;org.postgresql.Driver&lt;/code&gt; (or MySQL equivalent). Your URL ends up looking like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;jdbc:aws-wrapper:postgresql://&amp;lt;endpoint&amp;gt;:5432/&amp;lt;db&amp;gt;?wrapperProfileName=F0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything after &lt;code&gt;jdbc:aws-wrapper:&lt;/code&gt; is a conventional JDBC URL the wrapper passes down. What the wrapper adds is a plugin chain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;your application
  -&amp;gt; HikariCP (external, app-managed)
    -&amp;gt; aws-advanced-jdbc-wrapper
      -&amp;gt; [plugin 1] -&amp;gt; [plugin 2] -&amp;gt; ... -&amp;gt; [terminal plugin]
        -&amp;gt; org.postgresql.Driver
          -&amp;gt; Aurora instance (writer or reader)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each plugin intercepts JDBC calls — &lt;code&gt;getConnection&lt;/code&gt;, &lt;code&gt;prepareStatement&lt;/code&gt;, &lt;code&gt;execute&lt;/code&gt; — and can rewrite, retry, monitor, or split them. The plugins are why you're using this driver in the first place. They're what give you fast failover, read/write splitting, and enhanced failure monitoring. Everything else about driver configuration exists to serve the plugin chain.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fblk37t7ard2tvpjnf3ok.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fblk37t7ard2tvpjnf3ok.png" alt="AWS JDBC Driver Architecture - layered stack showing Application → External HikariCP → Plugin Chain (readWriteSplitting, auroraConnectionTracker, failover, efm2) → Internal Pools (per Aurora instance) → PostgreSQL Driver → Aurora instances" width="800" height="715"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Configuration profiles: convenience with teeth
&lt;/h2&gt;

&lt;p&gt;The driver ships with named &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/ConfigurationPresets.md" rel="noopener noreferrer"&gt;&lt;strong&gt;configuration profiles&lt;/strong&gt;&lt;/a&gt; — presets that bundle a plugin list and a set of timeouts. The best-known is &lt;code&gt;F0&lt;/code&gt;, which you turn on with &lt;code&gt;wrapperProfileName=F0&lt;/code&gt;. &lt;code&gt;F0&lt;/code&gt; bundles "fast failover" — the recommended plugin set for Aurora.&lt;/p&gt;

&lt;p&gt;Profiles are handy because they let an app team ship one URL parameter instead of a dozen properties. They're also the single biggest source of "how is this even possible?" incidents I've seen, because &lt;strong&gt;a profile can silently set properties you can't override from outside.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The F0 gotcha: a few hours I won't get back
&lt;/h3&gt;

&lt;p&gt;Before v3.1.0, the F0 profile eagerly constructed a second, &lt;em&gt;internal&lt;/em&gt; HikariCP pool — separate from your application's — with properties baked into a lambda at profile-load time. I didn't find this in the docs. I found it by decompiling the JAR:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// From DriverConfigurationProfiles.class in aws-advanced-jdbc-wrapper-2.6.8.jar&lt;/span&gt;
&lt;span class="c1"&gt;// (I verified this via bytecode decompilation after running out of other theories)&lt;/span&gt;
&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setMaximumPoolSize&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;                       &lt;span class="c1"&gt;// HARD CEILING&lt;/span&gt;
&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setConnectionTimeout&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;SECONDS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toMillis&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;   &lt;span class="c1"&gt;// 10-second wait on exhaustion&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is no property you can set to override these. The external pool config is ignored by the internal pool. The &lt;code&gt;cp-&lt;/code&gt; property prefix (I'll get to it below) doesn't exist in v2.6.8 at all — the string &lt;code&gt;"cp-"&lt;/code&gt; literally doesn't appear anywhere in the JAR.&lt;/p&gt;

&lt;p&gt;Here's what was actually happening in the service at runtime:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;App borrowed a logical connection from the external HikariCP (configured max = 50).&lt;/li&gt;
&lt;li&gt;External HikariCP asked the wrapper for a physical connection.&lt;/li&gt;
&lt;li&gt;The wrapper routed through its internal HikariCP (hardcoded max = 30).&lt;/li&gt;
&lt;li&gt;Under load, the internal pool saturated at 30. Attempts 31–50 waited up to 10 seconds and then failed.&lt;/li&gt;
&lt;li&gt;From my dashboards: external &lt;code&gt;hikaricp.connections&lt;/code&gt; capped at 30, &lt;code&gt;connections.pending&lt;/code&gt; climbed to about 170, and &lt;code&gt;connections.creation.avg&lt;/code&gt; sat at 10,000 ms.&lt;/li&gt;
&lt;/ol&gt;
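&lt;p&gt;The shape of the failure is easy to reproduce in miniature: nest a smaller pool inside a larger one and the inner limit wins. A toy sketch, not the driver's code:&lt;/p&gt;

```python
import threading

class Pool:
    """Toy connection pool: a counting semaphore with an optional backing pool."""
    def __init__(self, size, inner=None):
        self.sem = threading.Semaphore(size)
        self.inner = inner  # pool this one borrows physical connections from

    def acquire(self):
        ok = self.sem.acquire(blocking=False)
        if ok and self.inner and not self.inner.acquire():
            self.sem.release()  # inner pool exhausted: give our slot back
            return False
        return ok

internal = Pool(30)                  # the wrapper's hardcoded pool
external = Pool(50, inner=internal)  # the HikariCP you actually configured

granted = sum(external.acquire() for _ in range(50))
print(granted)  # prints 30: the inner ceiling, not your 50
```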

&lt;p&gt;From the outside, this looks like a pool-sizing bug. I lost a few hours to it before the pieces clicked. The fix is a driver version bump.&lt;/p&gt;

&lt;h3&gt;
  
  
  v3.x: &lt;code&gt;cp-&lt;/code&gt; properties and &lt;code&gt;connectionPoolType=hikari&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;In v3.1.0 the driver added (&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/pull/1658" rel="noopener noreferrer"&gt;PR #1658&lt;/a&gt;) a new URL parameter (documented under the &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheReadWriteSplittingPlugin.md" rel="noopener noreferrer"&gt;read/write splitting plugin's internal connection pooling section&lt;/a&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;?connectionPoolType=hikari
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When that's set, the internal pool is built via &lt;code&gt;HikariPooledConnectionProvider&lt;/code&gt;'s no-arg constructor, which reads properties prefixed with &lt;code&gt;cp-&lt;/code&gt; and forwards them to the internal Hikari config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;data-source-properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;cp-MaximumPoolSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;50"&lt;/span&gt;
  &lt;span class="na"&gt;cp-MinimumIdle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5"&lt;/span&gt;
  &lt;span class="na"&gt;cp-ConnectionTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;30000"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The catch I hit next:&lt;/strong&gt; &lt;code&gt;cp-&lt;/code&gt; properties are silently ignored when &lt;code&gt;wrapperProfileName=F0&lt;/code&gt; is also active. The F0 preset supplies its own &lt;code&gt;HikariPoolConfigurator&lt;/code&gt; lambda that takes precedence and still hardcodes &lt;code&gt;maxPoolSize=30&lt;/code&gt;. &lt;strong&gt;F0 and &lt;code&gt;cp-MaximumPoolSize&lt;/code&gt; cannot coexist.&lt;/strong&gt; Pick one.&lt;/p&gt;

&lt;p&gt;For Aurora with read/write splitting and proper pool sizing on v3.x, I dropped the profile and assembled the plugin list by hand:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spring&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;datasource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;jdbc:aws-wrapper:postgresql://${database_endpoint}:5432/${db}?connectionPoolType=hikari&amp;amp;readerHostSelectorStrategy=roundRobin&lt;/span&gt;
    &lt;span class="na"&gt;driver-class-name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;software.amazon.jdbc.Driver&lt;/span&gt;
    &lt;span class="na"&gt;hikari&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;connection-timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;60000&lt;/span&gt;
      &lt;span class="na"&gt;maximum-pool-size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;
      &lt;span class="na"&gt;minimum-idle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;data-source-properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;wrapperPlugins&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;readWriteSplitting,auroraConnectionTracker,failover,efm2&lt;/span&gt;
        &lt;span class="na"&gt;cp-MaximumPoolSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;50"&lt;/span&gt;
        &lt;span class="na"&gt;cp-MinimumIdle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5"&lt;/span&gt;
        &lt;span class="na"&gt;cp-ConnectionTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;30000"&lt;/span&gt;
        &lt;span class="na"&gt;connectTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10000"&lt;/span&gt;
        &lt;span class="na"&gt;loginTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10000"&lt;/span&gt;
        &lt;span class="na"&gt;socketTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0"&lt;/span&gt;
        &lt;span class="na"&gt;failureDetectionTime&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;60000"&lt;/span&gt;
        &lt;span class="na"&gt;failureDetectionCount&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5"&lt;/span&gt;
        &lt;span class="na"&gt;failureDetectionInterval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;15000"&lt;/span&gt;
        &lt;span class="na"&gt;monitoring-connectTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10000"&lt;/span&gt;
        &lt;span class="na"&gt;monitoring-socketTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5000"&lt;/span&gt;
        &lt;span class="na"&gt;monitoring-loginTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10000"&lt;/span&gt;
      &lt;span class="na"&gt;exception-override-class-name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;software.amazon.jdbc.util.HikariCPSQLException&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This replaces what F0 was giving me (the plugin set and timeouts) while keeping &lt;code&gt;cp-*&lt;/code&gt; effective.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to use presets vs manual configuration
&lt;/h3&gt;

&lt;p&gt;This is a gap in the official docs — there's no guidance on when presets are the right choice vs when you should go manual. Having dug through the &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/wrapper/src/main/java/software/amazon/jdbc/profile/DriverConfigurationProfiles.java" rel="noopener noreferrer"&gt;source code&lt;/a&gt; and the &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/wrapper/src/main/java/software/amazon/jdbc/profile/ConfigurationProfilePresetCodes.java" rel="noopener noreferrer"&gt;preset codes&lt;/a&gt;, here's how I think about it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The preset families:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Family&lt;/th&gt;
&lt;th&gt;Pool type&lt;/th&gt;
&lt;th&gt;Presets&lt;/th&gt;
&lt;th&gt;What they're for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;A / B / C&lt;/td&gt;
&lt;td&gt;No pool&lt;/td&gt;
&lt;td&gt;A0, A1, A2, B, C0, C1&lt;/td&gt;
&lt;td&gt;Failover + monitoring only. No internal connection pooling. You bring your own (external) pool or don't pool at all.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D / E / F&lt;/td&gt;
&lt;td&gt;Internal pool&lt;/td&gt;
&lt;td&gt;D0, D1, E, F0, F1&lt;/td&gt;
&lt;td&gt;Failover + monitoring + internal HikariCP pool (managed by the wrapper). &lt;code&gt;F0&lt;/code&gt; is the most commonly referenced.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;G / H / I&lt;/td&gt;
&lt;td&gt;External pool&lt;/td&gt;
&lt;td&gt;G0, G1, H, I0, I1&lt;/td&gt;
&lt;td&gt;Designed for apps that manage their own pool externally. The wrapper does not create internal pools.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SF_ prefix&lt;/td&gt;
&lt;td&gt;(matches base)&lt;/td&gt;
&lt;td&gt;SF_D0, SF_D1, SF_E, SF_F0, SF_F1&lt;/td&gt;
&lt;td&gt;Spring Framework variants — same as their base preset but with &lt;code&gt;readWriteSplitting&lt;/code&gt; disabled (Spring handles routing via separate DataSource beans).&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The number suffix indicates failure-detection sensitivity: &lt;code&gt;0&lt;/code&gt; = normal, &lt;code&gt;1&lt;/code&gt; = relaxed (or aggressive, depending on the family), &lt;code&gt;2&lt;/code&gt; = aggressive.&lt;/p&gt;
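&lt;p&gt;For concreteness, activating a preset is just a driver property on the wrapper URL. A minimal sketch (the endpoint, database name, and credentials below are placeholders, not real values):&lt;/p&gt;

```java
import java.util.Properties;

public class PresetExample {
    // Build a wrapper URL that activates a configuration preset by name.
    static String presetUrl(String host, String db, String preset) {
        return "jdbc:aws-wrapper:postgresql://" + host + ":5432/" + db
                + "?wrapperProfileName=" + preset;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("user", "app_user");       // placeholder credentials
        props.setProperty("password", "app_secret");

        String url = presetUrl("my-cluster.cluster-xyz.us-east-1.rds.amazonaws.com",
                "appdb", "F0");
        System.out.println(url);
        // Against a reachable cluster you would then call:
        // Connection conn = DriverManager.getConnection(url, props);
    }
}
```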

&lt;p&gt;&lt;strong&gt;The problem with pool presets (D/E/F families):&lt;/strong&gt; every preset that creates an internal pool hardcodes the same HikariCP values in a lambda with no override mechanism:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Property&lt;/th&gt;
&lt;th&gt;Hardcoded value&lt;/th&gt;
&lt;th&gt;Overridable via &lt;code&gt;cp-*&lt;/code&gt;?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;maxPoolSize&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;30&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No — preset lambda takes precedence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;connectionTimeout&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;10 seconds&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;minimumIdle&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;idleTimeout&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;15 minutes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;keepaliveTime&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;3 minutes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;validationTimeout&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1 second&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;maxLifetime&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1 day&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;initializationFailTimeout&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;-1&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This applies to D0, D1, E, F0, F1 and their SF_ variants — &lt;strong&gt;all&lt;/strong&gt; of them hardcode &lt;code&gt;maxPoolSize=30&lt;/code&gt;. The &lt;code&gt;cp-*&lt;/code&gt; properties (like &lt;code&gt;cp-MaximumPoolSize&lt;/code&gt;) are silently ignored when any of these presets are active, because the preset's &lt;code&gt;HikariPoolConfigurator&lt;/code&gt; lambda overrides the &lt;code&gt;HikariPooledConnectionProvider&lt;/code&gt;'s property-reading path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use a preset:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're prototyping, running a small service, or don't have specific pool-sizing requirements.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;maxPoolSize=30&lt;/code&gt; and &lt;code&gt;connectionTimeout=10s&lt;/code&gt; are acceptable for your workload.&lt;/li&gt;
&lt;li&gt;You want a known-good plugin + timeout combination without thinking about individual settings.&lt;/li&gt;
&lt;li&gt;You're using a no-pool preset (A/B/C family) and bringing your own external pool — these have no hardcoded pool values to collide with.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When to go manual (drop the preset):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need to control &lt;code&gt;maxPoolSize&lt;/code&gt;, &lt;code&gt;connectionTimeout&lt;/code&gt;, or any other pool property — which is most production deployments. This is what I had to do.&lt;/li&gt;
&lt;li&gt;You're running at non-trivial throughput where 30 connections per internal pool is a ceiling (this was my exact situation).&lt;/li&gt;
&lt;li&gt;You want &lt;code&gt;cp-*&lt;/code&gt; properties to actually take effect.&lt;/li&gt;
&lt;li&gt;You're combining &lt;code&gt;readWriteSplitting&lt;/code&gt; with &lt;code&gt;@Transactional(readOnly=true)&lt;/code&gt; in Spring and need internal pools with custom sizing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The manual approach means specifying &lt;code&gt;connectionPoolType=hikari&lt;/code&gt; + &lt;code&gt;wrapperPlugins=...&lt;/code&gt; + &lt;code&gt;cp-*&lt;/code&gt; properties explicitly, instead of &lt;code&gt;wrapperProfileName=F0&lt;/code&gt;. You lose the convenience of a single preset name, but you gain control over every property. For reference, the &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/ConfigurationPresets.md" rel="noopener noreferrer"&gt;Configuration Presets docs&lt;/a&gt; list what each preset bundles, so you can replicate the plugin list and timeouts manually while overriding only the pool properties you need.&lt;/p&gt;
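&lt;p&gt;As a sketch of what the manual equivalent looks like in code (the plugin list mirrors F0's documented bundle; every numeric value is illustrative, not a recommendation):&lt;/p&gt;

```java
import java.util.Properties;

public class ManualF0Config {
    // Replicate F0's plugin chain by hand so that cp-* pool properties
    // actually take effect (no preset lambda overrides them).
    static Properties manualProps(int maxPool, int minIdle, long connTimeoutMs) {
        Properties p = new Properties();
        p.setProperty("wrapperPlugins",
                "auroraInitialConnectionStrategy,auroraConnectionTracker,"
                + "readWriteSplitting,failover,efm2");
        p.setProperty("connectionPoolType", "hikari");
        // Honored here, unlike under wrapperProfileName=F0:
        p.setProperty("cp-MaximumPoolSize", Integer.toString(maxPool));
        p.setProperty("cp-MinimumIdle", Integer.toString(minIdle));
        p.setProperty("cp-ConnectionTimeout", Long.toString(connTimeoutMs));
        return p;
    }

    public static void main(String[] args) {
        Properties p = manualProps(50, 5, 30_000);
        System.out.println(p.getProperty("cp-MaximumPoolSize")); // 50, not F0's hardcoded 30
    }
}
```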




&lt;h2&gt;
  
  
  External pooling vs internal pooling — what each layer is actually doing
&lt;/h2&gt;

&lt;p&gt;This is something most folks will need to pay attention to. These two layers are not redundant. They do different jobs.&lt;/p&gt;

&lt;h3&gt;
  
  
  External pool (my application's HikariCP, managed by Spring Boot)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scope:&lt;/strong&gt; one pool per Spring &lt;code&gt;DataSource&lt;/code&gt; bean, typically one per pod.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Holds:&lt;/strong&gt; &lt;em&gt;logical&lt;/em&gt; connections — the &lt;code&gt;java.sql.Connection&lt;/code&gt; objects my code calls &lt;code&gt;.prepareStatement&lt;/code&gt; on.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gates:&lt;/strong&gt; how many &lt;em&gt;threads&lt;/em&gt; can hold a connection concurrently. If this is 50, request #51 waits or times out.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maps to:&lt;/strong&gt; how many Tomcat threads can simultaneously sit inside a DB-touching request.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Internal pool (managed by the wrapper, one per Aurora instance)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scope:&lt;/strong&gt; with &lt;code&gt;readWriteSplitting&lt;/code&gt; + &lt;code&gt;connectionPoolType=hikari&lt;/code&gt;, &lt;strong&gt;one internal pool per Aurora instance&lt;/strong&gt; — a writer pool, and one pool per reader. The wrapper routes logical connections to the right instance based on read-only hints (&lt;code&gt;setReadOnly(true)&lt;/code&gt; or &lt;code&gt;@Transactional(readOnly=true)&lt;/code&gt; in Spring).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Holds:&lt;/strong&gt; &lt;em&gt;physical&lt;/em&gt; connections — TCP/TLS sessions to a specific Aurora node.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gates:&lt;/strong&gt; how many physical sockets stay open to each instance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maps to:&lt;/strong&gt; Aurora's per-instance &lt;code&gt;max_connections&lt;/code&gt;. The default formula is &lt;code&gt;LEAST({DBInstanceClassMemory/9531392}, 5000)&lt;/code&gt;, so memory-rich instances like &lt;code&gt;db.r7i.4xlarge&lt;/code&gt; (128 GiB) hit the 5,000 hard cap rather than scale further.&lt;/li&gt;
&lt;/ul&gt;
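&lt;p&gt;The &lt;code&gt;max_connections&lt;/code&gt; formula is easy to sanity-check. A sketch using nominal instance memory (the real &lt;code&gt;DBInstanceClassMemory&lt;/code&gt; comes out slightly lower because AWS reserves memory for the OS):&lt;/p&gt;

```java
public class AuroraMaxConnections {
    // Aurora PostgreSQL default: LEAST({DBInstanceClassMemory/9531392}, 5000).
    static long maxConnections(long instanceMemoryBytes) {
        return Math.min(instanceMemoryBytes / 9_531_392L, 5_000L);
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;
        System.out.println(maxConnections(128 * gib)); // db.r7i.4xlarge (128 GiB): hits the 5000 cap
        System.out.println(maxConnections(16 * gib));  // a 16 GiB instance: 1802
    }
}
```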

&lt;h3&gt;
  
  
  Why both are needed — and the official caveat
&lt;/h3&gt;

&lt;p&gt;The external pool's logical connections are cheap — Java objects wrapping references into the internal pool. The internal pool's physical connections are expensive — TLS handshake, auth, wire protocol. The wrapper hands out a single logical connection from the external pool while keeping the physical session pinned to the correct instance (writer for writes, reader-N for reads).&lt;/p&gt;

&lt;p&gt;Without the internal pool layer, every &lt;code&gt;getConnection()&lt;/code&gt; from the external pool would open a fresh physical connection to some instance. That undoes HikariCP's entire point.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important caveat from the AWS docs:&lt;/strong&gt; the &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheReadWriteSplittingPlugin.md" rel="noopener noreferrer"&gt;ReadWriteSplitting plugin documentation&lt;/a&gt; explicitly states:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Using internal and external pools at the same time has not been tested and may result in problematic behaviour."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The docs go further and recommend disabling external connection pools entirely when using internal pooling:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"If you want to use the driver's internal connection pooling, we recommend that you explicitly disable external connection pools (provided by Spring). You need to check the &lt;code&gt;spring.datasource.type&lt;/code&gt; property to ensure that any external connection pooling is disabled."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here's the thing that's easy to miss: &lt;strong&gt;if your Spring Boot app has &lt;code&gt;spring.datasource.hikari.*&lt;/code&gt; properties and &lt;code&gt;connectionPoolType=hikari&lt;/code&gt; in the JDBC URL, you're running double pools whether you intended to or not.&lt;/strong&gt; &lt;code&gt;connectionPoolType=hikari&lt;/code&gt; only controls the wrapper's internal pool — it doesn't replace or disable the external one. Spring Boot independently auto-detects HikariCP on the classpath and creates the external &lt;code&gt;HikariDataSource&lt;/code&gt; bean. Unless you explicitly set &lt;code&gt;spring.datasource.type=org.springframework.jdbc.datasource.SimpleDriverDataSource&lt;/code&gt;, both pools are active. This is almost certainly the configuration most Spring Boot teams end up with.&lt;/p&gt;

&lt;p&gt;In practice, I've run both pools together under sustained load without issues — but that's my workload, not a guarantee. The double-pool architecture works when you treat the external pool as a concurrency gate and the internal pools as physical-session caches, and keep &lt;code&gt;cp-MaximumPoolSize &amp;gt;= maximumPoolSize&lt;/code&gt; so the internal layer never becomes the bottleneck. But if you're hitting edge cases — connections leaking, intermittent stale-connection errors after failover, or pool metrics that don't add up — this officially-untested interaction is the first thing to suspect.&lt;/p&gt;
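&lt;p&gt;That sizing invariant is cheap to encode as a startup sanity check. A sketch, with hypothetical method names:&lt;/p&gt;

```java
public class PoolInvariantCheck {
    // Double-pool sanity check: the internal pool (cp-MaximumPoolSize) must be
    // at least as large as the external pool (maximumPoolSize), or the internal
    // layer gates the external one and threads queue on it.
    static void checkInvariant(int externalMaxPoolSize, int internalCpMaximumPoolSize) {
        if (internalCpMaximumPoolSize < externalMaxPoolSize) {
            throw new IllegalStateException(
                "cp-MaximumPoolSize (" + internalCpMaximumPoolSize + ") < "
                + "maximumPoolSize (" + externalMaxPoolSize
                + "): internal pool would become the bottleneck");
        }
    }

    public static void main(String[] args) {
        checkInvariant(50, 50); // OK: equal sizes satisfy the invariant
        try {
            checkInvariant(50, 30); // the F0-style trap: internal pool too small
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```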

&lt;h3&gt;
  
  
  So how do you actually disable the external pool?
&lt;/h3&gt;

&lt;p&gt;This is the part I want to make crystal clear, because it's easy to think you've solved double-pooling when you haven't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why you're probably running double pools right now:&lt;/strong&gt; Spring Boot auto-detects HikariCP on your classpath (it's pulled in by &lt;code&gt;spring-boot-starter-data-jpa&lt;/code&gt; or &lt;code&gt;spring-boot-starter-jdbc&lt;/code&gt;) and creates a &lt;code&gt;HikariDataSource&lt;/code&gt; bean automatically. Setting &lt;code&gt;connectionPoolType=hikari&lt;/code&gt; in the wrapper URL does &lt;strong&gt;not&lt;/strong&gt; turn this off — that only tells the wrapper to create its own internal pools. These are two independent systems that don't know about each other.&lt;/p&gt;

&lt;p&gt;If your &lt;code&gt;application.yaml&lt;/code&gt; looks like this, you have two pools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# THIS IS DOUBLE-POOLING — both pools are active&lt;/span&gt;
&lt;span class="na"&gt;spring&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;datasource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;jdbc:aws-wrapper:postgresql://...?connectionPoolType=hikari&amp;amp;readerHostSelectorStrategy=roundRobin&lt;/span&gt;
    &lt;span class="na"&gt;driver-class-name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;software.amazon.jdbc.Driver&lt;/span&gt;
    &lt;span class="na"&gt;hikari&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;                          &lt;span class="c1"&gt;# ← Spring Boot sees this and creates external HikariCP&lt;/span&gt;
      &lt;span class="na"&gt;maximum-pool-size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;
      &lt;span class="na"&gt;minimum-idle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;data-source-properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;cp-MaximumPoolSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;50"&lt;/span&gt;     &lt;span class="c1"&gt;# ← wrapper sees this and creates internal HikariCP&lt;/span&gt;
        &lt;span class="na"&gt;cp-MinimumIdle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;To run single-pool (internal only),&lt;/strong&gt; set &lt;code&gt;spring.datasource.type&lt;/code&gt; to a non-pooling DataSource implementation. This tells Spring Boot to skip HikariCP auto-detection. The catch: without the &lt;code&gt;hikari:&lt;/code&gt; section, there's no &lt;code&gt;data-source-properties:&lt;/code&gt; block to put your &lt;code&gt;cp-*&lt;/code&gt; and wrapper properties in. You have two options.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option A — pass everything as URL parameters.&lt;/strong&gt; Reliable but the URL gets long:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# SINGLE-POOL (internal only) — cp-* and plugin config in the URL&lt;/span&gt;
&lt;span class="na"&gt;spring&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;datasource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;org.springframework.jdbc.datasource.SimpleDriverDataSource&lt;/span&gt;   &lt;span class="c1"&gt;# ← disables external HikariCP&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;-&lt;/span&gt;
      &lt;span class="s"&gt;jdbc:aws-wrapper:postgresql://${database_endpoint}:5432/${database_name}&lt;/span&gt;
      &lt;span class="s"&gt;?connectionPoolType=hikari&lt;/span&gt;
      &lt;span class="s"&gt;&amp;amp;readerHostSelectorStrategy=roundRobin&lt;/span&gt;
      &lt;span class="s"&gt;&amp;amp;wrapperPlugins=readWriteSplitting,auroraConnectionTracker,failover,efm2&lt;/span&gt;
      &lt;span class="s"&gt;&amp;amp;cp-MaximumPoolSize=50&lt;/span&gt;
      &lt;span class="s"&gt;&amp;amp;cp-MinimumIdle=5&lt;/span&gt;
      &lt;span class="s"&gt;&amp;amp;cp-ConnectionTimeout=30000&lt;/span&gt;
      &lt;span class="s"&gt;&amp;amp;connectTimeout=10000&lt;/span&gt;
      &lt;span class="s"&gt;&amp;amp;loginTimeout=10000&lt;/span&gt;
      &lt;span class="s"&gt;&amp;amp;socketTimeout=0&lt;/span&gt;
      &lt;span class="s"&gt;&amp;amp;failureDetectionTime=60000&lt;/span&gt;
      &lt;span class="s"&gt;&amp;amp;failureDetectionCount=5&lt;/span&gt;
      &lt;span class="s"&gt;&amp;amp;failureDetectionInterval=15000&lt;/span&gt;
      &lt;span class="s"&gt;&amp;amp;monitoring-connectTimeout=10000&lt;/span&gt;
      &lt;span class="s"&gt;&amp;amp;monitoring-socketTimeout=5000&lt;/span&gt;
      &lt;span class="s"&gt;&amp;amp;monitoring-loginTimeout=10000&lt;/span&gt;
    &lt;span class="na"&gt;driver-class-name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;software.amazon.jdbc.Driver&lt;/span&gt;
    &lt;span class="c1"&gt;# No hikari: section — Spring won't create an external pool&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Option B — use the wrapper's own DataSource class.&lt;/strong&gt; The wrapper provides &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/DataSource.md" rel="noopener noreferrer"&gt;&lt;code&gt;AwsWrapperDataSource&lt;/code&gt;&lt;/a&gt; which accepts properties directly, keeping the YAML clean:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# SINGLE-POOL (internal only) — using AwsWrapperDataSource&lt;/span&gt;
&lt;span class="na"&gt;spring&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;datasource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;software.amazon.jdbc.ds.AwsWrapperDataSource&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;jdbc:postgresql://${database_endpoint}:5432/${database_name}&lt;/span&gt;   &lt;span class="c1"&gt;# ← note: no aws-wrapper: prefix&lt;/span&gt;
    &lt;span class="na"&gt;driver-class-name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;org.postgresql.Driver&lt;/span&gt;                            &lt;span class="c1"&gt;# ← the underlying driver, not the wrapper&lt;/span&gt;
    &lt;span class="na"&gt;connection-properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;wrapperPlugins&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;readWriteSplitting,auroraConnectionTracker,failover,efm2&lt;/span&gt;
      &lt;span class="na"&gt;connectionPoolType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hikari&lt;/span&gt;
      &lt;span class="na"&gt;readerHostSelectorStrategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;roundRobin&lt;/span&gt;
      &lt;span class="na"&gt;cp-MaximumPoolSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;50"&lt;/span&gt;
      &lt;span class="na"&gt;cp-MinimumIdle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5"&lt;/span&gt;
      &lt;span class="na"&gt;cp-ConnectionTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;30000"&lt;/span&gt;
      &lt;span class="na"&gt;connectTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10000"&lt;/span&gt;
      &lt;span class="na"&gt;loginTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10000"&lt;/span&gt;
      &lt;span class="na"&gt;socketTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0"&lt;/span&gt;
      &lt;span class="na"&gt;failureDetectionTime&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;60000"&lt;/span&gt;
      &lt;span class="na"&gt;failureDetectionCount&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5"&lt;/span&gt;
      &lt;span class="na"&gt;failureDetectionInterval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;15000"&lt;/span&gt;
      &lt;span class="na"&gt;monitoring-connectTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10000"&lt;/span&gt;
      &lt;span class="na"&gt;monitoring-socketTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5000"&lt;/span&gt;
      &lt;span class="na"&gt;monitoring-loginTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10000"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note the differences with &lt;code&gt;AwsWrapperDataSource&lt;/code&gt;: the URL drops the &lt;code&gt;jdbc:aws-wrapper:&lt;/code&gt; prefix (it's a plain &lt;code&gt;jdbc:postgresql:&lt;/code&gt; URL since the wrapper IS the DataSource), and &lt;code&gt;driver-class-name&lt;/code&gt; points to the underlying driver, not the wrapper. See the &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/DataSource.md" rel="noopener noreferrer"&gt;DataSource configuration docs&lt;/a&gt; for details.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To run single-pool (external only),&lt;/strong&gt; remove &lt;code&gt;connectionPoolType=hikari&lt;/code&gt; from the URL. The wrapper won't create internal pools, and every &lt;code&gt;getConnection()&lt;/code&gt; from the external HikariCP opens a physical connection through the wrapper on-demand:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# SINGLE-POOL — only the external HikariCP is active&lt;/span&gt;
&lt;span class="na"&gt;spring&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;datasource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;jdbc:aws-wrapper:postgresql://...?readerHostSelectorStrategy=roundRobin&lt;/span&gt;
    &lt;span class="na"&gt;driver-class-name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;software.amazon.jdbc.Driver&lt;/span&gt;
    &lt;span class="na"&gt;hikari&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;maximum-pool-size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;
      &lt;span class="na"&gt;minimum-idle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="c1"&gt;# No cp-* properties needed — no internal pool exists&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Trade-offs at a glance
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Configuration&lt;/th&gt;
&lt;th&gt;External pool&lt;/th&gt;
&lt;th&gt;Internal pool&lt;/th&gt;
&lt;th&gt;What you get&lt;/th&gt;
&lt;th&gt;What you lose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Double pool&lt;/strong&gt; (most Spring Boot apps)&lt;/td&gt;
&lt;td&gt;Spring HikariCP (&lt;code&gt;hikari:&lt;/code&gt; section)&lt;/td&gt;
&lt;td&gt;Wrapper HikariCP (&lt;code&gt;connectionPoolType=hikari&lt;/code&gt; + &lt;code&gt;cp-*&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Full Spring metrics, health checks, familiar config surface. Physical connections cached per Aurora instance.&lt;/td&gt;
&lt;td&gt;Running an officially-untested combination. Two pools to reason about. Higher DB connection count than expected.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Internal only via &lt;code&gt;SimpleDriverDataSource&lt;/code&gt;&lt;/strong&gt; (&lt;code&gt;spring.datasource.type=SimpleDriverDataSource&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Disabled&lt;/td&gt;
&lt;td&gt;Wrapper HikariCP&lt;/td&gt;
&lt;td&gt;The configuration AWS actually tests against. Clean single-pool model.&lt;/td&gt;
&lt;td&gt;No &lt;code&gt;hikaricp.*&lt;/code&gt; Micrometer metrics from Spring. No HikariCP health indicator in &lt;code&gt;/actuator/health&lt;/code&gt;. &lt;code&gt;cp-*&lt;/code&gt; properties must go in the URL — gets unwieldy with many parameters.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Internal only via &lt;code&gt;AwsWrapperDataSource&lt;/code&gt;&lt;/strong&gt; (&lt;code&gt;spring.datasource.type=software.amazon.jdbc.ds.AwsWrapperDataSource&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Disabled&lt;/td&gt;
&lt;td&gt;Wrapper HikariCP&lt;/td&gt;
&lt;td&gt;AWS-tested single-pool model. Clean YAML via &lt;code&gt;connection-properties&lt;/code&gt; block — no URL stuffing.&lt;/td&gt;
&lt;td&gt;Same observability trade-offs as &lt;code&gt;SimpleDriverDataSource&lt;/code&gt; (no Spring Hikari metrics/health). Different URL format (&lt;code&gt;jdbc:postgresql:&lt;/code&gt; not &lt;code&gt;jdbc:aws-wrapper:postgresql:&lt;/code&gt;) and &lt;code&gt;driver-class-name&lt;/code&gt; points to the underlying driver. See &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/DataSource.md" rel="noopener noreferrer"&gt;DataSource docs&lt;/a&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;External only&lt;/strong&gt; (no &lt;code&gt;connectionPoolType&lt;/code&gt; in URL)&lt;/td&gt;
&lt;td&gt;Spring HikariCP&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Familiar Spring config. Full metrics.&lt;/td&gt;
&lt;td&gt;No per-instance physical connection caching. &lt;code&gt;@Transactional(readOnly=true)&lt;/code&gt; with &lt;code&gt;readWriteSplitting&lt;/code&gt; triggers a full connection switch per call (see Spring Boot limitation below).&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Where I am with this
&lt;/h3&gt;

&lt;p&gt;I've been experimenting with the double-pool setup and so far it's been working without problems under sustained load across multiple pods. The external pool gives you the Micrometer metrics that make diagnosing issues possible — the &lt;code&gt;hikaricp.connections.pending&lt;/code&gt; signal is how I caught the F0 ceiling issue — and the internal pool gives you efficient physical-connection reuse across reader/writer instances. The key invariant is &lt;code&gt;cp-MaximumPoolSize &amp;gt;= maximumPoolSize&lt;/code&gt; so the internal layer never becomes the bottleneck.&lt;/p&gt;

&lt;p&gt;The one tangible downside I've observed: &lt;strong&gt;you use more database connections than you'd expect.&lt;/strong&gt; The external pool holds logical connections while the internal pools independently hold physical connections per Aurora instance. In practice the connection count on Aurora ends up higher than the external pool size alone would suggest, because each internal pool maintains its own minimum-idle and maximum-size. For a fleet of pods this adds up: each instance needs &lt;code&gt;max_connections&lt;/code&gt; headroom for &lt;code&gt;pods × cp-MaximumPoolSize&lt;/code&gt; (every pod can fill its pool to that instance), and the cluster-wide total can reach &lt;code&gt;pods × cp-MaximumPoolSize × (1 + number_of_readers)&lt;/code&gt;, not just &lt;code&gt;pods × maximumPoolSize&lt;/code&gt;.&lt;/p&gt;
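&lt;p&gt;The headroom arithmetic can be sketched directly (illustrative numbers, not recommendations):&lt;/p&gt;

```java
public class ConnectionHeadroom {
    // Per-instance worst case: every pod can open up to cp-MaximumPoolSize
    // physical connections to any single Aurora instance.
    static long perInstance(int pods, int cpMaximumPoolSize) {
        return (long) pods * cpMaximumPoolSize;
    }

    // Cluster-wide worst case: one internal pool per instance (writer + readers).
    static long clusterWide(int pods, int cpMaximumPoolSize, int readers) {
        return perInstance(pods, cpMaximumPoolSize) * (1 + readers);
    }

    public static void main(String[] args) {
        // 10 pods, cp-MaximumPoolSize=50, 2 readers:
        System.out.println(perInstance(10, 50));    // 500 per instance — check against max_connections
        System.out.println(clusterWide(10, 50, 2)); // 1500 across the cluster
    }
}
```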

&lt;p&gt;If you do hit those edge cases (connection leaks, intermittent stale-connection errors after failover, pool metrics that don't add up), switching to internal-only pooling (&lt;code&gt;spring.datasource.type=SimpleDriverDataSource&lt;/code&gt; or &lt;code&gt;AwsWrapperDataSource&lt;/code&gt;) is the cleanest way to take the officially-untested double-pool interaction out of the picture.&lt;/p&gt;

&lt;p&gt;It's also worth noting that &lt;strong&gt;you don't need an application-managed HikariCP at all&lt;/strong&gt;: the internal pool created by &lt;code&gt;connectionPoolType=hikari&lt;/code&gt; is a self-contained HikariCP instance managed by the wrapper. If you're building a non-Spring app or a lightweight service, running only the internal pool is the cleaner architecture and avoids the double-pool question altogether.&lt;/p&gt;

&lt;h3&gt;
  
  
  F0 vs SF_F0: should Spring Boot apps use &lt;code&gt;readWriteSplitting&lt;/code&gt;?
&lt;/h3&gt;

&lt;p&gt;This is one of the more confusing areas in the docs, and it matters because it determines your entire read/write routing architecture.&lt;/p&gt;

&lt;p&gt;From the &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/wrapper/src/main/java/software/amazon/jdbc/profile/DriverConfigurationProfiles.java" rel="noopener noreferrer"&gt;source code&lt;/a&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Preset&lt;/th&gt;
&lt;th&gt;Plugins&lt;/th&gt;
&lt;th&gt;Internal pool&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;F0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;auroraInitialConnectionStrategy&lt;/code&gt;, &lt;code&gt;auroraConnectionTracker&lt;/code&gt;, &lt;strong&gt;&lt;code&gt;readWriteSplitting&lt;/code&gt;&lt;/strong&gt;, &lt;code&gt;failover&lt;/code&gt;, &lt;code&gt;efm2&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Yes (maxPoolSize=30)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SF_F0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;auroraInitialConnectionStrategy&lt;/code&gt;, &lt;code&gt;auroraConnectionTracker&lt;/code&gt;, &lt;code&gt;failover&lt;/code&gt;, &lt;code&gt;efm2&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Yes (maxPoolSize=30)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The only difference: SF_F0 drops &lt;code&gt;readWriteSplitting&lt;/code&gt;. Both have the same internal pool. The &lt;code&gt;SF_&lt;/code&gt; prefix stands for "Spring Framework" — these variants are meant for Spring apps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why does the Spring variant disable read/write splitting?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheReadWriteSplittingPlugin.md#limitations-when-using-spring-bootframework" rel="noopener noreferrer"&gt;Spring Boot limitations section&lt;/a&gt; of the ReadWriteSplitting plugin docs explains:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"The use of read/write splitting with the annotation &lt;code&gt;@Transactional(readOnly=true)&lt;/code&gt; is &lt;strong&gt;only&lt;/strong&gt; recommended for configurations using an internal connection pool."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When Spring encounters &lt;code&gt;@Transactional(readOnly=true)&lt;/code&gt;, it calls &lt;code&gt;conn.setReadOnly(true)&lt;/code&gt; before the method and &lt;code&gt;conn.setReadOnly(false)&lt;/code&gt; after. The &lt;code&gt;readWriteSplitting&lt;/code&gt; plugin responds by switching from writer→reader→writer on every annotated method call. Without an internal pool, each switch is a full TCP/TLS reconnect — the docs call this "substantial performance degradation." The SF_ presets sidestep this by disabling the plugin entirely and recommending &lt;strong&gt;two separate Spring DataSource beans&lt;/strong&gt; instead (one for the writer cluster endpoint, one for the reader endpoint), letting Spring handle routing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The contradiction:&lt;/strong&gt; SF_F0 &lt;em&gt;has&lt;/em&gt; internal pools — exactly the prerequisite the docs say makes &lt;code&gt;readWriteSplitting&lt;/code&gt; safe. With internal pools, the &lt;code&gt;setReadOnly&lt;/code&gt; toggle reuses cached physical connections from the per-instance pools (writer pool, reader pool), making the switch a cheap object swap rather than a TCP reconnect. So SF_F0 disables a plugin that should work fine with the internal pools it already provides.&lt;/p&gt;

&lt;p&gt;My read: the SF_ presets were likely created before &lt;code&gt;connectionPoolType=hikari&lt;/code&gt; made the internal-pool + readWriteSplitting combination clean and testable. The docs haven't fully reconciled this — they warn about the overhead, correctly note that internal pools mitigate it, but then the SF_ presets still disable it out of caution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Three paths for Spring Boot read/write splitting:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;readWriteSplitting plugin&lt;/th&gt;
&lt;th&gt;How reads route to readers&lt;/th&gt;
&lt;th&gt;Trade-off&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Plugin with internal pools&lt;/strong&gt; (what we use)&lt;/td&gt;
&lt;td&gt;Enabled&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;@Transactional(readOnly=true)&lt;/code&gt; triggers &lt;code&gt;setReadOnly(true)&lt;/code&gt; → plugin routes to reader via cached internal pool&lt;/td&gt;
&lt;td&gt;Single DataSource bean. Clean. Requires internal pools for acceptable switching overhead.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Two DataSource beans&lt;/strong&gt; (what SF_ presets assume)&lt;/td&gt;
&lt;td&gt;Disabled&lt;/td&gt;
&lt;td&gt;Spring's &lt;code&gt;AbstractRoutingDataSource&lt;/code&gt; or &lt;code&gt;@Qualifier&lt;/code&gt; annotations route to a writer or reader DataSource at the service layer&lt;/td&gt;
&lt;td&gt;No plugin overhead. More application-level wiring. Each DataSource can independently use the wrapper for failover/monitoring.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Plugin without internal pools&lt;/strong&gt; (don't do this)&lt;/td&gt;
&lt;td&gt;Enabled&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;setReadOnly&lt;/code&gt; triggers a full physical connection switch per call&lt;/td&gt;
&lt;td&gt;Substantial overhead. The docs explicitly warn against this.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you're already on manual config with &lt;code&gt;connectionPoolType=hikari&lt;/code&gt; and &lt;code&gt;cp-*&lt;/code&gt; properties (which you need anyway for pool sizing), enabling &lt;code&gt;readWriteSplitting&lt;/code&gt; works — the internal pools handle the switching cost. If you prefer the two-DataSource approach, use a no-readWriteSplitting configuration (like SF_F0's plugin list, but with manual pool sizing since the preset hardcodes maxPoolSize=30).&lt;/p&gt;

&lt;p&gt;Either way, don't mix the two: having &lt;code&gt;readWriteSplitting&lt;/code&gt; enabled while &lt;em&gt;also&lt;/em&gt; routing via separate DataSources would result in double routing logic that's hard to reason about.&lt;/p&gt;
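&lt;p&gt;For the two-DataSource route, the wiring might look like the sketch below. The &lt;code&gt;app.datasource.*&lt;/code&gt; namespace is hypothetical (you would bind it yourself with &lt;code&gt;@ConfigurationProperties&lt;/code&gt;); the JDBC URL shape and the &lt;code&gt;wrapperPlugins&lt;/code&gt; key are the real wrapper surface, and the cluster hostnames are placeholders:&lt;/p&gt;

```yaml
# Sketch only: two independent pools, readWriteSplitting disabled.
# "app.datasource.*" is a made-up namespace for your own binding code.
app:
  datasource:
    writer:
      jdbc-url: jdbc:aws-wrapper:postgresql://my-cluster.cluster-abc.eu-west-1.rds.amazonaws.com:5432/mydb
      data-source-properties:
        wrapperPlugins: auroraConnectionTracker,failover,efm2
    reader:
      jdbc-url: jdbc:aws-wrapper:postgresql://my-cluster.cluster-ro-abc.eu-west-1.rds.amazonaws.com:5432/mydb
      data-source-properties:
        wrapperPlugins: auroraConnectionTracker,failover,efm2
```

&lt;p&gt;Read-path code then targets the reader bean via &lt;code&gt;@Qualifier&lt;/code&gt; or an &lt;code&gt;AbstractRoutingDataSource&lt;/code&gt;, and everything else defaults to the writer bean.&lt;/p&gt;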

&lt;h3&gt;
  
  
  HikariCP and virtual threads: a known compatibility issue
&lt;/h3&gt;

&lt;p&gt;If you're running on JDK 21+ and considering Spring Boot's &lt;code&gt;spring.threads.virtual.enabled=true&lt;/code&gt;, there is an &lt;a href="https://github.com/brettwooldridge/HikariCP/issues/2398" rel="noopener noreferrer"&gt;open HikariCP bug (#2398)&lt;/a&gt; to be aware of. The issue is filed against HikariCP 7.0.2: the &lt;code&gt;ConcurrentBag.requite()&lt;/code&gt; method uses a yield-spin loop (&lt;code&gt;Thread.yield()&lt;/code&gt; 255 times for every &lt;code&gt;parkNanos&lt;/code&gt;) that saturates all carrier threads under virtual-thread load. The result is CPU throttling at the pod level and potential liveness-probe failures — the exact kind of silent performance regression that's hard to diagnose without knowing about this issue.&lt;/p&gt;

&lt;p&gt;As of this writing, the &lt;a href="https://github.com/brettwooldridge/HikariCP/pull/2399" rel="noopener noreferrer"&gt;proposed fix in PR #2399&lt;/a&gt; has not been merged. Spring Boot 3.5.7's BOM pins HikariCP 6.3.3 by default rather than 7.x, and the bug report doesn't reproduce against the 6.x line — so check your effective HikariCP version before assuming you're affected. The workaround if you are is to disable virtual threads (&lt;code&gt;-Dspring.threads.virtual.enabled=false&lt;/code&gt;). If you're running the AWS JDBC wrapper with HikariCP as your external pool &lt;em&gt;and&lt;/em&gt; enabling virtual threads on a 7.x version, this is the interaction to watch — it's not a wrapper bug, but it surfaces at the same layer (connection pool) and looks similar in dashboards to the internal-pool ceiling problem I described earlier.&lt;/p&gt;
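&lt;p&gt;If you do land on an affected 7.x version with virtual threads enabled, the workaround is a single property (the JVM-flag form above works too):&lt;/p&gt;

```properties
# Fall back to platform threads until the HikariCP fix is merged and released.
spring.threads.virtual.enabled=false
```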

&lt;h3&gt;
  
  
  Sizing rule
&lt;/h3&gt;

&lt;p&gt;For &lt;code&gt;P&lt;/code&gt; pods, external pool size &lt;code&gt;E&lt;/code&gt;, and &lt;code&gt;R&lt;/code&gt; readers in the Aurora cluster, the physical connection footprint is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Writer instance:   up to P * cp-MaximumPoolSize physical connections
Per reader:        up to P * cp-MaximumPoolSize physical connections
Total:             P * cp-MaximumPoolSize * (1 + R)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;cp-MaximumPoolSize&lt;/code&gt; is the bottleneck, logical &lt;code&gt;getConnection()&lt;/code&gt; calls sit in the internal pool's wait queue — which is exactly the v2.6.8 failure mode, just on a newer version where you technically &lt;em&gt;can&lt;/em&gt; fix it. The invariant to hold: &lt;strong&gt;&lt;code&gt;cp-MaximumPoolSize &amp;gt;= external pool size&lt;/code&gt;&lt;/strong&gt; so the internal layer never becomes the bottleneck. Going higher is fine as long as the total stays under Aurora's &lt;code&gt;max_connections&lt;/code&gt; per instance with ~20% headroom.&lt;/p&gt;

&lt;h3&gt;
  
  
  Life of a single SELECT
&lt;/h3&gt;

&lt;p&gt;When I was first onboarding someone to this, the thing that actually landed was walking through one request end-to-end:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Tomcat thread calls &lt;code&gt;userRepository.findById(42)&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Spring Data borrows a logical connection from external HikariCP (external pool count goes up by 1).&lt;/li&gt;
&lt;li&gt;Transaction manager begins a tx. Say it's &lt;code&gt;@Transactional(readOnly=true)&lt;/code&gt; — the read-only hint is set on the logical connection.&lt;/li&gt;
&lt;li&gt;First real statement flows through the plugin chain. &lt;code&gt;readWriteSplitting&lt;/code&gt; sees the read-only flag, picks reader-1 (round-robin), and routes to reader-1's internal pool.&lt;/li&gt;
&lt;li&gt;Reader-1's internal pool hands over a physical session; the wrapper binds it to the logical connection for the rest of the tx.&lt;/li&gt;
&lt;li&gt;Query executes on reader-1.&lt;/li&gt;
&lt;li&gt;Tx commits. Physical session returns to reader-1's internal pool; logical connection returns to external Hikari.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftsmg0lhfv0ziyk1gxx35.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftsmg0lhfv0ziyk1gxx35.png" alt="Life of a SELECT - sequence diagram showing request flow through external pool, plugin chain, and internal pools" width="800" height="597"&gt;&lt;/a&gt;&lt;/p&gt;
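&lt;p&gt;The pool accounting in that walkthrough can be sketched as a toy model. This is illustrative bookkeeping, not wrapper API: it only shows that the logical borrow (external pool) and the physical bind (reader's internal pool) are independent, and that both return at commit:&lt;/p&gt;

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of steps 2-7 above. All names are made up for illustration.
public class SelectLifecycle {
    static final Deque<String> externalPool = new ArrayDeque<>(); // logical connections
    static final Deque<String> reader1Pool = new ArrayDeque<>();  // physical sessions

    static String runReadOnlyTx() {
        String logical = externalPool.pop();   // step 2: borrow from external HikariCP
        String physical = reader1Pool.pop();   // step 5: bind reader-1 physical session
        String result = "ran on " + physical;  // step 6: query executes on reader-1
        reader1Pool.push(physical);            // step 7: physical session returned
        externalPool.push(logical);            // step 7: logical connection returned
        return result;
    }

    public static void main(String[] args) {
        externalPool.push("logical-1");
        reader1Pool.push("physical-r1-1");
        System.out.println(runReadOnlyTx());
        System.out.println(externalPool.size() + " " + reader1Pool.size()); // both back to 1
    }
}
```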




&lt;h2&gt;
  
  
  The plugin catalog, and when I use which
&lt;/h2&gt;

&lt;p&gt;Plugins are a comma-separated list on &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/UsingTheJdbcDriver.md" rel="noopener noreferrer"&gt;&lt;code&gt;wrapperPlugins&lt;/code&gt;&lt;/a&gt;. &lt;strong&gt;Order matters.&lt;/strong&gt; The driver applies them outside-in.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I always run for Aurora
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheFailoverPlugin.md" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;code&gt;failover&lt;/code&gt;&lt;/strong&gt;&lt;/a&gt; — detects Aurora writer/reader failover events via topology awareness, invalidates broken connections, reroutes to the current writer. Without this, a writer failover leaves the driver holding a dead TCP session until OS-level timeouts fire (minutes). (There's also a newer &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheFailover2Plugin.md" rel="noopener noreferrer"&gt;&lt;code&gt;failover2&lt;/code&gt;&lt;/a&gt; plugin worth evaluating for new deployments.)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheAuroraConnectionTrackerPlugin.md" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;code&gt;auroraConnectionTracker&lt;/code&gt;&lt;/strong&gt;&lt;/a&gt; — maintains the map of live connections per instance. &lt;code&gt;failover&lt;/code&gt; needs it to know which connections to invalidate.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheHostMonitoringPlugin.md" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;code&gt;efm2&lt;/code&gt;&lt;/strong&gt;&lt;/a&gt; — Enhanced Failure Monitor v2. A background thread per connection probes the socket at &lt;code&gt;failureDetectionInterval&lt;/code&gt;; if &lt;code&gt;failureDetectionCount&lt;/code&gt; consecutive probes fail within &lt;code&gt;failureDetectionTime&lt;/code&gt;, the connection is marked bad and &lt;code&gt;failover&lt;/code&gt; kicks in. v2 is current; v1 / &lt;code&gt;efm&lt;/code&gt; is deprecated and should not be used in new configs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What I enable conditionally
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheReadWriteSplittingPlugin.md" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;code&gt;readWriteSplitting&lt;/code&gt;&lt;/strong&gt;&lt;/a&gt; — routes read-only transactions to readers, writes to the writer. Enable when you have one or more readers &lt;em&gt;and&lt;/em&gt; your code marks read transactions properly (&lt;code&gt;@Transactional(readOnly=true)&lt;/code&gt;). Without the hint, the plugin sends everything to the writer and you get no benefit. I've seen more than one team enable it and then wonder why their readers sit idle.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;iamAuth&lt;/code&gt;&lt;/strong&gt; — IAM-based auth instead of password. Enable if you're doing IAM to Aurora; otherwise skip.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;awsSecretsManager&lt;/code&gt;&lt;/strong&gt; — pulls creds from Secrets Manager at connection time. Overlaps with external secret-rotation workflows; I enable only if I'm not rotating through Kubernetes secrets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;federatedAuth&lt;/code&gt;&lt;/strong&gt; / &lt;strong&gt;&lt;code&gt;okta&lt;/code&gt;&lt;/strong&gt; — SSO-style auth; niche in my experience.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;dev&lt;/code&gt;&lt;/strong&gt; / &lt;strong&gt;&lt;code&gt;logQueryPlansWhenNeeded&lt;/code&gt;&lt;/strong&gt; — debugging only, never prod.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  My default stack for Aurora PG + HikariCP
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;wrapperPlugins&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;readWriteSplitting,auroraConnectionTracker,failover,efm2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I put &lt;code&gt;readWriteSplitting&lt;/code&gt; first so routing happens before failover/topology logic — that way failover can reroute a connection to the "current" writer regardless of who it was bound to. &lt;code&gt;efm2&lt;/code&gt; is last because it's terminal: it wraps the underlying connection with monitoring.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Felweenwws9jnpz1q4gvp.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Felweenwws9jnpz1q4gvp.jpg" alt="Plugin chain pipeline showing getConnection() flowing through readWriteSplitting → auroraConnectionTracker → failover → efm2 → PostgreSQL driver, with each plugin's responsibility and failover event handling" width="800" height="1986"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Aurora with multiple readers: the configuration I'm shipping
&lt;/h2&gt;

&lt;p&gt;This is what I'm running now against a 1 writer + 2 reader Aurora cluster. It's not the only sensible config, but I've run it in anger through a few load tests and it's the one I trust.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;jdbc:aws-wrapper:postgresql://${endpoint}:5432/${db}?connectionPoolType=hikari&amp;amp;readerHostSelectorStrategy=roundRobin&lt;/span&gt;
&lt;span class="na"&gt;hikari&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;connection-timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;60000&lt;/span&gt;
  &lt;span class="na"&gt;maximum-pool-size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;
  &lt;span class="na"&gt;minimum-idle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
  &lt;span class="na"&gt;data-source-properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;wrapperPlugins&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;readWriteSplitting,auroraConnectionTracker,failover,efm2&lt;/span&gt;
    &lt;span class="na"&gt;cp-MaximumPoolSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;50"&lt;/span&gt;
    &lt;span class="na"&gt;cp-MinimumIdle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5"&lt;/span&gt;
    &lt;span class="na"&gt;cp-ConnectionTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;30000"&lt;/span&gt;
    &lt;span class="c1"&gt;# I let efm2 handle liveness. TCP timeout is intentionally 0.&lt;/span&gt;
    &lt;span class="na"&gt;connectTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10000"&lt;/span&gt;
    &lt;span class="na"&gt;loginTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10000"&lt;/span&gt;
    &lt;span class="na"&gt;socketTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0"&lt;/span&gt;
    &lt;span class="c1"&gt;# efm2 tuning — see "failover budget" below&lt;/span&gt;
    &lt;span class="na"&gt;failureDetectionTime&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;60000"&lt;/span&gt;        &lt;span class="c1"&gt;# grace period before monitoring starts&lt;/span&gt;
    &lt;span class="na"&gt;failureDetectionInterval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;15000"&lt;/span&gt;    &lt;span class="c1"&gt;# 15s between probes&lt;/span&gt;
    &lt;span class="na"&gt;failureDetectionCount&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5"&lt;/span&gt;           &lt;span class="c1"&gt;# 5 failed probes = dead&lt;/span&gt;
    &lt;span class="na"&gt;monitoring-connectTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10000"&lt;/span&gt;
    &lt;span class="na"&gt;monitoring-socketTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5000"&lt;/span&gt;
    &lt;span class="na"&gt;monitoring-loginTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10000"&lt;/span&gt;
  &lt;span class="na"&gt;exception-override-class-name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;software.amazon.jdbc.util.HikariCPSQLException&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Reader host selection
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/HostSelectionStrategies.md" rel="noopener noreferrer"&gt;&lt;code&gt;readerHostSelectorStrategy&lt;/code&gt;&lt;/a&gt; controls how &lt;code&gt;readWriteSplitting&lt;/code&gt; picks a reader:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;roundRobin&lt;/code&gt;&lt;/strong&gt; — distributes reads evenly. My default.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;random&lt;/code&gt;&lt;/strong&gt; — statistically even but variable in any given second.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;leastConnections&lt;/code&gt;&lt;/strong&gt; — picks the reader with the fewest active physical connections. Worth it when readers have meaningfully different workloads, but adds a small lookup cost per acquisition.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;fastestResponse&lt;/code&gt;&lt;/strong&gt; — picks the reader with the lowest observed response latency. Useful when readers have asymmetric hardware or load.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a homogeneous reader fleet, &lt;code&gt;roundRobin&lt;/code&gt; is the cleanest and cheapest. I've only ever needed &lt;code&gt;leastConnections&lt;/code&gt; once, for an asymmetric deployment.&lt;/p&gt;
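&lt;p&gt;Switching strategies is a connection-parameter change, nothing structural. For example, to try &lt;code&gt;leastConnections&lt;/code&gt; with the YAML from earlier otherwise unchanged (endpoint and db are placeholders):&lt;/p&gt;

```properties
# readerHostSelectorStrategy is a wrapper connection parameter; it can go on
# the JDBC URL or in data-source-properties.
url=jdbc:aws-wrapper:postgresql://${endpoint}:5432/${db}?connectionPoolType=hikari&readerHostSelectorStrategy=leastConnections
```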

&lt;h3&gt;
  
  
  The exception-translation line I almost missed
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;exception-override-class-name: software.amazon.jdbc.util.HikariCPSQLException&lt;/code&gt; is easy to skip over (see the &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/examples/SpringBootHikariExample/README.md" rel="noopener noreferrer"&gt;Spring Boot + HikariCP example&lt;/a&gt; where it's buried at the bottom of the YAML). Without it, HikariCP sees failover-triggered &lt;code&gt;SQLException&lt;/code&gt;s as "normal" and tries to hand out connections the wrapper has already invalidated. Pool stays confused, latency stays bad, and the ordinary failover recovery path never fully completes. &lt;strong&gt;Not optional&lt;/strong&gt; if you're on HikariCP + &lt;code&gt;failover&lt;/code&gt;. Set it once and never think about it again.&lt;/p&gt;




&lt;h2&gt;
  
  
  Performance aspects
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Where time actually goes
&lt;/h3&gt;

&lt;p&gt;Under steady load, the wrapper's overhead breaks down into three categories:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Plugin chain traversal&lt;/strong&gt; — every JDBC call walks through the chain. For N plugins and M statements per transaction, you pay N×M method-dispatch overhead. On v3.x it's low single-digit microseconds — not zero, but invisible unless you're chasing the last 1% of p99. The rule I follow: don't enable plugins you aren't using.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Physical connection creation&lt;/strong&gt; — TLS handshake + auth + wire setup. One-time per internal pool slot; amortized, it's invisible &lt;em&gt;unless&lt;/em&gt; the pool is cold or under-sized and the driver is creating sessions continuously.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring traffic&lt;/strong&gt; — &lt;code&gt;efm2&lt;/code&gt; sends lightweight probes per connection. At &lt;code&gt;failureDetectionInterval=15000&lt;/code&gt; the volume is tiny.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Metrics I always watch
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;What it tells me&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;hikaricp.connections&lt;/code&gt; (total)&lt;/td&gt;
&lt;td&gt;External pool size. Should grow to &lt;code&gt;maximumPoolSize&lt;/code&gt; under load. If it plateaus below the configured max, I'm hitting the internal pool ceiling — that's exactly how I finally caught the v2.6.8 F0 issue.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;hikaricp.connections.active&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Currently in-use logical connections. Near the max = contention.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;hikaricp.connections.pending&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Threads waiting to borrow. &lt;strong&gt;Steady-state non-zero = bottleneck.&lt;/strong&gt; I alert on this.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;hikaricp.connections.creation&lt;/code&gt; (ms)&lt;/td&gt;
&lt;td&gt;Time to acquire a physical connection through the wrapper. Single-digit ms is normal; 10,000 ms means an internal-pool wait timed out. This is the specific signal that said "the problem isn't the external pool."&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;hikaricp.connections.timeout&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Borrow timeouts. Always zero when healthy.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aurora &lt;code&gt;DatabaseConnections&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Physical conns per instance. Should roughly equal &lt;code&gt;sum over pods of (active internal-pool conns to this role)&lt;/code&gt;. Cross-reference with &lt;code&gt;cp-MaximumPoolSize&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aurora &lt;code&gt;Deadlocks&lt;/code&gt;, &lt;code&gt;CommitLatency&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Independent of the driver but often regress together if pool sizing forces serialization at the app layer.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  My sizing calculator
&lt;/h3&gt;

&lt;p&gt;For &lt;code&gt;P&lt;/code&gt; pods, &lt;code&gt;E&lt;/code&gt; external pool size, &lt;code&gt;R_n&lt;/code&gt; reader count, target Aurora &lt;code&gt;M&lt;/code&gt; max_connections per instance with 20% headroom:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;cp-MaximumPoolSize&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;E                                     # invariant; no internal-pool wait&lt;/span&gt;
&lt;span class="err"&gt;Writer&lt;/span&gt; &lt;span class="err"&gt;physical&lt;/span&gt; &lt;span class="err"&gt;at&lt;/span&gt; &lt;span class="py"&gt;peak&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;P * cp-MaximumPoolSize&lt;/span&gt;
&lt;span class="err"&gt;Per-reader&lt;/span&gt; &lt;span class="err"&gt;physical&lt;/span&gt; &lt;span class="err"&gt;at&lt;/span&gt; &lt;span class="py"&gt;peak&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;P * cp-MaximumPoolSize       (round-robin balances across readers)&lt;/span&gt;
&lt;span class="py"&gt;Sanity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;P * cp-MaximumPoolSize &amp;lt;= 0.8 * M&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Plug in your own numbers: &lt;code&gt;P * cp-MaximumPoolSize&lt;/code&gt; per role. Check this against the &lt;code&gt;max_connections&lt;/code&gt; for your Aurora instance class and leave ~20% headroom for maintenance connections and other clients.&lt;/p&gt;
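&lt;p&gt;As a worked check with made-up numbers (6 pods, &lt;code&gt;cp-MaximumPoolSize=50&lt;/code&gt;, 2 readers, &lt;code&gt;max_connections=1000&lt;/code&gt;), the arithmetic above looks like this:&lt;/p&gt;

```java
// Worked example of the sizing rules above; all inputs are hypothetical.
public class SizingCheck {
    // Peak physical connections a single role (writer, or one reader) can see.
    static long perRolePeak(long pods, long cpMaximumPoolSize) {
        return pods * cpMaximumPoolSize;
    }

    // Total footprint across the writer plus R readers.
    static long totalPeak(long pods, long cpMaximumPoolSize, long readers) {
        return perRolePeak(pods, cpMaximumPoolSize) * (1 + readers);
    }

    // Sanity rule: per-role peak must fit in 80% of max_connections.
    static boolean fitsWithHeadroom(long perRolePeak, long maxConnections) {
        return maxConnections * 8 / 10 >= perRolePeak;
    }

    public static void main(String[] args) {
        System.out.println(perRolePeak(6, 50));          // 300 per role
        System.out.println(totalPeak(6, 50, 2));         // 900 across the cluster
        System.out.println(fitsWithHeadroom(300, 1000)); // true: 300 fits in 800
    }
}
```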




&lt;h2&gt;
  
  
  Failover — what happens under the hood
&lt;/h2&gt;

&lt;p&gt;Aurora failover — writer restart, reader promotion, or AZ failover — is the specific scenario the wrapper's plugins were built to survive. The first time I watched a failover in production with this stack, I actually wanted to know what was happening step by step. Here's what I worked out.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sequence during a writer failover
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Writer instance goes unresponsive. TCP sockets from my pods to that writer stop returning packets.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;efm2&lt;/code&gt;'s monitor thread hits &lt;code&gt;failureDetectionCount&lt;/code&gt; consecutive probe failures within &lt;code&gt;failureDetectionTime&lt;/code&gt;. The underlying connection is marked bad.&lt;/li&gt;
&lt;li&gt;My app's next statement on that connection throws a &lt;code&gt;SQLException&lt;/code&gt; tagged with a failover-relevant SQLState.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;failover&lt;/code&gt; catches it, queries Aurora topology (via the RDS DNS or the cluster's topology endpoint), identifies the new writer, and reconnects transparently.&lt;/li&gt;
&lt;li&gt;If configured (&lt;code&gt;failoverMode=reader-or-writer&lt;/code&gt;), the reconnect can fall back to a reader for the brief window where no writer is available. Default is writer.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;auroraConnectionTracker&lt;/code&gt; walks its table of open connections to the dead instance and invalidates them.&lt;/li&gt;
&lt;li&gt;External HikariCP sees the invalidation through &lt;code&gt;HikariCPSQLException&lt;/code&gt; (this is the moment &lt;code&gt;exception-override-class-name&lt;/code&gt; matters) and evicts the bad logical connections.&lt;/li&gt;
&lt;li&gt;New logical connections open against fresh internal-pool slots bound to the new writer.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;End-to-end with default timers: &lt;strong&gt;detection ~75 seconds&lt;/strong&gt; (&lt;code&gt;failureDetectionCount=5 × failureDetectionInterval=15000&lt;/code&gt; of failed probes; &lt;code&gt;failureDetectionTime=60000&lt;/code&gt; adds a grace period only before monitoring has started on a connection), &lt;strong&gt;reconnect ~5-15 seconds&lt;/strong&gt; (Aurora DNS propagation + fresh handshake). My app's p99 takes a visible bump during that window; business recovers within ~90 seconds.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tuning the detection budget
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Aggressive&lt;/strong&gt; (~15-30 s to detect): &lt;code&gt;failureDetectionTime=15000&lt;/code&gt;, &lt;code&gt;failureDetectionInterval=5000&lt;/code&gt;, &lt;code&gt;failureDetectionCount=3&lt;/code&gt;. More probe traffic; more false positives on transient network blips.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Default&lt;/strong&gt; (~75 s, what's in the YAML above): what I run by default. Good for most apps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lax&lt;/strong&gt; (~3+ min): raise &lt;code&gt;failureDetectionTime&lt;/code&gt; past 120000. Only use this if you have independent health-signal paths and don't want efm2 to chatter.&lt;/li&gt;
&lt;/ul&gt;
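&lt;p&gt;The probe arithmetic behind those presets, once monitoring is already active on a connection (the grace period only delays the first probe), is simple enough to check:&lt;/p&gt;

```java
// Steady-state worst-case efm2 detection: failureDetectionCount consecutive
// failed probes, spaced failureDetectionInterval apart.
public class DetectionBudget {
    static long steadyStateMs(long failureDetectionIntervalMs, long failureDetectionCount) {
        return failureDetectionIntervalMs * failureDetectionCount;
    }

    public static void main(String[] args) {
        System.out.println(steadyStateMs(5000, 3));   // aggressive: 15000 ms
        System.out.println(steadyStateMs(15000, 5));  // default: 75000 ms
    }
}
```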

&lt;p&gt;One thing I stopped doing: &lt;strong&gt;don't set &lt;code&gt;socketTimeout&lt;/code&gt; small on the main connection&lt;/strong&gt; (&lt;code&gt;socketTimeout=5000&lt;/code&gt; and friends) hoping to catch failures faster. That fires on every slow query — including legitimate long-running reports — and turns every transient spike into connection churn. Let &lt;code&gt;efm2&lt;/code&gt; own liveness detection. Keep &lt;code&gt;socketTimeout=0&lt;/code&gt;. I learned this the hard way after a 12-minute query triggered a pool-wide connection churn event.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resilience patterns worth knowing
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/FailoverConfigurationGuide.md" rel="noopener noreferrer"&gt;&lt;code&gt;failoverMode&lt;/code&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Controls which host types the &lt;code&gt;failover&lt;/code&gt; plugin may reconnect to when the current host dies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;strict-writer&lt;/code&gt; — only reconnect to a writer. Default when connecting via the cluster writer endpoint. During a prolonged failover, connections stall until a new writer is up.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;reader-or-writer&lt;/code&gt; — fall back to a reader for reads if no writer is available. Default when connecting via the read-only cluster endpoint. Useful for read-heavy apps that can tolerate writes being rejected; writes still fail until the writer is back.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;strict-reader&lt;/code&gt; — never connect to the writer. Dedicated read-replica deployments only.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My default is &lt;code&gt;strict-writer&lt;/code&gt; (which matches the implicit default for cluster-writer-endpoint connections). I've only ever overridden it for a reporting workload where read availability mattered more than write availability.&lt;/p&gt;
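&lt;p&gt;Overriding the default is a single data-source property; for that reporting workload it was:&lt;/p&gt;

```properties
# failoverMode is read by the failover plugin; reader-or-writer lets reads
# land on a reader during the window when no writer is reachable.
failoverMode=reader-or-writer
```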

&lt;h3&gt;
  
  
  Connection churn during failover (don't panic)
&lt;/h3&gt;

&lt;p&gt;The immediate aftermath of a failover event looks rough on dashboards: &lt;code&gt;connections.creation&lt;/code&gt; spikes to seconds (new TLS handshakes), &lt;code&gt;connections.timeout&lt;/code&gt; briefly non-zero, p99 climbs. All expected. The key is the spike &lt;em&gt;ends&lt;/em&gt;, typically within ~30 seconds of the new writer being healthy. If you see a &lt;em&gt;sustained&lt;/em&gt; elevated &lt;code&gt;connections.creation&lt;/code&gt; after the event, check whether &lt;code&gt;exception-override-class-name&lt;/code&gt; is configured — without it, HikariCP keeps handing out invalidated connections and the churn doesn't stop on its own.&lt;/p&gt;

&lt;h3&gt;
  
  
  Read-only traffic during failover
&lt;/h3&gt;

&lt;p&gt;Readers are unaffected by writer failover. &lt;code&gt;readWriteSplitting&lt;/code&gt; + correctly-marked read-only transactions means read traffic keeps flowing while writes pause for ~30-60 seconds. For read-heavy apps, marking transactions &lt;code&gt;readOnly=true&lt;/code&gt; turns out to be both a performance win and an availability one. Do it for both reasons.&lt;/p&gt;

&lt;h3&gt;
  
  
  Blue/green deployments
&lt;/h3&gt;

&lt;p&gt;If you're doing Aurora blue/green (RDS Blue/Green), the switchover is a writer-failover-like event from the driver's perspective. The plugins cover it with no extra config, but the same detection-budget trade-offs apply: faster detection = faster cutover = more false-positive risk during normal ops.&lt;/p&gt;




&lt;h2&gt;
  
  
  RDS Proxy: when, and how it interacts with this driver
&lt;/h2&gt;

&lt;p&gt;If you've read this far, you're either using or considering RDS Proxy. The two layers — RDS Proxy in front of Aurora, the AWS JDBC driver inside your app — solve overlapping but not identical problems, and the AWS guidance you'd want to read together is scattered across the proxy planning page, the wrapper README, and a plugin doc most people miss.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchz2fqx56avg9rwr070g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchz2fqx56avg9rwr070g.png" alt="RDS Proxy architecture showing how the proxy sits between the application and Aurora cluster, with read/write and read-only endpoints" width="800" height="265"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  When AWS recommends RDS Proxy
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/rds-proxy-planning.html" rel="noopener noreferrer"&gt;planning page&lt;/a&gt; lists the canonical cases: "too many connections" pressure, T2/T3 instances where connection-setup CPU is significant, Lambda / serverless workloads, apps without a built-in pool, centralized IAM auth or Secrets Manager rotation, failover speedup (advertised at "up to 66%", typically &amp;lt;35 s for Multi-AZ Aurora), and Blue/Green deployments. For a long-lived Spring Boot pod with a well-tuned HikariCP, only the last three are particularly compelling — the multiplexing benefit is mostly theoretical when your external pool is sized correctly.&lt;/p&gt;

&lt;h3&gt;
  
  
  How RDS Proxy actually routes
&lt;/h3&gt;

&lt;p&gt;The thing that catches teams out is the assumption that the proxy "splits reads and writes intelligently." It doesn't. From the &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/rds-proxy-endpoints.html" rel="noopener noreferrer"&gt;endpoints docs&lt;/a&gt;, the proxy exposes two endpoints — a read/write endpoint that sends every request to the current writer, and a read-only endpoint that sends every request to &lt;em&gt;some&lt;/em&gt; reader (with proxy-level rebalancing if a reader fails). There's &lt;strong&gt;no SQL inspection&lt;/strong&gt;. The proxy routes where you point it, not what you send through it. SQL-aware splitting still requires application-side logic — either two &lt;code&gt;DataSource&lt;/code&gt; beans in your app or the &lt;code&gt;srw&lt;/code&gt; plugin described below.&lt;/p&gt;
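&lt;p&gt;Concretely, "routes where you point it" means two DataSources against the two proxy endpoints (the hostnames below are placeholders for your proxy's endpoints):&lt;/p&gt;

```yaml
# Two pools against RDS Proxy; the proxy routes by endpoint, never by SQL.
writer:
  jdbc-url: jdbc:aws-wrapper:postgresql://my-proxy.proxy-abc.eu-west-1.rds.amazonaws.com:5432/mydb
reader:
  jdbc-url: jdbc:aws-wrapper:postgresql://my-proxy-ro.endpoint.proxy-abc.eu-west-1.rds.amazonaws.com:5432/mydb
```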

&lt;h3&gt;
  
  
  Plugin compatibility behind RDS Proxy
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper#rds-proxy" rel="noopener noreferrer"&gt;wrapper README's RDS Proxy section&lt;/a&gt; is unambiguous:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Functionality like Failover, Enhanced Host Monitoring, and Read/Write Splitting is not compatible since the driver relies on cluster topology and RDS Proxy handles this automatically. The driver remains useful with RDS Proxy for authentication workflows, such as IAM authentication and AWS Secrets Manager integration."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Translated:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Plugin&lt;/th&gt;
&lt;th&gt;Behind RDS Proxy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;failover&lt;/code&gt;, &lt;code&gt;failover2&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Drop.&lt;/strong&gt; Proxy handles writer failover; topology lookups conflict with the hidden pool.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;efm2&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Drop.&lt;/strong&gt; Per-connection probes don't see the underlying Aurora node.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;readWriteSplitting&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Drop.&lt;/strong&gt; Relies on topology that's invisible behind the proxy.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;iamAuth&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Keep&lt;/strong&gt; if you want JDBC-layer IAM (alternative to configuring it on the proxy).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;awsSecretsManager&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Optional — overlaps with proxy auth. Usually skip.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;srw&lt;/code&gt; (Simple R/W Splitting)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Keep&lt;/strong&gt; — purpose-built for this combination.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  The &lt;code&gt;srw&lt;/code&gt; plugin — SQL-aware splitting through RDS Proxy
&lt;/h3&gt;

&lt;p&gt;Available since v3.0.0 and &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheSimpleReadWriteSplittingPlugin.md" rel="noopener noreferrer"&gt;documented here&lt;/a&gt;. Unlike &lt;code&gt;readWriteSplitting&lt;/code&gt;, &lt;code&gt;srw&lt;/code&gt; doesn't query the cluster for topology. You give it two explicit endpoints — &lt;code&gt;srwWriteEndpoint&lt;/code&gt; (your read/write proxy endpoint) and &lt;code&gt;srwReadEndpoint&lt;/code&gt; (your read-only proxy endpoint) — and it switches between them on &lt;code&gt;Connection#setReadOnly(true/false)&lt;/code&gt;. With Spring's &lt;code&gt;@Transactional(readOnly=true)&lt;/code&gt;, you keep the same single-DataSource ergonomics you'd have with &lt;code&gt;readWriteSplitting&lt;/code&gt; against direct Aurora.&lt;/p&gt;

&lt;p&gt;Two gotchas. &lt;strong&gt;Role verification&lt;/strong&gt; (&lt;code&gt;verifyNewSrwConnections=true&lt;/code&gt; by default) runs &lt;code&gt;SELECT pg_catalog.pg_is_in_recovery()&lt;/code&gt; after switching, with up to a 60-second retry budget, to defend against DNS-cache staleness right after failover. Useful on paper; it conflicts with &lt;code&gt;autocommit=false&lt;/code&gt; because the verification query opens a transaction. Either set &lt;code&gt;setReadOnly&lt;/code&gt; &lt;em&gt;before&lt;/em&gt; disabling autocommit, or set &lt;code&gt;verifyNewSrwConnections=false&lt;/code&gt;. &lt;strong&gt;Mutual exclusion:&lt;/strong&gt; don't combine &lt;code&gt;srw&lt;/code&gt; with &lt;code&gt;readWriteSplitting&lt;/code&gt; or &lt;code&gt;gdbReadWriteSplitting&lt;/code&gt; on the same connection. They're alternatives, not layers.&lt;/p&gt;
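&lt;p&gt;A minimal Spring Boot sketch of this wiring (the proxy hostnames are placeholders; the property names follow the &lt;code&gt;srw&lt;/code&gt; plugin docs linked above):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spring:
  datasource:
    driver-class-name: software.amazon.jdbc.Driver
    url: jdbc:aws-wrapper:postgresql://my-proxy.proxy-xxxx.us-east-1.rds.amazonaws.com:5432/appdb
    hikari:
      exception-override-class-name: software.amazon.jdbc.util.HikariCPSQLException
      data-source-properties:
        wrapperPlugins: srw
        srwWriteEndpoint: my-proxy.proxy-xxxx.us-east-1.rds.amazonaws.com
        srwReadEndpoint: my-proxy-ro.endpoint.proxy-xxxx.us-east-1.rds.amazonaws.com
        # Disabled here to sidestep the autocommit conflict described
        # above; keep the default if you call setReadOnly first.
        verifyNewSrwConnections: false
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With this in place, methods annotated &lt;code&gt;@Transactional(readOnly = true)&lt;/code&gt; land on the read-only proxy endpoint and everything else goes to the read/write endpoint.&lt;/p&gt;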

&lt;h3&gt;
  
  
  Decision tree
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setup&lt;/th&gt;
&lt;th&gt;Plugins&lt;/th&gt;
&lt;th&gt;Read/write split mechanism&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Direct to Aurora, no proxy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;readWriteSplitting&lt;/code&gt;, &lt;code&gt;auroraConnectionTracker&lt;/code&gt;, &lt;code&gt;failover&lt;/code&gt;, &lt;code&gt;efm2&lt;/code&gt; (+ &lt;code&gt;cp-*&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Wrapper plugin, one DataSource, &lt;code&gt;@Transactional(readOnly=true)&lt;/code&gt; routes via topology.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RDS Proxy + wrapper, SQL-aware split&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;srw&lt;/code&gt; (+ &lt;code&gt;iamAuth&lt;/code&gt; if needed)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;srw&lt;/code&gt; switches between two proxy endpoints on &lt;code&gt;setReadOnly&lt;/code&gt;. One DataSource.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RDS Proxy + plain &lt;code&gt;org.postgresql.Driver&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;Two DataSource beans (one per proxy endpoint). App routes manually.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lambda / serverless&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;RDS Proxy + plain driver. The wrapper's value is amortized warm-pool benefits — irrelevant for cold invocations.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
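&lt;p&gt;For the plain-driver row, the "two DataSource beans" setup looks roughly like this (a sketch; bean names, endpoints, and credentials are placeholders, and routing to the read bean is entirely up to your application code):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import javax.sql.DataSource;
import com.zaxxer.hikari.HikariDataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Primary;

@Configuration
public class ProxyDataSourceConfig {

    // Writes: the proxy's read/write endpoint (always the current writer).
    @Bean
    @Primary
    public DataSource writeDataSource() {
        HikariDataSource ds = new HikariDataSource();
        ds.setJdbcUrl("jdbc:postgresql://my-proxy.proxy-xxxx.us-east-1.rds.amazonaws.com:5432/appdb");
        ds.setUsername("app");
        return ds;
    }

    // Reads: the proxy's read-only endpoint (some reader).
    @Bean("readDataSource")
    public DataSource readDataSource() {
        HikariDataSource ds = new HikariDataSource();
        ds.setJdbcUrl("jdbc:postgresql://my-proxy-ro.endpoint.proxy-xxxx.us-east-1.rds.amazonaws.com:5432/appdb");
        ds.setUsername("app");
        return ds;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Read-heavy repositories then inject &lt;code&gt;@Qualifier("readDataSource")&lt;/code&gt; explicitly; nothing reroutes automatically.&lt;/p&gt;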

&lt;h3&gt;
  
  
  Pinning — the multiplexing trap
&lt;/h3&gt;

&lt;p&gt;RDS Proxy multiplexes by handing one backend session to multiple client connections, but only when the session is &lt;em&gt;resettable&lt;/em&gt;. The &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/rds-proxy-pinning.html" rel="noopener noreferrer"&gt;pinning rules for Aurora PostgreSQL&lt;/a&gt; disable multiplexing on &lt;code&gt;SET&lt;/code&gt;, &lt;code&gt;PREPARE&lt;/code&gt;/&lt;code&gt;DEALLOCATE&lt;/code&gt;/&lt;code&gt;EXECUTE&lt;/code&gt;, temporary tables, declared cursors, &lt;code&gt;LISTEN&lt;/code&gt;, advisory locks, and any statement &amp;gt;16 KB. Hibernate with server-side prepared statements pins on every session. There are real teams (&lt;a href="https://zerolatency.medium.com/experience-with-aws-rds-proxy-in-production-and-why-we-had-to-revert-it-in-12-hours-392bc3372544" rel="noopener noreferrer"&gt;Aggarwal's 12-hour revert&lt;/a&gt; is the most-cited public postmortem) that hit ~100% pinning under load and pulled the proxy out the same day. The diagnostic is the &lt;code&gt;DatabaseConnectionsCurrentlySessionPinned&lt;/code&gt; CloudWatch metric — if pinned connections approach total, you're paying for a proxy that isn't actually multiplexing.&lt;/p&gt;
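&lt;p&gt;To see whether this is happening to you, compare the pinned-session metric against total connections for the proxy. A sketch (the proxy name is a placeholder; verify the exact metric name in the &lt;code&gt;AWS/RDS&lt;/code&gt; namespace against the RDS Proxy monitoring docs for your engine):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name DatabaseConnectionsCurrentlySessionPinned \
  --dimensions Name=ProxyName,Value=my-proxy \
  --statistics Maximum \
  --start-time 2026-04-30T00:00:00Z \
  --end-time 2026-05-01T00:00:00Z \
  --period 3600
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;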

&lt;h3&gt;
  
  
  My take
&lt;/h3&gt;

&lt;p&gt;RDS Proxy and the AWS JDBC driver aren't usually a "pick one" decision — they solve different concerns and can layer cleanly &lt;em&gt;if you pick the right plugins&lt;/em&gt;. Three rules I'd hold:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Failover ownership belongs to one layer.&lt;/strong&gt; Don't run &lt;code&gt;failover&lt;/code&gt; + &lt;code&gt;efm2&lt;/code&gt; behind a proxy. The proxy already does it; you're paying twice and risking conflicting reactions to transient errors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read/write splitting needs an explicit choice.&lt;/strong&gt; Two DataSource beans, or &lt;code&gt;srw&lt;/code&gt;, or &lt;code&gt;readWriteSplitting&lt;/code&gt; (no proxy). Pick one — never two.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The wrapper still earns its keep behind a proxy &lt;em&gt;if&lt;/em&gt; you're using IAM auth or &lt;code&gt;srw&lt;/code&gt;.&lt;/strong&gt; Otherwise plain &lt;code&gt;org.postgresql.Driver&lt;/code&gt; is simpler and the wrapper's plugin chain is mostly cosmetic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your motivation for either layer is "make the app faster," neither is the answer — that's a query / index / cache problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  The checklist I run through before shipping
&lt;/h2&gt;

&lt;p&gt;Before I put the wrapper in front of production traffic, I go through this list. Nothing on it is optional.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Driver version ≥ 3.3.0.&lt;/strong&gt; &lt;code&gt;cp-*&lt;/code&gt; properties landed in v3.1.0 and &lt;code&gt;efm2&lt;/code&gt; has been available since v2.4.0, so 3.3.0 isn't the minimum for those features individually. I draw the line at 3.3.0 because it includes the readWriteSplitting + failover plugin-ordering fix and removes a 5-second sleep from the failover recovery path. Below 3.1.0, &lt;code&gt;cp-*&lt;/code&gt; won't work at all.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;F0 profile not in use unless version-aware&lt;/strong&gt; — on v2.x, F0 hardcodes &lt;code&gt;maxPoolSize=30&lt;/code&gt;. I've been burned.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;cp-MaximumPoolSize ≥ maximumPoolSize&lt;/code&gt;&lt;/strong&gt; on the external pool.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;exception-override-class-name&lt;/code&gt;&lt;/strong&gt; set to &lt;code&gt;software.amazon.jdbc.util.HikariCPSQLException&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;socketTimeout=0&lt;/code&gt;&lt;/strong&gt; — liveness belongs to efm2.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read-only transactions annotated&lt;/strong&gt; — otherwise &lt;code&gt;readWriteSplitting&lt;/code&gt; is decorative.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aurora &lt;code&gt;max_connections&lt;/code&gt;&lt;/strong&gt; supports &lt;code&gt;pods × cp-max × (1 + readers)&lt;/code&gt; with 20% headroom.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Topology endpoint reachable&lt;/strong&gt; from every pod (cluster and per-instance DNS resolve via VPC DNS).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plugin list ordered:&lt;/strong&gt; &lt;code&gt;readWriteSplitting,auroraConnectionTracker,failover,efm2&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability wired&lt;/strong&gt; — &lt;code&gt;hikaricp.connections.pending&lt;/code&gt; alert on non-zero steady state.&lt;/li&gt;
&lt;/ol&gt;
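&lt;p&gt;Item 7 deserves a number. A back-of-the-envelope sketch (the figures are made up; substitute your own deployment's):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import math

pods = 20         # application replicas
cp_max = 10       # cp-MaximumPoolSize, per pod and per instance endpoint
readers = 2       # Aurora reader instances
headroom = 1.2    # item 7's 20% safety margin

# Worst case: every pod fills its internal pool to the writer
# and to each reader simultaneously.
peak = pods * cp_max * (1 + readers)
required = math.ceil(peak * headroom)
print(peak, required)  # 600 720
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If &lt;code&gt;required&lt;/code&gt; lands above your instance class's &lt;code&gt;max_connections&lt;/code&gt; default, resize or shrink the pools before shipping.&lt;/p&gt;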




&lt;h2&gt;
  
  
  Where this leaves me
&lt;/h2&gt;

&lt;p&gt;The AWS JDBC Driver is one of those libraries where the defaults are opinionated but not obvious, the configuration surface is large, and the version-to-version behavior has shifted in ways that invalidate older docs you'll find on the internet. The cases where I've seen teams get into trouble all look the same: they adopted a profile without reading what was inside it, or they moved from v2.x to v3.x without re-checking whether the properties they'd set still did anything.&lt;/p&gt;

&lt;p&gt;If I could boil this post down to one practical habit: &lt;strong&gt;don't trust the external pool metrics alone.&lt;/strong&gt; The wrapper adds a whole second layer of pooling between your &lt;code&gt;hikaricp.connections&lt;/code&gt; count and the actual network. When the external pool metrics look fine but your requests are slow, look inside. And if you're still on v2.x with F0, upgrade — there is no property you can set to make it behave.&lt;/p&gt;

&lt;p&gt;I lost a few hours to this. You shouldn't have to lose any.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;h3&gt;
  
  
  AWS Advanced JDBC Wrapper — driver docs
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/UsingTheJdbcDriver.md" rel="noopener noreferrer"&gt;Using the JDBC Driver&lt;/a&gt; — full parameter reference including &lt;code&gt;wrapperPlugins&lt;/code&gt;, &lt;code&gt;wrapperProfileName&lt;/code&gt;, &lt;code&gt;wrapperDialect&lt;/code&gt;, and all connection properties&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/ConfigurationPresets.md" rel="noopener noreferrer"&gt;Configuration Presets&lt;/a&gt; — what F0, F1, SF0, etc. actually configure (plugins, pool settings, timeouts)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/HostSelectionStrategies.md" rel="noopener noreferrer"&gt;Host Selection Strategies&lt;/a&gt; — &lt;code&gt;roundRobin&lt;/code&gt;, &lt;code&gt;random&lt;/code&gt;, &lt;code&gt;leastConnections&lt;/code&gt;, &lt;code&gt;highestWeight&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/FailoverConfigurationGuide.md" rel="noopener noreferrer"&gt;Failover Configuration Guide&lt;/a&gt; — &lt;code&gt;failoverMode&lt;/code&gt;, detection tuning, transactional behavior during failover&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/Frameworks.md" rel="noopener noreferrer"&gt;Framework Integration&lt;/a&gt; — notes on Spring Boot, Hibernate, and other framework specifics&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/DataSource.md" rel="noopener noreferrer"&gt;DataSource Configuration&lt;/a&gt; — alternative to driver-mode configuration via &lt;code&gt;AwsWrapperDataSource&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/Compatibility.md" rel="noopener noreferrer"&gt;Compatibility&lt;/a&gt; — supported databases, JDBC versions, known limitations&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  AWS Advanced JDBC Wrapper — plugin docs
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheReadWriteSplittingPlugin.md" rel="noopener noreferrer"&gt;&lt;code&gt;readWriteSplitting&lt;/code&gt;&lt;/a&gt; — reader routing, internal connection pooling with &lt;code&gt;connectionPoolType=hikari&lt;/code&gt;, &lt;code&gt;cp-*&lt;/code&gt; properties, &lt;code&gt;readerHostSelectorStrategy&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheFailoverPlugin.md" rel="noopener noreferrer"&gt;&lt;code&gt;failover&lt;/code&gt;&lt;/a&gt; — classic failover plugin; topology detection, connection invalidation&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheFailover2Plugin.md" rel="noopener noreferrer"&gt;&lt;code&gt;failover2&lt;/code&gt;&lt;/a&gt; — newer failover implementation (v2); recommended for new deployments&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheHostMonitoringPlugin.md" rel="noopener noreferrer"&gt;&lt;code&gt;efm2&lt;/code&gt; (Host Monitoring)&lt;/a&gt; — &lt;code&gt;failureDetectionTime&lt;/code&gt;, &lt;code&gt;failureDetectionInterval&lt;/code&gt;, &lt;code&gt;failureDetectionCount&lt;/code&gt;, monitoring timeouts&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheAuroraConnectionTrackerPlugin.md" rel="noopener noreferrer"&gt;&lt;code&gt;auroraConnectionTracker&lt;/code&gt;&lt;/a&gt; — connection-to-instance mapping for failover invalidation&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  AWS Advanced JDBC Wrapper — examples and changelog
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/examples/SpringBootHikariExample/README.md" rel="noopener noreferrer"&gt;Spring Boot + HikariCP example&lt;/a&gt; — working YAML with &lt;code&gt;exception-override-class-name&lt;/code&gt; and HikariCP data-source properties&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/examples/SpringHibernateExample/README.md" rel="noopener noreferrer"&gt;Spring + Hibernate example&lt;/a&gt; — Hibernate-specific session factory integration&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/examples/SpringTxFailoverExample/README.md" rel="noopener noreferrer"&gt;Spring Transaction Failover example&lt;/a&gt; — handling transactional rollback during failover&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/pull/1658" rel="noopener noreferrer"&gt;PR #1658 — configurable internal pool&lt;/a&gt; — the change (v3.1.0) that made &lt;code&gt;cp-*&lt;/code&gt; properties work outside of profiles&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/CHANGELOG.md" rel="noopener noreferrer"&gt;Changelog&lt;/a&gt; — version-to-version migration notes&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  RDS Proxy
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/rds-proxy-planning.html" rel="noopener noreferrer"&gt;Planning where to use Amazon RDS Proxy&lt;/a&gt; — the canonical use-case list (Lambda, T2/T3, IAM auth, Blue/Green, failover speedup)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/rds-proxy-endpoints.html" rel="noopener noreferrer"&gt;RDS Proxy endpoints&lt;/a&gt; — read/write vs read-only endpoints; "the proxy routes where you point it" semantics&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/rds-proxy-pinning.html" rel="noopener noreferrer"&gt;Avoiding pinning&lt;/a&gt; — full list of session-state operations that disable multiplexing per engine&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper#rds-proxy" rel="noopener noreferrer"&gt;Wrapper README — RDS Proxy section&lt;/a&gt; — the official statement that &lt;code&gt;failover&lt;/code&gt;, &lt;code&gt;efm2&lt;/code&gt;, and &lt;code&gt;readWriteSplitting&lt;/code&gt; are incompatible with RDS Proxy&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheSimpleReadWriteSplittingPlugin.md" rel="noopener noreferrer"&gt;Simple Read/Write Splitting Plugin (&lt;code&gt;srw&lt;/code&gt;)&lt;/a&gt; — the topology-agnostic plugin purpose-built for use behind RDS Proxy (since v3.0.0)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/rds/proxy/pricing/" rel="noopener noreferrer"&gt;RDS Proxy pricing&lt;/a&gt; — per vCPU-hour for provisioned, per ACU-hour for Serverless&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  External
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;HikariCP — &lt;a href="https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-Sizing" rel="noopener noreferrer"&gt;About Pool Sizing&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Aurora PostgreSQL — &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Managing.html" rel="noopener noreferrer"&gt;Performance and scaling for Amazon Aurora PostgreSQL&lt;/a&gt; (source for the &lt;code&gt;max_connections&lt;/code&gt; default formula and the 5,000-connection cap)&lt;/li&gt;
&lt;li&gt;Aggarwal — &lt;a href="https://zerolatency.medium.com/experience-with-aws-rds-proxy-in-production-and-why-we-had-to-revert-it-in-12-hours-392bc3372544" rel="noopener noreferrer"&gt;"Experience with AWS RDS Proxy in production, and why we had to revert it in 12 hours"&lt;/a&gt; (cited in the pinning section)&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>postgres</category>
      <category>rds</category>
      <category>jdbc</category>
    </item>
    <item>
      <title>Hosting a Static Website on Amazon S3</title>
      <dc:creator>Esther Ninyo</dc:creator>
      <pubDate>Wed, 29 Apr 2026 18:51:42 +0000</pubDate>
      <link>https://forem.com/aws-builders/hosting-a-static-website-on-amazon-s3-3fei</link>
      <guid>https://forem.com/aws-builders/hosting-a-static-website-on-amazon-s3-3fei</guid>
      <description>&lt;p&gt;Deploying a static website just got easier with S3 on AWS. You don't have to manage servers, and Amazon S3 is one of the easiest and most cost-effective ways to host a static website. &lt;/p&gt;

&lt;p&gt;In this post, I will walk you through deploying a site to S3 step by step. This is a beginner-friendly project.&lt;/p&gt;

&lt;p&gt;I will be using an HTML website I cloned in 2020, when I was just starting to learn how to code. Feel free to use any HTML project of your choice; the steps are the same.&lt;br&gt;
If you don't have an HTML project, you can create a simple &lt;code&gt;index.html&lt;/code&gt; file and paste the code below into the file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;!DOCTYPE html&amp;gt;
&amp;lt;html&amp;gt;
&amp;lt;head&amp;gt;
    &amp;lt;title&amp;gt;My AWS Website&amp;lt;/title&amp;gt;
&amp;lt;/head&amp;gt;
&amp;lt;body&amp;gt;
    &amp;lt;h1&amp;gt;Hello from AWS S3!&amp;lt;/h1&amp;gt;
&amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What we will do:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Create an S3 bucket&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Upload project files&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enable static website hosting&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Make the website publicly available&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Resources to be created:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS S3 bucket&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Prerequisite&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;AWS account&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;HTML project/Basic HTML Knowledge&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;STEP 1&lt;/strong&gt;: Create S3 bucket on AWS&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open the &lt;a href="https://aws.amazon.com/console/" rel="noopener noreferrer"&gt;AWS console&lt;/a&gt; and log in.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fik4n8t7ln0ryafb18mh1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fik4n8t7ln0ryafb18mh1.png" alt="AWS console dashboard" width="800" height="489"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Search for S3 in the search bar and click &lt;strong&gt;Create bucket&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F30apxlla1ibo6125bn0x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F30apxlla1ibo6125bn0x.png" alt="Create bucket" width="800" height="187"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flbnocun79yw9bz3tra66.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flbnocun79yw9bz3tra66.png" alt="Create bucket options" width="559" height="836"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Enter a &lt;strong&gt;unique bucket name&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Uncheck &lt;strong&gt;Block all public access&lt;/strong&gt; and acknowledge the warning (the site must be publicly readable)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Leave everything else as it is and &lt;strong&gt;create bucket&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;STEP 2&lt;/strong&gt;: Upload your files&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqesyqg6s6cu18asltvfj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqesyqg6s6cu18asltvfj.png" alt="Upload files" width="800" height="167"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp47n7b1wwc2pih3d0ajx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp47n7b1wwc2pih3d0ajx.png" alt="upload files" width="800" height="520"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After uploading the files&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhx5kqxnwuzf5ukcdmv7i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhx5kqxnwuzf5ukcdmv7i.png" alt="Files uploaded" width="800" height="168"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;STEP 3&lt;/strong&gt;: Enable static website hosting&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open bucket properties&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcdt6ddjk4enrfy561ezs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcdt6ddjk4enrfy561ezs.png" alt="bucket properties" width="800" height="103"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scroll all the way to the end and enable &lt;strong&gt;Static Website Hosting&lt;/strong&gt;. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Farjt4n44x9dl4hloqw2t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Farjt4n44x9dl4hloqw2t.png" alt="Edit static website hosting" width="800" height="123"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F04e296ylr21e4m5mjjw7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F04e296ylr21e4m5mjjw7.png" alt="enable static website hosting" width="800" height="505"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Save after all changes have been made.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;STEP 4&lt;/strong&gt;: Update Bucket policy&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Navigate to the permissions tab and update the bucket policy with the policy below, replacing &lt;code&gt;YOUR-BUCKET-NAME&lt;/code&gt; with your bucket's name. Don't forget to save changes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl3jkpvqv198ge76pwefb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl3jkpvqv198ge76pwefb.png" alt="Permission tab" width="800" height="102"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fevkqw2mtjoig4y2ktk7u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fevkqw2mtjoig4y2ktk7u.png" alt="permission" width="800" height="535"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::YOUR-BUCKET-NAME/*"
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
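&lt;p&gt;For repeatability, steps 1 through 4 can also be done from the AWS CLI. A sketch (replace &lt;code&gt;YOUR-BUCKET-NAME&lt;/code&gt;, the region, and the local folder with your own; the policy JSON above is saved as &lt;code&gt;policy.json&lt;/code&gt;):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# STEP 1: create the bucket and allow public bucket policies
aws s3 mb s3://YOUR-BUCKET-NAME --region us-east-1
aws s3api put-public-access-block --bucket YOUR-BUCKET-NAME \
  --public-access-block-configuration BlockPublicAcls=false,IgnorePublicAcls=false,BlockPublicPolicy=false,RestrictPublicBuckets=false

# STEP 2: upload your site files
aws s3 sync ./my-site s3://YOUR-BUCKET-NAME

# STEP 3: enable static website hosting
aws s3 website s3://YOUR-BUCKET-NAME --index-document index.html

# STEP 4: attach the public-read bucket policy
aws s3api put-bucket-policy --bucket YOUR-BUCKET-NAME --policy file://policy.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;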



&lt;p&gt;&lt;strong&gt;STEP 5&lt;/strong&gt;: Access your website&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Navigate to the properties tab&lt;/li&gt;
&lt;li&gt;Scroll all the way to the end&lt;/li&gt;
&lt;li&gt;Access your website using the &lt;strong&gt;Bucket Website Endpoint&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi3vz3xnut4yb0zh4hyfa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi3vz3xnut4yb0zh4hyfa.png" alt="Access website" width="800" height="197"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Closing Remark!&lt;/em&gt;&lt;br&gt;
You have successfully hosted your website on AWS using an S3 bucket. If you encounter any problems, kindly review what you've done and ensure you haven't missed any steps.&lt;/p&gt;

&lt;p&gt;Thank you for reading to the end. Kindly reach out to me in the comment section if you have any questions, or on &lt;a href="https://www.linkedin.com/in/esther-ninyo/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;Till next time, cheers.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>beginners</category>
      <category>webdev</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Hardening Kubernetes: A Practical Guide to EKS Security with Terraform and Kyverno</title>
      <dc:creator>V-ris Jaijongrak</dc:creator>
      <pubDate>Wed, 29 Apr 2026 15:04:05 +0000</pubDate>
      <link>https://forem.com/aws-builders/hardening-kubernetes-a-practical-guide-to-eks-security-with-terraform-and-kyverno-2mj3</link>
      <guid>https://forem.com/aws-builders/hardening-kubernetes-a-practical-guide-to-eks-security-with-terraform-and-kyverno-2mj3</guid>
      <description>&lt;p&gt;In this post, we will explore how to secure an Amazon EKS cluster by applying infrastructure-as-code best practices and policy-driven guardrails. We will use Terraform to provision our infrastructure and Kyverno to enforce security policies at the cluster level.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The Foundation: Infrastructure as Code
&lt;/h2&gt;

&lt;p&gt;To minimize our attack surface, we will deploy a private EKS cluster. The control plane will be inaccessible from the public internet, forcing all management traffic through a secure VPN tunnel.&lt;/p&gt;

&lt;p&gt;Our &lt;a href="https://github.com/guxkung/eks-terraform" rel="noopener noreferrer"&gt;Terraform&lt;/a&gt; setup includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;VPC Networking: A /16 VPC with three /24 private subnets and one public subnet for ingress.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bastion-OpenVPN: A Terraform module to provide a secure gateway into our private environment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;EKS NodeGroups: Managed worker nodes with defined instance types.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F089i3v5pmmqomza5rbm0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F089i3v5pmmqomza5rbm0.png" alt=" " width="782" height="491"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note: This setup is for demonstration. For production-grade architectures, always refer to &lt;a href="https://github.com/aws-ia" rel="noopener noreferrer"&gt;aws-ia&lt;/a&gt; to align with AWS best practices.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Establishing Secure Access
&lt;/h2&gt;

&lt;p&gt;Because the EKS API server resides in a private subnet, we cannot reach it directly from our local machine. We use the Bastion host as an intermediary.&lt;/p&gt;

&lt;h3&gt;
  
  
  Connecting via OpenVPN:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Generate Credentials:&lt;/strong&gt; Access your bastion host and run: &lt;code&gt;sudo /usr/local/bin/generate-client-cert.sh &amp;lt;client-name&amp;gt;&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieve Config:&lt;/strong&gt; Pull the generated .ovpn file from S3: &lt;code&gt;aws s3 cp s3://&amp;lt;bucket-name&amp;gt;/clients/&amp;lt;client-name&amp;gt;.ovpn .&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configure Routing:&lt;/strong&gt; Update your .ovpn file to include the route to your VPC CIDR:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;route&lt;/span&gt; &amp;lt;&lt;span class="n"&gt;VPC&lt;/span&gt;-&lt;span class="n"&gt;CIDR&lt;/span&gt;&amp;gt; &amp;lt;&lt;span class="n"&gt;SUBNET&lt;/span&gt;-&lt;span class="n"&gt;MASK&lt;/span&gt;&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;4. &lt;strong&gt;Connect:&lt;/strong&gt; Run &lt;code&gt;sudo openvpn --config &amp;lt;client-name&amp;gt;.ovpn&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Once the tunnel is active, you can interact with the cluster via &lt;code&gt;kubectl&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws eks update-kubeconfig &lt;span class="nt"&gt;--name&lt;/span&gt; &amp;lt;CLUSTER_NAME&amp;gt;
kubectl get nodes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result should look similar to this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;NAME                                              STATUS   ROLES    AGE   VERSION
&lt;/span&gt;&lt;span class="gp"&gt;ip-172-xx-yy-zzz.aws-region.compute.internal      Ready    &amp;lt;none&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;21h   v1.34.4-eks-f69f56f
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3. Policy-as-Code with Kyverno
&lt;/h2&gt;

&lt;p&gt;Infrastructure security is only half the battle. We also need guardrails for the workloads running &lt;em&gt;inside&lt;/em&gt; the cluster. &lt;a href="https://www.cncf.io/projects/kyverno/" rel="noopener noreferrer"&gt;Kyverno&lt;/a&gt; allows us to manage these policies as Kubernetes objects.&lt;/p&gt;

&lt;h3&gt;
  
  
  Installing the Policy Suite
&lt;/h3&gt;

&lt;p&gt;We will deploy &lt;code&gt;Kyverno&lt;/code&gt; and the &lt;code&gt;policy-reporter&lt;/code&gt; for a centralized security dashboard:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Kyverno&lt;/span&gt;
helm repo add kyverno https://kyverno.github.io/kyverno/
helm &lt;span class="nb"&gt;install &lt;/span&gt;kyverno &lt;span class="nt"&gt;--namespace&lt;/span&gt; kyverno &lt;span class="nt"&gt;--create-namespace&lt;/span&gt; kyverno/kyverno

&lt;span class="c"&gt;# Install Policy Reporter&lt;/span&gt;
helm &lt;span class="nb"&gt;install &lt;/span&gt;policy-reporter policy-reporter/policy-reporter &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--create-namespace&lt;/span&gt; &lt;span class="nt"&gt;--namespace&lt;/span&gt; policy-reporter &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; ui.enabled&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="nt"&gt;--set&lt;/span&gt; kyvernoPlugin.enabled&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Testing Guardrails
&lt;/h3&gt;

&lt;p&gt;Kyverno operates in two primary modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enforce:&lt;/strong&gt; Blocks non-compliant resources at admission time, so a request that violates the policy is rejected outright.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Audit:&lt;/strong&gt; Admits the resource but records the violation in a policy report, so you can monitor without blocking workloads.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
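&lt;p&gt;The mode is selected per policy via the &lt;code&gt;spec.validationFailureAction&lt;/code&gt; field. A minimal sketch (the policy and rule names here are illustrative, not from the repo):&lt;/p&gt;

```yaml
# Minimal sketch: the enforcement mode is set per ClusterPolicy.
# Policy and rule names are illustrative.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: example-guardrail
spec:
  validationFailureAction: Enforce   # switch to Audit to report without blocking
  rules:
    - name: require-run-as-non-root
      match:
        resources:
          kinds:
            - Pod
      validate:
        message: "Pods must set runAsNonRoot"
        pattern:
          spec:
            securityContext:
              runAsNonRoot: true
```

&lt;p&gt;Switching a single field between &lt;code&gt;Enforce&lt;/code&gt; and &lt;code&gt;Audit&lt;/code&gt; is what makes gradual rollouts practical.&lt;/p&gt;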

&lt;h3&gt;
  
  
  Example: Enforcing PSS (Pod Security Standards)
&lt;/h3&gt;

&lt;p&gt;If we apply a &lt;code&gt;mutate&lt;/code&gt; policy that enforces a "Restricted" security context, an Nginx pod can fail because the stock image expects to start as root, which the mutated security context forbids.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Mutation:&lt;/strong&gt; When we apply the PSS Restricted policy, our &lt;code&gt;Nginx&lt;/code&gt; pod may enter a &lt;code&gt;CrashLoopBackOff&lt;/code&gt; because it violates the enforced security constraints. A more compatible container, like &lt;code&gt;busybox&lt;/code&gt;, will run successfully.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Audit:&lt;/strong&gt; By using &lt;code&gt;validationFailureAction: Audit&lt;/code&gt;, we can track non-compliant pods without breaking existing applications. This is the recommended strategy when rolling out security policies to existing production clusters.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  4. Next Steps: Observability
&lt;/h2&gt;

&lt;p&gt;Security is an ongoing process. To keep your cluster healthy and secure, implement observability using AWS-native tools like &lt;strong&gt;Amazon Managed Service for Prometheus (AMP)&lt;/strong&gt; and &lt;strong&gt;AWS Distro for OpenTelemetry (ADOT)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Check out the &lt;a href="https://github.com/aws-observability/terraform-aws-observability-accelerator" rel="noopener noreferrer"&gt;terraform-aws-observability-accelerator&lt;/a&gt; to get started.&lt;/p&gt;

&lt;h4&gt;
  
  
  Final Reminder: You can find the full source code for this demonstration in my &lt;a href="https://github.com/guxkung/eks-terraform" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;. Don't forget to run &lt;code&gt;terraform destroy&lt;/code&gt; when you are finished to avoid unnecessary AWS costs!
&lt;/h4&gt;




&lt;h2&gt;
  
  
  Appendix
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Accessing the policy-reporter-ui dashboard
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Run &lt;code&gt;kubectl port-forward service/policy-reporter-ui 8082:8080 -n policy-reporter&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Open &lt;code&gt;http://localhost:8082&lt;/code&gt; in your browser.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Mutate policy example taken from Kyverno
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kyverno.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apply-pss-restricted-profile&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;policies.kyverno.io/title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Apply PSS Restricted Profile&lt;/span&gt;
    &lt;span class="na"&gt;policies.kyverno.io/category&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Other, PSP Migration&lt;/span&gt;
    &lt;span class="na"&gt;kyverno.io/kyverno-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1.6.2&lt;/span&gt;
    &lt;span class="na"&gt;kyverno.io/kubernetes-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.23"&lt;/span&gt;
    &lt;span class="na"&gt;policies.kyverno.io/subject&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
    &lt;span class="na"&gt;policies.kyverno.io/description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod Security Standards define the fields and their options which are allowable for Pods to achieve certain security best practices. While these are typically validation policies, workloads will either be accepted or rejected based upon what has already been defined. It is also possible to mutate incoming Pods to achieve the desired PSS level rather than reject. This policy sets all the fields necessary to pass the PSS Restricted profile. Note that it does not attempt to remove non-compliant volumes and volumeMounts. Additional policies may be employed for this purpose.&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;add-pss-fields&lt;/span&gt;
      &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;any&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;kinds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
      &lt;span class="na"&gt;mutate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;patchStrategicMerge&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;securityContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;seccompProfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;RuntimeDefault&lt;/span&gt;
              &lt;span class="na"&gt;runAsNonRoot&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
              &lt;span class="na"&gt;runAsUser&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1000&lt;/span&gt;
              &lt;span class="na"&gt;runAsGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3000&lt;/span&gt;
              &lt;span class="na"&gt;fsGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2000&lt;/span&gt;
            &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;(name)&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;?*"&lt;/span&gt;
                &lt;span class="na"&gt;securityContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="na"&gt;privileged&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
                  &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                    &lt;span class="na"&gt;drop&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ALL&lt;/span&gt;
                  &lt;span class="na"&gt;allowPrivilegeEscalation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Nginx pod YAML
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;creationTimestamp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
    &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
  &lt;span class="na"&gt;dnsPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterFirst&lt;/span&gt;
  &lt;span class="na"&gt;restartPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Always&lt;/span&gt;
&lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1tabayz0zc9x3szfakn7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1tabayz0zc9x3szfakn7.png" alt=" " width="800" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Busybox pod YAML
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;creationTimestamp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;busybox-0&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;busybox-0&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sleep&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3600"&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;busybox&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;busybox-0&lt;/span&gt;
    &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
  &lt;span class="na"&gt;dnsPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterFirst&lt;/span&gt;
  &lt;span class="na"&gt;restartPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Always&lt;/span&gt;
&lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikm3c4x76kl9xqjam9oq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikm3c4x76kl9xqjam9oq.png" alt=" " width="800" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Validate policy example taken from Kyverno
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kyverno.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pss-audit&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;validationFailureAction&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Audit&lt;/span&gt;
  &lt;span class="na"&gt;background&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;check-run-as-non-root&lt;/span&gt;
      &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;kinds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
      &lt;span class="na"&gt;validate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Running&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;as&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;root&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;is&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;not&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;allowed"&lt;/span&gt;
        &lt;span class="na"&gt;pattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;securityContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;runAsNonRoot&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Busybox pod complying with the validate policy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;creationTimestamp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;busybox-1&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;busybox-1&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sleep&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3600"&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;busybox&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;busybox-1&lt;/span&gt;
    &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
  &lt;span class="na"&gt;dnsPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterFirst&lt;/span&gt;
  &lt;span class="na"&gt;restartPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Always&lt;/span&gt;
&lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2dr5kvdwwda3jo4frj6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2dr5kvdwwda3jo4frj6.png" alt=" " width="800" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>eks</category>
      <category>security</category>
    </item>
    <item>
      <title>AWS Amplify Cache Is Useless — And Here Is the Data to Prove It</title>
      <dc:creator>Tanseer</dc:creator>
      <pubDate>Wed, 29 Apr 2026 06:47:04 +0000</pubDate>
      <link>https://forem.com/aws-builders/aws-amplify-cache-is-useless-and-here-is-the-data-to-prove-it-2jn3</link>
      <guid>https://forem.com/aws-builders/aws-amplify-cache-is-useless-and-here-is-the-data-to-prove-it-2jn3</guid>
      <description>&lt;h2&gt;
  
  
  Who This Is For
&lt;/h2&gt;

&lt;p&gt;If you are deploying a frontend or full-stack app on AWS Amplify and your builds feel slower than they should be, this blog is worth reading. We are going to talk about Amplify's caching system — what it is supposed to do, what it actually does, and why in my experience it makes things worse, not better.&lt;/p&gt;

&lt;p&gt;No deep AWS knowledge is required. If you know what a build pipeline is and have used Amplify at least once, you will follow this completely.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem I Ran Into
&lt;/h2&gt;

&lt;p&gt;I was deploying an app on AWS Amplify. The build had two phases: install packages and build the app. Pretty standard setup.&lt;/p&gt;

&lt;p&gt;The total build time was sitting at around 9 minutes. That felt too long. So I opened the build logs and started looking at where the time was actually going.&lt;/p&gt;

&lt;p&gt;Here is what I found:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3 minutes to restore cache (fetch previously stored files)&lt;/li&gt;
&lt;li&gt;3 minutes to install packages and build the app&lt;/li&gt;
&lt;li&gt;3 minutes to save cache (store files for the next build)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So out of 9 minutes, the actual work — installing and building — was only 3 minutes. The other 6 minutes were spent entirely on cache operations.&lt;/p&gt;

&lt;p&gt;That immediately felt wrong. Cache is supposed to speed things up. If it is consuming twice the time of the actual build, something is broken.&lt;/p&gt;




&lt;h2&gt;
  
  
  The First Experiment: Disable Cache Entirely
&lt;/h2&gt;

&lt;p&gt;My first instinct was simple. What if I just removed the cache configuration completely and let Amplify install everything fresh every time?&lt;/p&gt;

&lt;p&gt;I removed the cache settings from my &lt;code&gt;amplify.yml&lt;/code&gt; build config and triggered a new build.&lt;/p&gt;

&lt;p&gt;The result: 3 minutes and 30 seconds.&lt;/p&gt;

&lt;p&gt;The build went from 9 minutes to 3 minutes 30 seconds just by removing cache. Yes, it took an extra 30 seconds to download packages compared to the ideal cached scenario. But it saved 6 full minutes of cache overhead.&lt;/p&gt;

&lt;p&gt;This alone should raise a flag. The cache was not saving time. It was adding time.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is Amplify Cache, Exactly?
&lt;/h2&gt;

&lt;p&gt;Before going further, let me explain how Amplify's caching works, because understanding the mechanism is key to understanding why it fails.&lt;/p&gt;

&lt;p&gt;When Amplify runs a build, it can be configured to save certain folders — most commonly &lt;code&gt;node_modules&lt;/code&gt; — by zipping them up and storing them in S3 (AWS's file storage service). On the next build, it fetches that zip, unzips it into the build environment, and in theory your packages are already there so the install step is faster.&lt;/p&gt;

&lt;p&gt;The key operation here is: zip and upload after a build, download and unzip before the next build.&lt;/p&gt;

&lt;p&gt;This is how Amplify's cache model works. It is essentially just copying folders in and out of storage between builds.&lt;/p&gt;
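&lt;p&gt;For context, this behavior is driven by a &lt;code&gt;cache&lt;/code&gt; block in &lt;code&gt;amplify.yml&lt;/code&gt;. A sketch of what a typical cached config looks like (the &lt;code&gt;node_modules&lt;/code&gt; path is the common pattern, not copied from my project):&lt;/p&gt;

```yaml
version: 1
frontend:
  phases:
    preBuild:
      commands:
        - npm ci
    build:
      commands:
        - npm run build
  artifacts:
    baseDirectory: build
    files:
      - '**/*'
  cache:
    paths:
      - node_modules/**/*   # zipped to S3 after the build, restored before the next one
```

&lt;p&gt;Everything under &lt;code&gt;cache.paths&lt;/code&gt; is what gets zipped, uploaded, downloaded, and unzipped on every build.&lt;/p&gt;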




&lt;h2&gt;
  
  
  The Second Experiment: Maybe It Is My Project
&lt;/h2&gt;

&lt;p&gt;After the first result, I thought maybe the problem was specific to my project. I had a reasonably large dependency tree. Maybe the &lt;code&gt;node_modules&lt;/code&gt; folder was so big that zipping and unzipping it was always going to take longer than just reinstalling.&lt;/p&gt;

&lt;p&gt;So I created a minimal test project — a simple website with almost no packages. Just enough to have a &lt;code&gt;package.json&lt;/code&gt; and a basic build step. The kind of project where &lt;code&gt;node_modules&lt;/code&gt; is tiny and cache should be trivially fast.&lt;/p&gt;

&lt;p&gt;I deployed it on Amplify with cache enabled.&lt;/p&gt;

&lt;p&gt;Same result. Amplify spent time fetching the cache, and then installed all dependencies from scratch anyway. The cache folder it had stored from the previous build was essentially ignored from a practical standpoint.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Root Cause: Cache and npm Are Fundamentally Incompatible
&lt;/h2&gt;

&lt;p&gt;After these experiments, I did some digging and found the real reason this does not work. It comes down to how npm (the package manager) behaves versus how Amplify's cache model works.&lt;/p&gt;

&lt;p&gt;Amplify caches folders. That is it. It saves a folder, restores a folder.&lt;/p&gt;

&lt;p&gt;But here is the problem:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you use &lt;code&gt;npm ci&lt;/code&gt;&lt;/strong&gt; (which is the recommended command for CI/CD pipelines because it gives you clean, reproducible installs), it deletes &lt;code&gt;node_modules&lt;/code&gt; entirely before installing. Every single time. It does not matter that Amplify just spent 3 minutes restoring that folder. &lt;code&gt;npm ci&lt;/code&gt; will delete it and start over.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you use &lt;code&gt;npm install&lt;/code&gt;&lt;/strong&gt; (the more common development command), it does not always delete &lt;code&gt;node_modules&lt;/code&gt;, but it re-evaluates the dependency tree and may reinstall or update packages depending on what it finds. So even here, the cache is not reliably used.&lt;/p&gt;

&lt;p&gt;In both cases, the cached &lt;code&gt;node_modules&lt;/code&gt; folder is either deleted outright or partially ignored.&lt;/p&gt;

&lt;p&gt;Amplify's own documentation recommends using &lt;code&gt;npm ci&lt;/code&gt; for builds. But &lt;code&gt;npm ci&lt;/code&gt; by design destroys exactly what Amplify's cache tries to preserve. These two things directly contradict each other.&lt;/p&gt;

&lt;p&gt;The cache model and the install command are working against each other.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Simple Way to Think About It
&lt;/h2&gt;

&lt;p&gt;Imagine you spend 10 minutes carefully organizing your desk every night before bed so it is ready for tomorrow. But every morning, the first thing you do is clear everything off the desk and start fresh. The organizing you did the night before is completely wasted.&lt;/p&gt;

&lt;p&gt;That is exactly what is happening here. Amplify organizes the &lt;code&gt;node_modules&lt;/code&gt; folder into cache. npm wipes the desk clean every build.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Numbers Look Like Side by Side
&lt;/h2&gt;

&lt;p&gt;To make this concrete, here is a comparison of what I observed:&lt;/p&gt;

&lt;p&gt;With cache enabled:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Restore cache: ~3 minutes&lt;/li&gt;
&lt;li&gt;Install and build: ~3 minutes&lt;/li&gt;
&lt;li&gt;Save cache: ~3 minutes&lt;/li&gt;
&lt;li&gt;Total: ~9 minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With cache disabled:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Install and build: ~3 minutes 30 seconds&lt;/li&gt;
&lt;li&gt;Total: ~3 minutes 30 seconds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The "optimized" build with cache took more than twice as long as the build with no cache at all.&lt;/p&gt;




&lt;h2&gt;
  
  
  What You Should Do Instead
&lt;/h2&gt;

&lt;p&gt;Based on everything above, my recommendation is straightforward: disable Amplify cache unless you have a very specific reason to use it and have verified it is actually helping.&lt;/p&gt;

&lt;p&gt;To disable it, remove or empty the &lt;code&gt;cache&lt;/code&gt; section from your &lt;code&gt;amplify.yml&lt;/code&gt;. Here is what a build config without cache looks like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="na"&gt;frontend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;phases&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;preBuild&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;commands&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;npm ci&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;commands&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;npm run build&lt;/span&gt;
  &lt;span class="na"&gt;artifacts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;baseDirectory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;build&lt;/span&gt;
    &lt;span class="na"&gt;files&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;**/*'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No cache block. Clean and simple.&lt;/p&gt;

&lt;p&gt;If your builds are still slow after removing cache, the bottleneck is likely somewhere else — large dependencies, slow build tools, or the build machine itself. Those are worth investigating separately, but at least you will not be wasting time on a cache that is not working.&lt;/p&gt;




&lt;h2&gt;
  
  
  My Conclusion
&lt;/h2&gt;

&lt;p&gt;AWS Amplify's cache feature is built on a model that zips and unzips folders between builds. That model does not account for how npm actually works. &lt;code&gt;npm ci&lt;/code&gt; deletes &lt;code&gt;node_modules&lt;/code&gt; before every install. &lt;code&gt;npm install&lt;/code&gt; may partially reinstall anyway. The result is that the cache restore step costs real time — in my case, 3 minutes per build — and delivers no actual benefit.&lt;/p&gt;

&lt;p&gt;I tested this on a large app and a minimal app. I tried &lt;code&gt;npm ci&lt;/code&gt; and &lt;code&gt;npm install&lt;/code&gt;. I made sure cache folders were correctly configured and permissions were in place. In every scenario, disabling cache made builds faster.&lt;/p&gt;

&lt;p&gt;This feels like a fundamental design mismatch between Amplify's caching mechanism and how modern package managers work.&lt;/p&gt;




&lt;h2&gt;
  
  
  Has This Happened to You?
&lt;/h2&gt;

&lt;p&gt;I am genuinely curious whether other developers have experienced this. Have you found a way to make Amplify cache actually work? Did you measure a real improvement? Or did you hit the same wall?&lt;/p&gt;

&lt;p&gt;Drop a comment or reach out — I would love to hear if someone has cracked this or if this is a widely shared frustration in the community.&lt;/p&gt;




&lt;h2&gt;
  
  
  Need Help With Your Amplify Setup?
&lt;/h2&gt;

&lt;p&gt;If you are running into build time issues or anything else with your Amplify deployment, feel free to reach out. Happy to help.&lt;/p&gt;

&lt;p&gt;Email me at &lt;strong&gt;&lt;a href="mailto:khantanseer43@gmail.com"&gt;khantanseer43@gmail.com&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




</description>
      <category>aws</category>
      <category>amplify</category>
      <category>serverless</category>
      <category>cicd</category>
    </item>
  </channel>
</rss>
