<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: AWS Community Builders</title>
    <description>The latest articles on Forem by AWS Community Builders (@aws-builders).</description>
    <link>https://forem.com/aws-builders</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F2794%2F88da75b6-aadd-4ea1-8083-ae2dfca8be94.png</url>
      <title>Forem: AWS Community Builders</title>
      <link>https://forem.com/aws-builders</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/aws-builders"/>
    <language>en</language>
    <item>
      <title>S3 Files: The End of Download-Process-Upload (with Terraform)</title>
      <dc:creator>Darryl Ruggles</dc:creator>
      <pubDate>Fri, 01 May 2026 15:28:59 +0000</pubDate>
      <link>https://forem.com/aws-builders/s3-files-the-end-of-download-process-upload-with-terraform-17ln</link>
      <guid>https://forem.com/aws-builders/s3-files-the-end-of-download-process-upload-with-terraform-17ln</guid>
      <description>&lt;p&gt;On April 7, 2026, AWS launched S3 Files - a managed NFS v4.1/4.2 layer built on Amazon EFS that provides file-system semantics on top of S3, including read-after-write consistency, advisory file locking, and POSIX permissions. (AWS Storage Gateway's File Gateway has offered NFS-over-S3 for years, but as a caching gateway appliance, not a native file system with these guarantees.) You can mount S3 Files from EC2, Lambda, EKS, and ECS (Fargate and ECS Managed Instances launch types; EC2 launch type is not yet supported). Your code reads and writes files with &lt;code&gt;open()&lt;/code&gt;, &lt;code&gt;os.rename()&lt;/code&gt;, and &lt;code&gt;os.listdir()&lt;/code&gt;. No boto3 for the data path. No /tmp juggling. No copy-then-delete to simulate a rename.&lt;/p&gt;

&lt;p&gt;In this post, I'll build two identical document-processing Lambda functions - one using the traditional S3 API approach and one using S3 Files - deploy them with Terraform, and benchmark the difference.&lt;/p&gt;

&lt;h2&gt;The Long Road to a Real S3 File System&lt;/h2&gt;

&lt;p&gt;For nearly two decades, developers have been trying to use S3 as a file system. Here's how the tools evolved:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;s3fs-fuse (2010)&lt;/th&gt;
&lt;th&gt;Mountpoint for S3 (2023)&lt;/th&gt;
&lt;th&gt;S3 Files (2026)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Protocol&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;FUSE&lt;/td&gt;
&lt;td&gt;FUSE&lt;/td&gt;
&lt;td&gt;NFS 4.1/4.2 (managed)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Write support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full (but slow)&lt;/td&gt;
&lt;td&gt;Sequential/append only&lt;/td&gt;
&lt;td&gt;Full read/write&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Rename&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Copy + delete (slow)&lt;/td&gt;
&lt;td&gt;Not supported&lt;/td&gt;
&lt;td&gt;Instant from the NFS client's perspective (async S3 sync)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;File locking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Advisory locks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Consistency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Eventual&lt;/td&gt;
&lt;td&gt;Eventual&lt;/td&gt;
&lt;td&gt;Read-after-write&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dir listing (1000 files)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Slow&lt;/td&gt;
&lt;td&gt;163ms&lt;/td&gt;
&lt;td&gt;39ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Small file reads (1000 files)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Very slow&lt;/td&gt;
&lt;td&gt;87.1s&lt;/td&gt;
&lt;td&gt;4.3s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sequential write (100MB)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~100 MB/s&lt;/td&gt;
&lt;td&gt;I/O errors&lt;/td&gt;
&lt;td&gt;273 MB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS managed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No (community)&lt;/td&gt;
&lt;td&gt;Client only&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Max throughput&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~100 MB/s&lt;/td&gt;
&lt;td&gt;GB/s&lt;/td&gt;
&lt;td&gt;TB/s aggregate&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Performance figures from published launch-day benchmarks; see&lt;/em&gt; &lt;a href="https://computingforgeeks.com/s3-files-vs-mountpoint-vs-s3fs/" rel="noopener noreferrer"&gt;S3 Files vs Mountpoint vs s3fs-fuse comparison&lt;/a&gt; &lt;em&gt;and&lt;/em&gt; &lt;a href="https://dev.classmethod.jp/en/articles/amazon-s3-files-ga-mount-and-compare-efs/" rel="noopener noreferrer"&gt;DevelopersIO GA walkthrough&lt;/a&gt;&lt;em&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Each generation solved the previous one's biggest limitation. s3fs-fuse gave you a file system but was slow and unreliable. Mountpoint gave you speed but restricted writes to sequential or append-only uploads, with no in-place modification - ruling out most real applications. S3 Files closes the remaining gaps: file-system semantics including advisory file locking and POSIX permissions, managed infrastructure, and strong performance for both small and large file workloads.&lt;/p&gt;

&lt;h2&gt;The "Before" Pattern: Download-Process-Upload&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr5v58cby0ggdgovdkkxk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr5v58cby0ggdgovdkkxk.png" alt="Before architecture - Traditional S3 API pattern" width="800" height="567"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you've written a Lambda function that processes files in S3, you've written this pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;s3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lambda_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. List files in the inbox
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_objects_v2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BUCKET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Prefix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inbox/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Contents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
        &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;filename&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;removeprefix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inbox/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# 2. Download to /tmp (the only writable space Lambda gives you)
&lt;/span&gt;        &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;download_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BUCKET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/tmp/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# 3. Process the file
&lt;/span&gt;        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/tmp/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;word_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lines&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;

        &lt;span class="c1"&gt;# 4. Upload processed file (S3 has no rename - copy then delete)
&lt;/span&gt;        &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy_object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BUCKET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;CopySource&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bucket&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BUCKET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;processed/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# 5. Upload metadata
&lt;/span&gt;        &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BUCKET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;processed/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.meta.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# 6. Delete the original
&lt;/span&gt;        &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete_object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BUCKET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# 7. Clean up /tmp (Lambda reuses containers)
&lt;/span&gt;        &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remove&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/tmp/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every step is an S3 API call. Every file passes through /tmp. "Renaming" a file requires a full copy followed by a delete - two API calls for something that should be instant. If your function processes 100 files, that's hundreds of API calls, each adding latency.&lt;/p&gt;
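&lt;p&gt;A back-of-the-envelope count makes that overhead concrete. This is a sketch based on the steps above, assuming a single &lt;code&gt;ListObjectsV2&lt;/code&gt; page (up to 1,000 keys) - larger batches add one extra LIST call per additional page:&lt;/p&gt;

```python
# Sketch: API calls issued by the download-process-upload pattern above.
# Per file: GetObject (download), CopyObject, PutObject (metadata),
# DeleteObject - plus the single initial ListObjectsV2 call.
def s3_api_calls(n_files: int) -> int:
    return 1 + 4 * n_files

print(s3_api_calls(100))  # 401 calls for a 100-file batch
```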

&lt;p&gt;And /tmp itself is limited. Lambda gives you 512MB by default (up to 10GB at extra cost). If you're processing large files or many files concurrently, you'll hit that ceiling.&lt;/p&gt;
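&lt;p&gt;Hitting that ceiling surfaces as an opaque &lt;code&gt;[Errno 28] No space left on device&lt;/code&gt; mid-batch, so it can be worth guarding each download with a free-space check. This helper is a sketch, not part of the handler above; the 16 MB safety margin is an arbitrary choice:&lt;/p&gt;

```python
import shutil

def fits_in_tmp(size_bytes: int, path: str = "/tmp") -> bool:
    """Return True if a file of `size_bytes` fits in the remaining free
    space at `path`, keeping a little headroom for other writes."""
    headroom = 16 * 1024 * 1024  # 16 MB safety margin (arbitrary)
    return size_bytes + headroom <= shutil.disk_usage(path).free
```

&lt;p&gt;Calling &lt;code&gt;fits_in_tmp(obj["Size"])&lt;/code&gt; before each &lt;code&gt;download_file&lt;/code&gt; lets the function skip or defer oversized files instead of crashing partway through a batch.&lt;/p&gt;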

&lt;h2&gt;The "After" Pattern: Just Use the File System&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzszxk8vvg9gptorbv5gw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzszxk8vvg9gptorbv5gw.png" alt="After architecture - S3 Files mounted file system" width="800" height="561"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With S3 Files mounted at &lt;code&gt;/mnt/docs&lt;/code&gt;, the same logic becomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_lambda_powertools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Logger&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Tracer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Metrics&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_lambda_powertools.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MetricUnit&lt;/span&gt;

&lt;span class="n"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Logger&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;tracer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Tracer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;metrics&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Metrics&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;MOUNT_PATH&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MOUNT_PATH&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nd"&gt;@logger.inject_lambda_context&lt;/span&gt;
&lt;span class="nd"&gt;@tracer.capture_lambda_handler&lt;/span&gt;
&lt;span class="nd"&gt;@metrics.log_metrics&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;capture_cold_start_metric&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lambda_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;inbox&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MOUNT_PATH&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inbox&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;processed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MOUNT_PATH&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;processed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;makedirs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;processed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exist_ok&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;listdir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inbox&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;src&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inbox&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;word_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lines&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;

        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;processed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.meta.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;processed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FilesProcessed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;unit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MetricUnit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;batch complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is no boto3 import, no /tmp management, and no copy-then-delete dance. Powertools makes it easy to add &lt;a href="https://darryl-ruggles.cloud/powertools-for-aws-lambda-best-practices-by-default" rel="noopener noreferrer"&gt;structured logging, tracing, and EMF metrics&lt;/a&gt; - the decorators above wire all three into this handler. The rename returns instantly from the NFS client's perspective. The code is materially shorter and maps more directly to the workload.&lt;/p&gt;

&lt;p&gt;One caveat: "instant" means instant from &lt;em&gt;your code's perspective&lt;/em&gt;. Under the hood, S3 Files still has to copy + delete the S3 object to implement the rename - general-purpose S3 buckets have no native rename operation. (S3 Express One Zone directory buckets do have a &lt;code&gt;RenameObject&lt;/code&gt; API, but S3 Files works with general-purpose buckets.) For single files, this happens fast enough to be invisible. For directory renames across thousands of objects, the S3-side sync can take a long time - AWS documentation warns about performance impact for large recursive rename operations on prefixes with many objects. Your NFS client sees the rename as complete immediately, but S3 API consumers see the old key until the background sync finishes.&lt;/p&gt;

&lt;p&gt;This is not just cleaner code - it's a fundamentally different model. S3 remains the authoritative data store; the file system is a synchronized view. Your Lambda function sees files and directories. S3 sees objects and prefixes. Both are looking at the same data.&lt;/p&gt;
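&lt;p&gt;A real &lt;code&gt;os.rename()&lt;/code&gt; also unlocks the classic POSIX write-to-temp-then-rename idiom: readers see either the old file or the complete new one, never a partial write. A minimal local sketch - the paths here are illustrative, not code from the benchmark functions:&lt;/p&gt;

```python
import json
import os
import tempfile

def write_atomic(path: str, payload: dict) -> None:
    """Write JSON to `path` so readers observe either the old file or the
    complete new one. Rename within a single directory is atomic on POSIX
    file systems, which is what an NFS mount presents to your code."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
        os.rename(tmp_path, path)  # atomic replace within the same directory
    except BaseException:
        os.unlink(tmp_path)  # never leave a half-written temp file behind
        raise
```

&lt;p&gt;On a mounted file system this idiom would give downstream readers of a directory like &lt;code&gt;/mnt/docs/processed/&lt;/code&gt; a consistent view even if the writer is interrupted mid-write.&lt;/p&gt;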

&lt;h2&gt;Building the Infrastructure with Terraform&lt;/h2&gt;

&lt;p&gt;Good news: the Terraform AWS provider shipped native S3 Files resources in &lt;a href="https://github.com/hashicorp/terraform-provider-aws/releases" rel="noopener noreferrer"&gt;v6.40.0&lt;/a&gt; on April 8, 2026 - just one day after S3 Files went GA. The new resources are &lt;code&gt;aws_s3files_file_system&lt;/code&gt;, &lt;code&gt;aws_s3files_mount_target&lt;/code&gt;, and &lt;code&gt;aws_s3files_access_point&lt;/code&gt;, plus corresponding data sources and &lt;code&gt;aws_s3files_file_system_policy&lt;/code&gt; for resource-based policies.&lt;/p&gt;

&lt;h3&gt;The S3 Bucket (Versioning is Mandatory)&lt;/h3&gt;

&lt;p&gt;S3 Files requires bucket versioning to be enabled. This is how it tracks the relationship between file-system state and S3 object versions. The full bucket setup also includes SSE-S3 encryption (explicit, even though it's the default for new buckets), a public access block, and a bucket policy enforcing TLS-only access:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3_bucket"&lt;/span&gt; &lt;span class="s2"&gt;"docs"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;bucket&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.project_name}-${var.environment}-docs"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3_bucket_versioning"&lt;/span&gt; &lt;span class="s2"&gt;"docs"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;bucket&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;versioning_configuration&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Enabled"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3_bucket_server_side_encryption_configuration"&lt;/span&gt; &lt;span class="s2"&gt;"docs"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;bucket&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;rule&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;apply_server_side_encryption_by_default&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;sse_algorithm&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"AES256"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3_bucket_public_access_block"&lt;/span&gt; &lt;span class="s2"&gt;"docs"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;bucket&lt;/span&gt;                  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;block_public_acls&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;block_public_policy&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;ignore_public_acls&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;restrict_public_buckets&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Disable ACLs - bucket-owner-enforced is the default for new buckets, but&lt;/span&gt;
&lt;span class="c1"&gt;# being explicit prevents readers from relying on defaults they may not understand.&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3_bucket_ownership_controls"&lt;/span&gt; &lt;span class="s2"&gt;"docs"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;bucket&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;rule&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;object_ownership&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"BucketOwnerEnforced"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Versioning is mandatory for S3 Files, so without lifecycle cleanup old&lt;/span&gt;
&lt;span class="c1"&gt;# versions accumulate silently during repeated benchmark runs.&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3_bucket_lifecycle_configuration"&lt;/span&gt; &lt;span class="s2"&gt;"docs"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;bucket&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;rule&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;id&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"expire-noncurrent-versions"&lt;/span&gt;
    &lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Enabled"&lt;/span&gt;
    &lt;span class="nx"&gt;noncurrent_version_expiration&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;noncurrent_days&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3_bucket_policy"&lt;/span&gt; &lt;span class="s2"&gt;"docs"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;bucket&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Sid&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"DenyNonTLS"&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Deny"&lt;/span&gt;
      &lt;span class="nx"&gt;Principal&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"*"&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"s3:*"&lt;/span&gt;
      &lt;span class="nx"&gt;Resource&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_s3_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"${aws_s3_bucket.docs.arn}/*"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="nx"&gt;Condition&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Bool&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"aws:SecureTransport"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"false"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Production hardening note:&lt;/strong&gt; For workloads that should stay private inside the VPC, you can go beyond TLS-only and restrict bucket access to your S3 VPC endpoint using &lt;code&gt;aws:sourceVpce&lt;/code&gt; or &lt;code&gt;aws:sourceVpc&lt;/code&gt; conditions in the bucket policy. That denies bucket access from anywhere except your approved VPC or VPC endpoint, even when the caller's credentials are otherwise valid. (KMS permissions for SSE-KMS buckets are covered with the service role below; this demo uses SSE-S3.)&lt;/p&gt;
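
&lt;p&gt;A sketch of that restriction, assuming a hypothetical &lt;code&gt;var.s3_endpoint_id&lt;/code&gt; holding your Gateway endpoint ID (not part of this demo's config):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical extra statement for the same bucket policy. Caution: a blanket
# Deny like this also blocks the S3 Files service role's server-side access
# (its calls carry no aws:sourceVpce), so you would need to carve out an
# exception, e.g. on aws:PrincipalArn, before using it with S3 Files.
{
  Sid       = "DenyOutsideVpce"
  Effect    = "Deny"
  Principal = "*"
  Action    = "s3:*"
  Resource  = [aws_s3_bucket.docs.arn, "${aws_s3_bucket.docs.arn}/*"]
  Condition = {
    StringNotEquals = { "aws:sourceVpce" = var.s3_endpoint_id }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;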

&lt;h3&gt;
  
  
  The S3 Files Service Role
&lt;/h3&gt;

&lt;p&gt;S3 Files needs an IAM role it can assume to read and write your bucket. This is separate from your Lambda's execution role. First surprise: &lt;strong&gt;the service principal is &lt;code&gt;elasticfilesystem.amazonaws.com&lt;/code&gt;, not &lt;code&gt;s3files.amazonaws.com&lt;/code&gt;&lt;/strong&gt;. S3 Files is built on EFS, and the trust policy has to name the underlying service. If you guess the obvious name, &lt;code&gt;CreateRole&lt;/code&gt; fails with &lt;code&gt;MalformedPolicyDocument: Invalid principal&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role"&lt;/span&gt; &lt;span class="s2"&gt;"s3files_service"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.project_name}-${var.environment}-s3files-service"&lt;/span&gt;

  &lt;span class="nx"&gt;assume_role_policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Sid&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"AllowS3FilesAssumeRole"&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
      &lt;span class="nx"&gt;Principal&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Service&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"elasticfilesystem.amazonaws.com"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"sts:AssumeRole"&lt;/span&gt;
      &lt;span class="nx"&gt;Condition&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;StringEquals&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="s2"&gt;"aws:SourceAccount"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_caller_identity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;account_id&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="nx"&gt;ArnLike&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="s2"&gt;"aws:SourceArn"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"arn:aws:s3files:${data.aws_region.current.region}:${data.aws_caller_identity.current.account_id}:file-system/*"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy"&lt;/span&gt; &lt;span class="s2"&gt;"s3files_bucket_access"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"s3-bucket-access"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;s3files_service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s2"&gt;"s3:ListBucket"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"s3:ListBucketVersions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;"s3:GetBucketLocation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"s3:GetBucketVersioning"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;"s3:AbortMultipartUpload"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"s3:ListMultipartUploadParts"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;"s3:GetObject"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"s3:GetObjectVersion"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"s3:GetObjectTagging"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"s3:GetObjectVersionTagging"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;"s3:PutObject"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"s3:PutObjectTagging"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;"s3:DeleteObject"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"s3:DeleteObjectVersion"&lt;/span&gt;
      &lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_s3_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"${aws_s3_bucket.docs.arn}/*"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="nx"&gt;Condition&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;StringEquals&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="s2"&gt;"aws:ResourceAccount"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_caller_identity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;account_id&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The role also needs &lt;strong&gt;EventBridge permissions&lt;/strong&gt; - this is the mechanism behind S3-to-NFS synchronization. S3 Files creates EventBridge rules (prefixed &lt;code&gt;DO-NOT-DELETE-S3-Files*&lt;/code&gt;) to detect out-of-band bucket changes. Without these, S3-side writes never propagate to the NFS mount:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy"&lt;/span&gt; &lt;span class="s2"&gt;"s3files_eventbridge"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"eventbridge-sync"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;s3files_service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;Sid&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"EventBridgeManage"&lt;/span&gt;
        &lt;span class="nx"&gt;Effect&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
        &lt;span class="nx"&gt;Action&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
          &lt;span class="s2"&gt;"events:PutRule"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"events:PutTargets"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="s2"&gt;"events:DeleteRule"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"events:DisableRule"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="s2"&gt;"events:EnableRule"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"events:RemoveTargets"&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"arn:aws:events:*:*:rule/DO-NOT-DELETE-S3-Files*"&lt;/span&gt;
        &lt;span class="nx"&gt;Condition&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;StringEquals&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s2"&gt;"events:ManagedBy"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"elasticfilesystem.amazonaws.com"&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;Sid&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"EventBridgeRead"&lt;/span&gt;
        &lt;span class="nx"&gt;Effect&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
        &lt;span class="nx"&gt;Action&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
          &lt;span class="s2"&gt;"events:DescribeRule"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"events:ListRules"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="s2"&gt;"events:ListRuleNamesByTarget"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"events:ListTargetsByRule"&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"arn:aws:events:*:*:rule/*"&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you use SSE-KMS encryption on the bucket, you'd also need &lt;code&gt;kms:GenerateDataKey&lt;/code&gt;, &lt;code&gt;kms:Encrypt&lt;/code&gt;, &lt;code&gt;kms:Decrypt&lt;/code&gt;, &lt;code&gt;kms:ReEncryptFrom&lt;/code&gt;, and &lt;code&gt;kms:ReEncryptTo&lt;/code&gt; scoped with &lt;code&gt;kms:ViaService = s3.&amp;lt;region&amp;gt;.amazonaws.com&lt;/code&gt;. This demo uses SSE-S3 (AES256), so KMS permissions aren't needed.&lt;/p&gt;
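
&lt;p&gt;As a rough sketch, that KMS statement on the service role's policy would look like the following - &lt;code&gt;aws_kms_key.docs&lt;/code&gt; is hypothetical, since this demo defines no KMS key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sketch only - not applied in this demo (SSE-S3 is used instead).
# kms:ViaService restricts these grants to calls S3 makes on your behalf.
{
  Sid    = "AllowKmsViaS3"
  Effect = "Allow"
  Action = [
    "kms:GenerateDataKey", "kms:Encrypt", "kms:Decrypt",
    "kms:ReEncryptFrom", "kms:ReEncryptTo"
  ]
  Resource = aws_kms_key.docs.arn
  Condition = {
    StringEquals = {
      "kms:ViaService" = "s3.${data.aws_region.current.region}.amazonaws.com"
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;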

&lt;p&gt;The &lt;code&gt;aws:SourceArn&lt;/code&gt; condition and the full set of object/multipart/EventBridge actions are documented in the &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-files-prereq-policies.html" rel="noopener noreferrer"&gt;S3 Files prerequisites&lt;/a&gt;. The biggest risk from an incomplete policy isn't a permission error - it's silent failure. Missing EventBridge permissions mean the sync rules never get created, and S3-side changes simply don't appear on the mount. Missing multipart permissions cause large-file uploads to leak incomplete parts.&lt;/p&gt;
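
&lt;p&gt;Two quick post-deploy checks catch both failure modes; these are standard AWS CLI commands (the bucket name is a placeholder):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Did the managed sync rules actually get created?
aws events list-rules --name-prefix DO-NOT-DELETE-S3-Files --query 'Rules[].Name'

# Any leaked incomplete multipart uploads? (replace the bucket name)
aws s3api list-multipart-uploads --bucket my-docs-bucket --query 'Uploads[].Key'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;An empty result from the first command after the file system is created means the EventBridge policy is wrong or missing.&lt;/p&gt;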

&lt;h3&gt;
  
  
  Creating the File System, Mount Targets, and Access Point
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;aws_s3files_file_system&lt;/code&gt; resource takes just a bucket ARN and the service role ARN:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3files_file_system"&lt;/span&gt; &lt;span class="s2"&gt;"docs"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;bucket&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
  &lt;span class="nx"&gt;role_arn&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;s3files_service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Mount targets go in each subnet where your Lambda runs. One per AZ:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3files_mount_target"&lt;/span&gt; &lt;span class="s2"&gt;"az"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_subnet_ids&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="nx"&gt;file_system_id&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3files_file_system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_id&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_subnet_ids&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;security_groups&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mount_target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Mount targets take about 5 minutes to create. Terraform's create timeout handles that wait - but there's a trap: the provider returns once the API call completes, which happens &lt;em&gt;before&lt;/em&gt; the target reaches the &lt;code&gt;available&lt;/code&gt; lifecycle state. If you create a Lambda that references the access point immediately after, &lt;code&gt;CreateFunction&lt;/code&gt; fails with &lt;code&gt;not all are in the available life cycle state yet&lt;/code&gt;. The fix is an explicit wait between mount targets and downstream consumers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"time_sleep"&lt;/span&gt; &lt;span class="s2"&gt;"wait_for_mount_targets"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;depends_on&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_s3files_mount_target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;az&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;create_duration&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"90s"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3files_access_point"&lt;/span&gt; &lt;span class="s2"&gt;"lambda"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;file_system_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3files_file_system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;depends_on&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;time_sleep&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;wait_for_mount_targets&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="c1"&gt;# DEMO SHORTCUT: uid 0:0 avoids ownership collisions during the side-by-side&lt;/span&gt;
  &lt;span class="c1"&gt;# comparison. In production, prefer a scoped access point path with a non-root&lt;/span&gt;
  &lt;span class="c1"&gt;# UID/GID (e.g., uid=1000), or grant s3files:ClientRootAccess on the Lambda&lt;/span&gt;
  &lt;span class="c1"&gt;# role instead. AWS's Lambda console defaults to UID/GID 1000:1000 with&lt;/span&gt;
  &lt;span class="c1"&gt;# root_directory.path = "/lambda" for good reason.&lt;/span&gt;
  &lt;span class="nx"&gt;posix_user&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;uid&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="nx"&gt;gid&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;root_directory&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;path&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The access point controls the POSIX UID/GID that all NFS operations execute as. The choice of &lt;code&gt;0:0&lt;/code&gt; here is a demo compromise, not a recommendation - I'll explain the tradeoffs and better alternatives in the "Things to Look Out For" section.&lt;/p&gt;
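
&lt;p&gt;For contrast, a production-leaning access point might look like the following - a sketch that assumes &lt;code&gt;aws_s3files_access_point&lt;/code&gt; supports a &lt;code&gt;creation_info&lt;/code&gt; block the way &lt;code&gt;aws_efs_access_point&lt;/code&gt; does; the &lt;code&gt;/lambda&lt;/code&gt; path and &lt;code&gt;1000:1000&lt;/code&gt; IDs mirror the console defaults mentioned above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Illustrative alternative, not deployed in this demo: non-root POSIX
# identity and a scoped root directory created on first use.
resource "aws_s3files_access_point" "lambda_prod" {
  file_system_id = aws_s3files_file_system.docs.id

  posix_user {
    uid = 1000
    gid = 1000
  }

  root_directory {
    path = "/lambda"
    creation_info {
      owner_uid   = 1000
      owner_gid   = 1000
      permissions = "750"
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;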

&lt;p&gt;Finally, add an &lt;code&gt;aws_s3files_file_system_policy&lt;/code&gt; - the resource-based policy on the file system itself (equivalent to a bucket policy). Without this, any principal with &lt;code&gt;s3files:ClientMount&lt;/code&gt; in their IAM policy can mount your file system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3files_file_system_policy"&lt;/span&gt; &lt;span class="s2"&gt;"docs"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;file_system_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3files_file_system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;Sid&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"AllowMountFromKnownRoles"&lt;/span&gt;
        &lt;span class="nx"&gt;Effect&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
        &lt;span class="nx"&gt;Principal&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;AWS&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lambda_role_arn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ec2_role_arn&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="nx"&gt;Action&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"s3files:ClientMount"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"s3files:ClientWrite"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nx"&gt;Resource&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3files_file_system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;Sid&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"EnforceTLS"&lt;/span&gt;
        &lt;span class="nx"&gt;Effect&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Deny"&lt;/span&gt;
        &lt;span class="nx"&gt;Principal&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"*"&lt;/span&gt;
        &lt;span class="nx"&gt;Action&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"s3files:*"&lt;/span&gt;
        &lt;span class="nx"&gt;Resource&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3files_file_system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
        &lt;span class="nx"&gt;Condition&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Bool&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"aws:SecureTransport"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"false"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The VPC (No NAT Gateway Needed)
&lt;/h3&gt;

&lt;p&gt;S3 Files requires your Lambda to be in a VPC - the NFS mount targets live inside your VPC subnets. But you don't need a NAT Gateway (which costs about $35/month). Instead, use a free S3 Gateway VPC endpoint for S3 API traffic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_vpc"&lt;/span&gt; &lt;span class="s2"&gt;"main"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;cidr_block&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"10.0.0.0/16"&lt;/span&gt;
  &lt;span class="nx"&gt;enable_dns_support&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;   &lt;span class="c1"&gt;# Required for VPC endpoints&lt;/span&gt;
  &lt;span class="nx"&gt;enable_dns_hostnames&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_subnet"&lt;/span&gt; &lt;span class="s2"&gt;"private"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;count&lt;/span&gt;             &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;cidr_block&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_subnet_cidrs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;availability_zone&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;availability_zones&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Free S3 Gateway endpoint - no NAT gateway needed&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_vpc_endpoint"&lt;/span&gt; &lt;span class="s2"&gt;"s3"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;service_name&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"com.amazonaws.${data.aws_region.current.region}.s3"&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_endpoint_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Gateway"&lt;/span&gt;
  &lt;span class="nx"&gt;route_table_ids&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_route_table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Security groups allow NFS traffic (TCP 2049) between the Lambda and mount targets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Lambda can reach mount targets on NFS port&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_vpc_security_group_egress_rule"&lt;/span&gt; &lt;span class="s2"&gt;"lambda_to_nfs"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;security_group_id&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lambda_after&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;referenced_security_group_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mount_target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;from_port&lt;/span&gt;                    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2049&lt;/span&gt;
  &lt;span class="nx"&gt;to_port&lt;/span&gt;                      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2049&lt;/span&gt;
  &lt;span class="nx"&gt;ip_protocol&lt;/span&gt;                  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"tcp"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Mount targets accept NFS from Lambda&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_vpc_security_group_ingress_rule"&lt;/span&gt; &lt;span class="s2"&gt;"nfs_from_lambda"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;security_group_id&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mount_target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;referenced_security_group_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lambda_after&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;from_port&lt;/span&gt;                    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2049&lt;/span&gt;
  &lt;span class="nx"&gt;to_port&lt;/span&gt;                      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2049&lt;/span&gt;
  &lt;span class="nx"&gt;ip_protocol&lt;/span&gt;                  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"tcp"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Lambda Function with S3 Files Mount
&lt;/h3&gt;

&lt;p&gt;The Lambda configuration uses the same &lt;code&gt;file_system_config&lt;/code&gt; block as EFS. The key additions are the VPC config and the S3 Files-specific IAM permissions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_lambda_function"&lt;/span&gt; &lt;span class="s2"&gt;"processor_after"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;function_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.project_name}-${var.environment}-after"&lt;/span&gt;
  &lt;span class="nx"&gt;runtime&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"python3.14"&lt;/span&gt;
  &lt;span class="nx"&gt;architectures&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"arm64"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;memory_size&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;512&lt;/span&gt;    &lt;span class="c1"&gt;# &amp;gt;= 512MB enables direct S3 read optimization&lt;/span&gt;
  &lt;span class="nx"&gt;timeout&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;
  &lt;span class="nx"&gt;handler&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"handler.lambda_handler"&lt;/span&gt;

  &lt;span class="nx"&gt;vpc_config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;subnet_ids&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private_subnet_ids&lt;/span&gt;
    &lt;span class="nx"&gt;security_group_ids&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lambda_sg_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;file_system_config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;arn&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;access_point_arn&lt;/span&gt;    &lt;span class="c1"&gt;# S3 Files access point&lt;/span&gt;
    &lt;span class="nx"&gt;local_mount_path&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/mnt/docs"&lt;/span&gt;            &lt;span class="c1"&gt;# Must start with /mnt/&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;environment&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;variables&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;MOUNT_PATH&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/mnt/docs"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Lambda execution role needs S3 Files mount permissions &lt;strong&gt;and&lt;/strong&gt; S3 read permissions for the direct-read optimization:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy"&lt;/span&gt; &lt;span class="s2"&gt;"s3files_mount"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"s3files-mount"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;execution&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"s3files:ClientMount"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"s3files:ClientWrite"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;access_point_arn&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Required for the &amp;gt;=1 MiB direct-read bypass (streams from S3 at up to 3 GB/s).&lt;/span&gt;
&lt;span class="c1"&gt;# Without this, reads silently fall back to the cached path - the mount works&lt;/span&gt;
&lt;span class="c1"&gt;# but you lose the throughput optimization and pay S3 Files access charges.&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy"&lt;/span&gt; &lt;span class="s2"&gt;"s3_direct_read"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"s3-direct-read"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;execution&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"s3:GetObject"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"s3:GetObjectVersion"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.bucket_arn}/*"&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: &lt;code&gt;s3files:ClientMount&lt;/code&gt; is required for all access. &lt;code&gt;s3files:ClientWrite&lt;/code&gt; is only needed for read-write mounts. &lt;code&gt;s3files:ClientRootAccess&lt;/code&gt; lets a non-root access point UID operate on root-owned entries (see the Access Point Ownership section below - it's the cleanest fix for mixed S3-API/NFS workflows). The &lt;code&gt;s3:GetObject&lt;/code&gt;/&lt;code&gt;s3:GetObjectVersion&lt;/code&gt; permissions are technically optional, but without them the direct-read bypass doesn't activate and your &amp;gt;=512MB memory setting buys you nothing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance Comparison
&lt;/h2&gt;

&lt;p&gt;I deployed both Lambda functions and ran them against 20 medium-sized text files (500-2000 words each) across 3 runs. The benchmark script seeds a separate S3 prefix for each approach, invokes the corresponding Lambda, and collects timing breakdowns from both handlers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;make benchmark
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Actual results from a 20-file, 3-run benchmark (arm64 Lambda, 512MB, us-east-1):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before (S3 API)&lt;/th&gt;
&lt;th&gt;After (S3 Files)&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;List files&lt;/td&gt;
&lt;td&gt;80ms (min 74, max 92)&lt;/td&gt;
&lt;td&gt;152ms (min 8, max 440)&lt;/td&gt;
&lt;td&gt;0.5x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read/Download (20 files)&lt;/td&gt;
&lt;td&gt;991ms (min 920, max 1096)&lt;/td&gt;
&lt;td&gt;139ms (min 124, max 146)&lt;/td&gt;
&lt;td&gt;7.1x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Process&lt;/td&gt;
&lt;td&gt;5ms&lt;/td&gt;
&lt;td&gt;3ms&lt;/td&gt;
&lt;td&gt;1.7x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write metadata (20 files)&lt;/td&gt;
&lt;td&gt;1788ms (min 1574, max 2054)&lt;/td&gt;
&lt;td&gt;256ms (min 249, max 266)&lt;/td&gt;
&lt;td&gt;7.0x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Move/rename (20 files)&lt;/td&gt;
&lt;td&gt;530ms (min 465, max 647)&lt;/td&gt;
&lt;td&gt;114ms (min 112, max 116)&lt;/td&gt;
&lt;td&gt;4.6x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lambda total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3394ms (min 3037, max 3878)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;664ms (min 518, max 948)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5.1x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Wall clock (including invoke)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3505ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1132ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.1x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The Lambda-internal win is about 5x. Wall clock narrows the gap because the after-Lambda pays a VPC cold start penalty on its first invocation in each run (2.2s wall on run 1, ~600ms on runs 2-3 once the ENI is warm). For batch workloads you'd amortize that across many files; for sporadic triggers you'd feel it every time.&lt;/p&gt;

&lt;p&gt;The single non-win - list time - is counterintuitive but worth calling out. &lt;code&gt;os.listdir&lt;/code&gt; over NFS had a cold-run outlier of 440ms (vs ~80ms for a warm &lt;code&gt;ListObjectsV2&lt;/code&gt; call). I didn't chase this down, but it looks like metadata that hasn't been touched recently isn't in the S3 Files cache yet and needs to be hydrated from S3 on first access. After warmup, &lt;code&gt;listdir&lt;/code&gt; settles at 8ms - 10x faster than the S3 API.&lt;/p&gt;

&lt;p&gt;The biggest wins are in small file reads (no per-object HTTP round trip), writes (no multipart setup for small files), and rename (a single inode operation vs &lt;code&gt;CopyObject&lt;/code&gt; + &lt;code&gt;DeleteObject&lt;/code&gt;).&lt;/p&gt;
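&lt;p&gt;To make the rename gap concrete, here's a minimal sketch (not from the demo repo - the function names are mine) contrasting the two paths. The NFS version is a single &lt;code&gt;os.rename()&lt;/code&gt;; the S3 API version has to copy the full object and then delete the original. A local temp directory stands in for the &lt;code&gt;/mnt/docs&lt;/code&gt; mount:&lt;/p&gt;

```python
import os
import tempfile

def rename_via_mount(src, dst):
    # On an S3 Files NFS mount this is a single atomic metadata operation.
    os.rename(src, dst)

def rename_via_s3_api(client, bucket, src_key, dst_key):
    # The S3 API has no rename: simulate it with a full copy plus a delete.
    # `client` is a boto3 S3 client; this path is shown, not executed here.
    client.copy_object(
        Bucket=bucket,
        Key=dst_key,
        CopySource={"Bucket": bucket, "Key": src_key},
    )
    client.delete_object(Bucket=bucket, Key=src_key)

# Demonstrate the mount-style rename against a local temp dir standing in
# for the /mnt/docs mount path from the Lambda config.
with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "inbox.txt")
    dst = os.path.join(root, "processed.txt")
    with open(src, "w") as f:
        f.write("hello")
    rename_via_mount(src, dst)
    print(os.path.exists(dst), os.path.exists(src))  # True False
```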

&lt;h2&gt;
  
  
  The Lambda Managed Instances Connection
&lt;/h2&gt;

&lt;p&gt;In my &lt;a href="https://darryl-ruggles.cloud/lambda-managed-instances-with-terraform-multi-concurrency-high-memory-and-compute-options" rel="noopener noreferrer"&gt;previous post on Lambda Managed Instances&lt;/a&gt;, I explored how high-memory Lambda functions unlock new workload patterns. S3 Files adds another dimension to this.&lt;/p&gt;

&lt;p&gt;When your Lambda function has &lt;strong&gt;512MB or more memory&lt;/strong&gt;, S3 Files enables direct S3 read routing: reads of &lt;strong&gt;1 MiB or larger&lt;/strong&gt; bypass the file system's high-performance storage entirely and stream directly from S3 at up to 3 GB/s per client (that's a throughput ceiling, not a typical number - actual throughput depends on file size, network, and concurrency). These direct reads don't incur S3 Files access charges - you only pay standard S3 GET pricing. (Your Lambda execution role needs &lt;code&gt;s3:GetObject&lt;/code&gt; and &lt;code&gt;s3:GetObjectVersion&lt;/code&gt; on the bucket for this to work - without them, reads silently fall back to the cached path.)&lt;/p&gt;

&lt;p&gt;There's a separate threshold at play too: files &lt;strong&gt;smaller than 128 KiB&lt;/strong&gt; are asynchronously imported into the high-performance storage on first access (a prefetch optimization, not a bypass). Files between 128 KiB and 1 MiB get metadata imported but data is fetched on demand. This creates a three-tier read architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tiny files (under 128 KiB)&lt;/strong&gt; (configs, metadata, indexes): prefetched into S3 Files cache, sub-millisecond on subsequent reads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mid-size files (128 KiB to 1 MiB)&lt;/strong&gt;: fetched on demand from the cache or S3, depending on access pattern&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Large files (1 MiB and above)&lt;/strong&gt; (datasets, models, media): streamed directly from S3 at 3 GB/s, skipping the cache entirely&lt;/li&gt;
&lt;/ul&gt;
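&lt;p&gt;A toy classifier (my own naming, using the thresholds described above) summarizes which path a read takes. The import threshold is the tunable one; the 1 MiB direct-read cutoff is fixed:&lt;/p&gt;

```python
# Illustrative classifier for the three read tiers described above.
# Thresholds: 128 KiB default import cutoff (tunable), 1 MiB direct-read
# cutoff (fixed).
KIB = 1024
MIB = 1024 * KIB

def read_path_for(size_bytes, import_threshold=128 * KIB):
    if size_bytes >= 1 * MIB:
        return "direct-from-s3"    # bypasses the cache, standard S3 GET pricing
    if size_bytes < import_threshold:
        return "prefetched-cache"  # async-imported on first access
    return "on-demand"             # metadata imported, data fetched lazily

for size in (4 * KIB, 512 * KIB, 100 * MIB):
    print(size, read_path_for(size))
```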

&lt;p&gt;The 128 KiB import threshold is tunable per file system via &lt;code&gt;aws_s3files_synchronization_configuration&lt;/code&gt; in Terraform (not shown in the demo). The 1 MiB direct-read bypass is not tunable.&lt;/p&gt;

&lt;p&gt;For data-intensive Lambda workloads, combining Managed Instances (multi-concurrency, high memory) with S3 Files (mounted file system, direct S3 read bypass) is a compelling alternative to containerized processing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three-Way EC2 Comparison: S3 API vs S3 Files vs Mountpoint
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpy8hz3e26upt09n4hv1e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpy8hz3e26upt09n4hv1e.png" alt="EC2 three-way comparison architecture" width="800" height="852"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Lambda benchmark above covers the serverless use case, but it doesn't include &lt;a href="https://github.com/awslabs/mountpoint-s3" rel="noopener noreferrer"&gt;Mountpoint for Amazon S3&lt;/a&gt; - AWS's FUSE-based file-system client. Mountpoint is widely used for analytics and ML workloads, so it's a natural comparison. There's just one problem: &lt;strong&gt;Mountpoint can't run on Lambda&lt;/strong&gt;. It's FUSE-based, and Lambda's Firecracker microVM doesn't expose &lt;code&gt;/dev/fuse&lt;/code&gt; or grant &lt;code&gt;CAP_SYS_ADMIN&lt;/code&gt; - both required for userspace file-system mounts. S3 Files sidesteps this entirely by using NFS, which Lambda natively supports through its existing EFS mount infrastructure.&lt;/p&gt;

&lt;p&gt;So for the three-way comparison, I added a Graviton EC2 instance (&lt;code&gt;c7g.large&lt;/code&gt;, arm64, in the same VPC) with both S3 Files and Mountpoint mounted, plus direct S3 API access via boto3. Same bucket, same data, three different interfaces.&lt;/p&gt;

&lt;h3&gt;
  
  
  Large-Directory Walk (10,000 Small Files)
&lt;/h3&gt;

&lt;p&gt;Seed 10,000 small text files under a single prefix, then enumerate every entry and stat each one:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Mean&lt;/th&gt;
&lt;th&gt;Min&lt;/th&gt;
&lt;th&gt;Max&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;S3 Files (NFS &lt;code&gt;os.listdir&lt;/code&gt; + &lt;code&gt;os.stat&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;905ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;891ms&lt;/td&gt;
&lt;td&gt;924ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3 API (&lt;code&gt;ListObjectsV2&lt;/code&gt;, paginated)&lt;/td&gt;
&lt;td&gt;1,666ms&lt;/td&gt;
&lt;td&gt;1,637ms&lt;/td&gt;
&lt;td&gt;1,698ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mountpoint (FUSE &lt;code&gt;os.listdir&lt;/code&gt; + &lt;code&gt;os.stat&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;175,847ms&lt;/td&gt;
&lt;td&gt;171,168ms&lt;/td&gt;
&lt;td&gt;179,002ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;S3 Files is 1.8x faster than the S3 API. Mountpoint is &lt;strong&gt;194x slower&lt;/strong&gt; - nearly three minutes for 10,000 entries. This is Mountpoint's known weakness: it makes a &lt;code&gt;ListObjectsV2&lt;/code&gt; call per directory and then individual &lt;code&gt;HeadObject&lt;/code&gt; calls for &lt;code&gt;stat()&lt;/code&gt;, with no prefetching or metadata caching. If your workload involves browsing or enumerating directories, Mountpoint is the wrong tool.&lt;/p&gt;
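&lt;p&gt;For reference, the walk itself is nothing exotic - one &lt;code&gt;listdir&lt;/code&gt; plus a &lt;code&gt;stat&lt;/code&gt; per entry. This stand-in sketch (a local temp directory instead of the mount, 5 files instead of 10,000) shows the access pattern that Mountpoint turns into a &lt;code&gt;HeadObject&lt;/code&gt; call per entry:&lt;/p&gt;

```python
import os
import tempfile

def walk_and_stat(path):
    # The benchmark loop: one listdir, then a stat per entry. Over S3 Files
    # both are served from cached metadata; over Mountpoint each stat becomes
    # an individual HeadObject API call, which is where the 194x gap comes from.
    entries = os.listdir(path)
    return {name: os.stat(os.path.join(path, name)).st_size for name in entries}

# Stand-in for the mounted prefix, with 5 files instead of 10,000.
with tempfile.TemporaryDirectory() as root:
    for i in range(5):
        with open(os.path.join(root, f"doc-{i}.txt"), "w") as f:
            f.write("x" * (i + 1))
    sizes = walk_and_stat(root)
    print(len(sizes), sorted(sizes.values()))  # 5 [1, 2, 3, 4, 5]
```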

&lt;h3&gt;
  
  
  Large-File Throughput (5 x 1 GiB Random Binary)
&lt;/h3&gt;

&lt;p&gt;Seed five 1 GiB random binary files, stream-read each one into a SHA-256 hash, write the digest back:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Read time (5 GiB)&lt;/th&gt;
&lt;th&gt;Read throughput&lt;/th&gt;
&lt;th&gt;Write time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Mountpoint (FUSE)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;11,158ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;459 MiB/s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1,469ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3 Files (NFS)&lt;/td&gt;
&lt;td&gt;32,356ms&lt;/td&gt;
&lt;td&gt;161 MiB/s&lt;/td&gt;
&lt;td&gt;71ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3 API (&lt;code&gt;GetObject&lt;/code&gt; stream)&lt;/td&gt;
&lt;td&gt;129,228ms&lt;/td&gt;
&lt;td&gt;43 MiB/s&lt;/td&gt;
&lt;td&gt;151ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For large sequential reads, Mountpoint dominates at 459 MiB/s - nearly 3x S3 Files and 10x the S3 API. This isn't an accident: Mountpoint splits each large read into &lt;strong&gt;parallel HTTP Range GET requests&lt;/strong&gt; across multiple TCP connections, with aggressive read-ahead prefetching. A 1 GiB file read becomes many concurrent range fetches that saturate the network link. It's a purpose-built parallel download accelerator for large, sequential, read-heavy workloads (ML training data, analytics datasets, media processing).&lt;/p&gt;

&lt;p&gt;S3 Files (161 MiB/s) goes through NFS 4.1/4.2 to a managed server that reads from its cache or S3 - the protocol framing and cache coherency tracking add overhead. The S3 API (43 MiB/s) is a single &lt;code&gt;GetObject&lt;/code&gt; stream over one HTTP connection with no parallelism.&lt;/p&gt;

&lt;p&gt;The same design that makes Mountpoint fast for large reads makes it very slow for directories: it has &lt;strong&gt;no metadata cache&lt;/strong&gt;, so every &lt;code&gt;stat()&lt;/code&gt; call becomes an individual &lt;code&gt;HeadObject&lt;/code&gt; API call to S3. That's why 10,000 files takes 176 seconds.&lt;/p&gt;

&lt;p&gt;Write time tells a similar story: Mountpoint takes 1,469ms to write five small digest files. S3 Files does it in 71ms. S3 API in 151ms. Mountpoint's FUSE-to-S3 translation adds high per-file overhead for small writes.&lt;/p&gt;
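&lt;p&gt;The read loop behind these numbers is a plain chunked stream into SHA-256 - the same code runs unchanged against all three interfaces' local paths, which is what keeps the comparison fair. A small stand-in version (a local temp file instead of a 1 GiB object on the mount):&lt;/p&gt;

```python
import hashlib
import os
import tempfile

def hash_file(path, chunk_size=1024 * 1024):
    # Stream the file in 1 MiB chunks so a 1 GiB input never sits in memory;
    # over the NFS mount this is a plain sequential read of the mounted path.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Tiny stand-in for the 1 GiB benchmark files.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello world")
    path = f.name
print(hash_file(path))
os.unlink(path)
```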

&lt;h3&gt;
  
  
  When to Use Which
&lt;/h3&gt;

&lt;p&gt;The benchmark reveals that no single approach wins everywhere:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;th&gt;Best tool&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Interactive file operations (rename, create, list)&lt;/td&gt;
&lt;td&gt;S3 Files&lt;/td&gt;
&lt;td&gt;File-system semantics, metadata caching, and renames that are instant from the NFS client's perspective (S3-side sync is async)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Large sequential reads (datasets, models, media)&lt;/td&gt;
&lt;td&gt;Mountpoint&lt;/td&gt;
&lt;td&gt;Highest throughput, zero software cost, no VPC needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Serverless (Lambda)&lt;/td&gt;
&lt;td&gt;S3 Files&lt;/td&gt;
&lt;td&gt;Mountpoint can't run on Lambda at all&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Simplest deployment (no VPC, no mounts)&lt;/td&gt;
&lt;td&gt;S3 API&lt;/td&gt;
&lt;td&gt;Slowest but zero infrastructure - works anywhere with IAM credentials&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Directory-heavy workloads&lt;/td&gt;
&lt;td&gt;S3 Files&lt;/td&gt;
&lt;td&gt;In this benchmark, Mountpoint's per-entry overhead made large directory walks much slower&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Things to Look Out For
&lt;/h2&gt;

&lt;p&gt;S3 Files is impressive, but it's not magic. Here are the real-world constraints you need to know:&lt;/p&gt;

&lt;h3&gt;
  
  
  60-Second Commit Delay
&lt;/h3&gt;

&lt;p&gt;S3 Files uses a "stage and commit" model. File-system writes are batched for approximately 60 seconds before committing to S3. Files you write are immediately visible through the NFS mount, but they won't appear in &lt;code&gt;aws s3 ls&lt;/code&gt; or &lt;code&gt;s3.list_objects_v2()&lt;/code&gt; for about a minute.&lt;/p&gt;

&lt;p&gt;For the document processing use case, this is fine - the Lambda reads and writes through the mount, so consistency is maintained within the NFS view. But if you have a downstream process polling S3 directly for new objects, it will see a delay.&lt;/p&gt;
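&lt;p&gt;If a downstream consumer does poll S3 directly, budget for the commit window. A hedged sketch (my own helper, with the listing function injected so it's testable - in practice it would wrap &lt;code&gt;list_objects_v2&lt;/code&gt;, and the timeout should sit comfortably above the ~60s window):&lt;/p&gt;

```python
import time

def wait_for_s3_visibility(list_keys, key, timeout_s=120, poll_s=5,
                           sleep=time.sleep):
    # Poll a listing function until a freshly NFS-written file shows up in S3.
    # With the ~60s stage-and-commit window, a 120s timeout avoids false
    # failures on the happy path.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if key in list_keys():
            return True
        sleep(poll_s)
    return False

# Fake listing that "commits" the object on the third poll.
calls = {"n": 0}
def fake_list():
    calls["n"] += 1
    return ["reports/out.txt"] if calls["n"] >= 3 else []

print(wait_for_s3_visibility(fake_list, "reports/out.txt", sleep=lambda s: None))
```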

&lt;h3&gt;
  
  
  VPC Cold Starts
&lt;/h3&gt;

&lt;p&gt;Putting Lambda in a VPC adds cold start latency. AWS has improved this significantly with Hyperplane ENI caching, but in this benchmark I observed roughly 1-2 seconds of additional cold start time compared to the non-VPC Lambda. For infrequently invoked functions, this matters. For functions that process batches (like our document processor), the cold start is amortized across many files.&lt;/p&gt;

&lt;h3&gt;
  
  
  50 Million Object Limit
&lt;/h3&gt;

&lt;p&gt;Each mounted file system supports up to 50 million objects. For most workloads this is generous, but if you're mounting a bucket with hundreds of millions of small objects, you'll need to scope the mount to a prefix. In Terraform, this is a creation-time argument on &lt;code&gt;aws_s3files_file_system&lt;/code&gt; (not shown in the demo, which mounts the entire bucket). Via the CLI, use the &lt;code&gt;--prefix&lt;/code&gt; flag on &lt;code&gt;create-file-system&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Name Restrictions
&lt;/h3&gt;

&lt;p&gt;S3 allows object keys that don't map cleanly to POSIX filenames. According to AWS documentation, keys with trailing slashes, path traversal patterns (&lt;code&gt;../&lt;/code&gt;), or components longer than 255 characters will not appear in the file-system view. The objects remain accessible via the S3 API, but the file system won't show them. AWS recommends monitoring the CloudWatch &lt;code&gt;ImportFailures&lt;/code&gt; metric to detect these cases, as there are no client-side errors.&lt;/p&gt;
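&lt;p&gt;A pre-flight check along these lines can flag unmappable keys before upload. This is my own illustrative approximation of the documented rules, not the authoritative check - the authoritative signal is the &lt;code&gt;ImportFailures&lt;/code&gt; metric:&lt;/p&gt;

```python
def is_posix_mappable(key):
    # Approximates the documented import restrictions: no trailing or leading
    # slash, no "." / ".." path components (traversal patterns), and each
    # component at most 255 characters. Illustrative only.
    if key.endswith("/") or key.startswith("/"):
        return False
    for part in key.split("/"):
        if part in ("", ".", "..") or len(part) > 255:
            return False
    return True

for key in ("docs/report.txt", "docs/", "a/../secret", "x" * 300):
    print(key[:20], is_posix_mappable(key))
```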

&lt;h3&gt;
  
  
  Delete and Update Propagation
&lt;/h3&gt;

&lt;p&gt;S3-side changes only propagate to the NFS mount for files whose data is currently in the high-performance storage (the "hot" cache). In testing, hot-file deletes via the S3 API remained readable on the mount for roughly 6-18 seconds before disappearing. Modifications followed the same pattern: the mount saw the stale version until the EventBridge notification arrived.&lt;/p&gt;

&lt;p&gt;For files whose data has been expired out of the cache (cold files), S3-side changes don't propagate at all until the next NFS read, at which point S3 Files fetches the latest version from S3. So the 6-18 second range observed above is a hot-path number; cold-path updates are lazy and unbounded. If you're designing a pipeline that writes via the S3 API and reads via the mount, test both cases.&lt;/p&gt;

&lt;h3&gt;
  
  
  Access Point Ownership
&lt;/h3&gt;

&lt;p&gt;This is the biggest surprise I hit, and it drove a design change in the demo.&lt;/p&gt;

&lt;p&gt;Objects written through the NFS mount &lt;em&gt;do&lt;/em&gt; carry POSIX ownership metadata - S3 Files stores it as user-defined S3 object metadata (&lt;code&gt;file-permissions&lt;/code&gt;, &lt;code&gt;file-owner&lt;/code&gt;, &lt;code&gt;file-group&lt;/code&gt;, &lt;code&gt;file-mtime&lt;/code&gt;) on every object it writes. But objects written via the S3 API - &lt;code&gt;s3.put_object()&lt;/code&gt;, &lt;code&gt;aws s3 cp&lt;/code&gt;, the &lt;code&gt;before&lt;/code&gt; Lambda's boto3 calls - don't have that metadata. When S3 Files imports those API-written objects into the NFS view, they get default permissions: &lt;code&gt;root:root&lt;/code&gt; (UID 0, GID 0) with mode &lt;code&gt;0644&lt;/code&gt; for files and &lt;code&gt;0755&lt;/code&gt; for directories.&lt;/p&gt;

&lt;p&gt;That asymmetry is the mechanism behind this issue: directories are traversable and readable by everyone (which is why the inbox reads worked), but only writable by root (which is why creating entries in &lt;code&gt;processed/&lt;/code&gt; failed). Those directories, incidentally, are just S3 prefixes materialized as zero-byte objects - which is why S3-API writes can create them as a side effect of &lt;code&gt;PutObject&lt;/code&gt; and why they end up root-owned when imported.&lt;/p&gt;
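&lt;p&gt;One idea that falls out of this: an API-side producer could attach the same metadata keys itself, so its objects import with sane ownership instead of &lt;code&gt;root:root&lt;/code&gt;. The key names come from the behavior described above, but the exact value formats here are my assumption - verify against an object S3 Files has actually written before relying on this:&lt;/p&gt;

```python
import time

def posix_file_metadata(uid=1000, gid=1000, mode=0o644, mtime=None):
    # Builds the user-defined metadata keys S3 Files writes on NFS-created
    # objects. The value formats (decimal IDs, octal mode string, epoch
    # seconds) are assumptions, not confirmed by documentation.
    return {
        "file-owner": str(uid),
        "file-group": str(gid),
        "file-permissions": format(mode, "04o"),
        "file-mtime": str(int(mtime if mtime is not None else time.time())),
    }

# Hypothetical usage on an API-side write (s3 is a boto3 client; not executed):
#   s3.put_object(Bucket=bucket, Key="processed/out.txt", Body=data,
#                 Metadata=posix_file_metadata())
print(posix_file_metadata(mtime=0))
```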

&lt;p&gt;The first time my &lt;code&gt;after&lt;/code&gt; Lambda ran with &lt;code&gt;posix_user { uid = 1000, gid = 1000 }&lt;/code&gt;, it failed with &lt;code&gt;PermissionError: [Errno 13] Permission denied: '/mnt/docs/processed/...&lt;/code&gt;. The Lambda could read the inbox just fine, but it couldn't &lt;em&gt;create&lt;/em&gt; anything under &lt;code&gt;/mnt/docs/processed/&lt;/code&gt; because S3 Files had reflected a previous &lt;code&gt;before&lt;/code&gt;-Lambda &lt;code&gt;PutObject&lt;/code&gt; into NFS as a root-owned directory.&lt;/p&gt;

&lt;p&gt;Four ways out, ordered from best (least privilege) to most expedient:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Use a scoped access point path&lt;/strong&gt; (recommended for production). Set &lt;code&gt;root_directory.path = "/lambda-workspace"&lt;/code&gt; with &lt;code&gt;creation_permissions { owner_uid = 1000, owner_gid = 1000, permissions = "755" }&lt;/code&gt; and &lt;code&gt;posix_user { uid = 1000, gid = 1000 }&lt;/code&gt;. S3 Files creates that path owned by your UID, and the Lambda only sees its owned subtree. The tradeoff: every S3 object the Lambda needs to see must be keyed under &lt;code&gt;lambda-workspace/...&lt;/code&gt;, and a raw &lt;code&gt;aws s3 cp&lt;/code&gt; into any other prefix is invisible to the mount. This enforces least privilege at the access-point level.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grant &lt;code&gt;s3files:ClientRootAccess&lt;/code&gt;&lt;/strong&gt; on the Lambda's IAM role. This lets a non-root UID (still &lt;code&gt;posix_user { uid = 1000 }&lt;/code&gt;) perform operations against root-owned entries - including creating files inside root-owned directories imported from S3 - without running the entire Lambda as UID 0. It's the middle ground: keep least-privilege POSIX identity, elevate only for cross-boundary operations with S3-origin content. This permission is included in the &lt;code&gt;AmazonS3FilesClientFullAccess&lt;/code&gt; managed policy, which is probably why I missed it - the demo's inline policy has only &lt;code&gt;ClientMount&lt;/code&gt; + &lt;code&gt;ClientWrite&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoid path collisions&lt;/strong&gt;: have each S3-API-side producer write to a prefix the NFS client never writes into. The demo does this - the &lt;code&gt;before&lt;/code&gt; Lambda writes to &lt;code&gt;processed-before/&lt;/code&gt; and the &lt;code&gt;after&lt;/code&gt; Lambda writes to &lt;code&gt;processed-after/&lt;/code&gt; - so their outputs never fight over directory ownership.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run as root&lt;/strong&gt; (access point &lt;code&gt;posix_user { uid = 0, gid = 0 }&lt;/code&gt;). The Lambda runs as "root" for NFS purposes and can write alongside S3-born files. This is what the demo uses because the side-by-side comparison needs both approaches to see the same bucket root. &lt;strong&gt;This is the opposite of least privilege&lt;/strong&gt; - any NFS client can read, write, and delete anything on the mount. Last resort only.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you're using S3 Files to replace an existing boto3 pipeline, plan this up front. Any prefix your NFS clients will &lt;em&gt;write into&lt;/em&gt; should be created from the mount side first, or left entirely unpopulated from the S3 side. Anything written via &lt;code&gt;PutObject&lt;/code&gt; will arrive in NFS as root-owned and block writes from non-root access points (unless you've granted &lt;code&gt;ClientRootAccess&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;Note: the demo pairs option 4 with an &lt;code&gt;aws_s3files_file_system_policy&lt;/code&gt; that restricts which IAM principals can mount at all (deny-by-default, allow only the Lambda and EC2 benchmark roles, enforce TLS). If you use uid=0, this resource-based policy is your primary access control.&lt;/p&gt;

&lt;p&gt;Related: don't pre-create "directory" marker objects (zero-byte &lt;code&gt;inbox/&lt;/code&gt;, &lt;code&gt;processed/&lt;/code&gt;, etc.) from Terraform. I had three &lt;code&gt;aws_s3_object&lt;/code&gt; resources doing this and they turned out to be the exact cause of the ownership collision. The Lambda's &lt;code&gt;os.makedirs(exist_ok=True)&lt;/code&gt; creates the directories over NFS with the correct access-point ownership - let it do its job.&lt;/p&gt;
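&lt;p&gt;In practice the mount-side fix is tiny. A minimal sketch (the helper name and prefix list are mine for illustration; &lt;code&gt;/mnt/docs&lt;/code&gt; is the demo's mount path):&lt;/p&gt;

```python
import os

def ensure_output_dirs(mount_root: str, prefixes=("inbox", "processed")) -> None:
    """Create working directories over NFS instead of via S3 marker objects.

    Directories created through the mount inherit the access point's
    owner UID/GID, so later writes by the same POSIX user succeed. Marker
    objects written via PutObject would import as root-owned instead.
    """
    for prefix in prefixes:
        os.makedirs(os.path.join(mount_root, prefix), exist_ok=True)

# e.g. at the top of the Lambda handler:
# ensure_output_dirs("/mnt/docs")
```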

&lt;h3&gt;
  
  
  S3-to-NFS Propagation Delay
&lt;/h3&gt;

&lt;p&gt;Also worth knowing: writes go in both directions, but they don't propagate symmetrically. NFS writes commit to S3 on the 60-second schedule described above. S3 writes appear in the NFS view asynchronously via EventBridge notifications, which typically takes a few seconds but can take longer under load. If your benchmark seeds files via &lt;code&gt;s3.put_object()&lt;/code&gt; and immediately invokes an NFS-mounted Lambda, the mount will see an empty inbox. The benchmark script in this project waits 60 seconds after S3-seeding to sidestep this.&lt;/p&gt;
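&lt;p&gt;If a fixed sleep feels too blunt, polling the mount works too. A minimal sketch (the helper name and defaults are mine):&lt;/p&gt;

```python
import os
import time

def wait_for_nfs_visibility(path: str, timeout: float = 90.0, interval: float = 2.0) -> bool:
    """Poll the mount until an S3-API-written object shows up in the NFS view.

    S3-to-NFS propagation is asynchronous (EventBridge-driven), so a file
    seeded with s3.put_object() may take seconds to appear on the mount.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```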

&lt;p&gt;&lt;strong&gt;Conflict resolution:&lt;/strong&gt; if the same file is modified through both the NFS mount and the S3 API before synchronization completes, the S3 bucket version wins. The file-system copy is not silently overwritten - it gets moved to a &lt;code&gt;.s3files-lost+found-&amp;lt;file-system-id&amp;gt;&lt;/code&gt; directory on the mount. Files in lost+found are not copied back to the S3 bucket and persist indefinitely on the file system, counting toward storage costs until explicitly deleted. This is important to understand for mixed API + file-system workflows: the S3 side is always authoritative, and your NFS edits may end up in lost+found if there's a race.&lt;/p&gt;
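&lt;p&gt;For mixed workflows it's worth sweeping for conflict casualties after a run. A sketch (the function name is mine; the directory pattern follows the naming above):&lt;/p&gt;

```python
import glob
import os

def find_lost_and_found(mount_root: str) -> list:
    """List files S3 Files parked after an NFS/S3 write conflict.

    Conflicted file-system copies land in a .s3files-lost+found-{fs-id}
    directory on the mount; they are never synced back to the bucket and
    keep accruing storage cost until deleted.
    """
    results = []
    for lf_dir in glob.glob(os.path.join(mount_root, ".s3files-lost+found-*")):
        for root, _dirs, files in os.walk(lf_dir):
            results.extend(os.path.join(root, name) for name in files)
    return results
```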

&lt;h2&gt;
  
  
  When NOT to Use S3 Files
&lt;/h2&gt;

&lt;p&gt;S3 Files isn't always the right choice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Read-only analytics at lowest cost&lt;/strong&gt;: Mountpoint for S3 adds zero software cost and is optimized for sequential reads of large files. If you're running Spark, Presto, or ML training jobs that only read data, Mountpoint is cheaper and simpler (no VPC required).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Non-AWS or S3-compatible storage&lt;/strong&gt;: s3fs-fuse works with MinIO, Ceph, and other S3-compatible object stores. S3 Files is AWS-only.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Existing EFS mounts&lt;/strong&gt;: Lambda supports one file-system mount - EFS or S3 Files, not both. For any new build where the backing data lives in S3, prefer S3 Files over EFS (you skip the EFS-to-S3 sync problem entirely). Only stick with EFS if the function needs a shared writable file system that multiple Lambda invocations coordinate through simultaneously.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency-critical writes that must appear in S3 immediately&lt;/strong&gt;: The 60-second commit delay means writes aren't visible to S3 API consumers right away. If you need sub-second S3 visibility, stick with direct S3 API calls.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;S3 Files eliminates an entire category of boilerplate from AWS applications. The download-process-upload pattern that we've all written hundreds of times is no longer necessary. Your code just reads and writes files. The underlying storage happens to be S3.&lt;/p&gt;

&lt;p&gt;The Terraform story is solid from day one - native provider resources shipped in v6.40.0, just one day after S3 Files went GA. Three resources (&lt;code&gt;aws_s3files_file_system&lt;/code&gt;, &lt;code&gt;aws_s3files_mount_target&lt;/code&gt;, &lt;code&gt;aws_s3files_access_point&lt;/code&gt;) cover the full setup, and the &lt;code&gt;file_system_config&lt;/code&gt; block on &lt;code&gt;aws_lambda_function&lt;/code&gt; works identically to the existing EFS mount pattern.&lt;/p&gt;

&lt;p&gt;All the code for this post - Terraform modules, Lambda handlers (with Powertools), the EC2 runner, benchmark scripts, and the Makefile - is available in the &lt;a href="https://github.com/RDarrylR/s3-files-initial" rel="noopener noreferrer"&gt;companion repository&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost Summary
&lt;/h3&gt;

&lt;p&gt;For a demo deployment: approximately $75/month if you leave everything running. The EC2 instance (about $53/month) and SSM VPC interface endpoints (about $21/month, needed because the EC2 is in a private subnet with no NAT) are the bulk. Lambda and S3 costs are negligible. Stop the EC2 and run &lt;code&gt;make destroy&lt;/code&gt; when done.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost note for production:&lt;/strong&gt; S3 Files meters data reads and writes with a &lt;strong&gt;minimum of 32 KiB per operation&lt;/strong&gt;, regardless of actual size. This benchmark's medium text files (500-2000 words) are above that threshold, so it didn't show up. But at scale with many tiny files - say 10,000 sub-1 KiB JSON configs read once each - you'd pay for 10,000 x 32 KiB (roughly 312 MiB) of reads instead of roughly 10 MiB. For small-file-heavy workloads, factor this into your cost model.&lt;/p&gt;
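&lt;p&gt;The arithmetic as a quick sanity check (a sketch; the 32 KiB floor is the only input taken from the pricing note above):&lt;/p&gt;

```python
KIB = 1024

def billed_bytes(object_size: int, floor: int = 32 * KIB) -> int:
    """Bytes metered for one S3 Files data operation, applying the 32 KiB minimum."""
    return max(object_size, floor)

# 10,000 sub-1-KiB JSON configs, read once each:
actual = 10_000 * 1 * KIB                  # raw data volume
metered = 10_000 * billed_bytes(1 * KIB)   # what you're billed for
assert metered == 32 * actual              # a 32x multiplier on tiny files
```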

&lt;h3&gt;
  
  
  Key Takeaways
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;S3 Files provides file-system semantics on S3 via NFS v4.1/4.2 - read, write, rename, advisory file locking&lt;/li&gt;
&lt;li&gt;Lambda functions with &amp;gt;=512MB memory get direct S3 read bypass for reads &amp;gt;=1 MiB (up to 3 GB/s ceiling) - but only if the execution role has &lt;code&gt;s3:GetObject&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;No NAT Gateway needed - use S3 Gateway VPC endpoints (but add SSM interface endpoints for EC2)&lt;/li&gt;
&lt;li&gt;Mountpoint can't run on Lambda (no FUSE) - S3 Files is the only file-system option for serverless&lt;/li&gt;
&lt;li&gt;For large sequential reads, Mountpoint still wins (459 MiB/s vs 161 MiB/s) - it was purpose-built for throughput&lt;/li&gt;
&lt;li&gt;For directory operations, Mountpoint is prohibitively slow (176s vs 0.9s for 10K entries) - use S3 Files&lt;/li&gt;
&lt;li&gt;The 60-second commit delay (NFS to S3) and the async EventBridge propagation (S3 to NFS) are the two consistency boundaries you have to design around&lt;/li&gt;
&lt;li&gt;Access point ownership interacts with S3-origin objects in ways that will surprise you - plan prefix ownership up front&lt;/li&gt;
&lt;li&gt;The trust policy service principal is &lt;code&gt;elasticfilesystem.amazonaws.com&lt;/code&gt;, not &lt;code&gt;s3files.amazonaws.com&lt;/code&gt; - S3 Files is built on EFS&lt;/li&gt;
&lt;li&gt;Native Terraform support shipped day one in AWS provider v6.40.0, but use a &lt;code&gt;time_sleep&lt;/code&gt; between mount targets and Lambda to avoid lifecycle state races&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/blogs/aws/launching-s3-files-making-s3-buckets-accessible-as-file-systems/" rel="noopener noreferrer"&gt;Launching S3 Files, Making S3 Buckets Accessible as File Systems&lt;/a&gt; - AWS News Blog announcement&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-files.html" rel="noopener noreferrer"&gt;Amazon S3 Files Documentation&lt;/a&gt; - Official user guide&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-filesystem-s3files.html" rel="noopener noreferrer"&gt;Configuring S3 Files Access for Lambda&lt;/a&gt; - Lambda-specific setup guide&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-files-getting-started.html" rel="noopener noreferrer"&gt;S3 Files Getting Started Tutorial&lt;/a&gt; - Step-by-step walkthrough&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3files_file_system" rel="noopener noreferrer"&gt;Terraform aws_s3files_file_system Resource&lt;/a&gt; - Terraform provider docs&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3files_mount_target" rel="noopener noreferrer"&gt;Terraform aws_s3files_mount_target Resource&lt;/a&gt; - Mount target configuration&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3files_access_point" rel="noopener noreferrer"&gt;Terraform aws_s3files_access_point Resource&lt;/a&gt; - Access point with POSIX user mapping&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.theregister.com/2026/04/09/aws_s3_files_stress_test_corey_quinn/" rel="noopener noreferrer"&gt;AWS S3 Files Stress Test&lt;/a&gt; - The Register's independent stress test with edge case findings&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://computingforgeeks.com/s3-files-vs-mountpoint-vs-s3fs/" rel="noopener noreferrer"&gt;S3 Files vs Mountpoint vs s3fs-fuse Comparison&lt;/a&gt; - Detailed feature and performance comparison&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/aws-builders/architecture-layers-that-s3-files-eliminates-and-creates-16ke"&gt;Architecture Layers That S3 Files Eliminates&lt;/a&gt; - Architectural patterns analysis&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/awslabs/mountpoint-s3" rel="noopener noreferrer"&gt;Mountpoint for Amazon S3&lt;/a&gt; - The read-heavy alternative for analytics workloads&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://darryl-ruggles.cloud/lambda-managed-instances-with-terraform-multi-concurrency-high-memory-and-compute-options" rel="noopener noreferrer"&gt;Lambda Managed Instances with Terraform&lt;/a&gt; - High-memory Lambda patterns that complement S3 Files&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://darryl-ruggles.cloud/powertools-for-aws-lambda-best-practices-by-default" rel="noopener noreferrer"&gt;Powertools for AWS Lambda - Best Practices By Default&lt;/a&gt; - Observability patterns for Lambda functions&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Connect with me on&lt;/em&gt; &lt;a href="https://x.com/RDarrylR" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://bsky.app/profile/darrylruggles.bsky.social" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://www.linkedin.com/in/darryl-ruggles/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://github.com/RDarrylR" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://medium.com/@RDarrylR" rel="noopener noreferrer"&gt;Medium&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://dev.to/rdarrylr"&gt;Dev.to&lt;/a&gt;&lt;em&gt;, or the&lt;/em&gt; &lt;a href="https://community.aws/@darrylr" rel="noopener noreferrer"&gt;AWS Community&lt;/a&gt;&lt;em&gt;. Check out more of my projects at&lt;/em&gt; &lt;a href="https://darryl-ruggles.cloud" rel="noopener noreferrer"&gt;darryl-ruggles.cloud&lt;/a&gt; &lt;em&gt;and join the&lt;/em&gt; &lt;a href="https://www.believeinserverless.com/" rel="noopener noreferrer"&gt;Believe In Serverless&lt;/a&gt; &lt;em&gt;community.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>s3</category>
      <category>serverless</category>
      <category>s3files</category>
    </item>
    <item>
      <title>LLM on EKS: Serving with vLLM</title>
      <dc:creator>Daniel Pepuho</dc:creator>
      <pubDate>Fri, 01 May 2026 14:49:45 +0000</pubDate>
      <link>https://forem.com/aws-builders/llm-on-eks-serving-with-vllm-2khg</link>
      <guid>https://forem.com/aws-builders/llm-on-eks-serving-with-vllm-2khg</guid>
      <description>&lt;p&gt;Last year, I mentioned that I'm interested in learning how to serve LLMs in production. At first it was just curiosity, but over time I wanted to actually try building something—not just reading about it.&lt;/p&gt;

&lt;p&gt;This post is a small step in that direction: serving an LLM using &lt;a href="https://github.com/vllm-project/vllm" rel="noopener noreferrer"&gt;vLLM&lt;/a&gt;, deploying it on &lt;a href="https://aws.amazon.com/eks" rel="noopener noreferrer"&gt;Amazon EKS&lt;/a&gt;, provisioning the infra with &lt;a href="https://github.com/aws/aws-cdk" rel="noopener noreferrer"&gt;AWS CDK&lt;/a&gt;, and wrapping it all in a simple chatbot built with &lt;a href="https://github.com/streamlit/streamlit" rel="noopener noreferrer"&gt;Streamlit&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Exploring LLM serving on a Kubernetes cluster (EKS)&lt;/li&gt;
&lt;li&gt;Using vLLM as the inference engine&lt;/li&gt;
&lt;li&gt;Provisioning the infrastructure with AWS CDK (IaC)&lt;/li&gt;
&lt;li&gt;Building a simple chatbot to interact with the LLM using Streamlit&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What We Tryna Build
&lt;/h2&gt;

&lt;p&gt;The idea is simple: build a small chatbot powered by an LLM and run the model on Kubernetes.&lt;/p&gt;

&lt;p&gt;I'm not focusing on training models here. I just want to understand how to serve an LLM properly.&lt;/p&gt;

&lt;p&gt;The flow looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User interacts with a chatbot (running locally)&lt;/li&gt;
&lt;li&gt;The chatbot sends a request to a vLLM API&lt;/li&gt;
&lt;li&gt;The model processes the request and returns a response&lt;/li&gt;
&lt;li&gt;The vLLM service runs on Amazon EKS&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9neqtclm9ib8piid2i74.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9neqtclm9ib8piid2i74.webp" alt="Project Architecture" width="800" height="387"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before we dive in, you'll need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS Account &amp;amp; IAM&lt;/strong&gt;: An AWS account ID and an IAM user with administrator access (admin rights are needed to manage EKS). We'll need the IAM username to map &lt;code&gt;kubectl&lt;/code&gt; (admin) permissions to the EKS cluster.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS CLI&lt;/strong&gt; installed and configured (&lt;code&gt;aws configure&lt;/code&gt;) using your IAM user credentials.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS CDK&lt;/strong&gt; installed (&lt;code&gt;npm install -g aws-cdk&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;AWS usually limits new accounts to 0 vCPUs for "Running On-Demand G and VT instances". You'll need to go to the AWS Service Quotas console and request an increase to at least 4 vCPUs to run the &lt;code&gt;g4dn.xlarge&lt;/code&gt; node.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;vLLM&lt;/strong&gt; — inference engine for the LLM. Fast, supports streaming, and exposes an OpenAI-compatible API out of the box.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon EKS&lt;/strong&gt; — The Kubernetes service on AWS to run the vLLM workload.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS CDK&lt;/strong&gt; — infrastructure as code to manage the AWS infra; this time I'll be using Python. One &lt;code&gt;cdk deploy&lt;/code&gt; and everything is provisioned.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streamlit&lt;/strong&gt; — simple chatbot UI that talks to the vLLM endpoint.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why vLLM?
&lt;/h2&gt;

&lt;p&gt;There are a few ways to serve an LLM — you could use TGI, Triton, or just raw HuggingFace &lt;code&gt;transformers&lt;/code&gt;. I went with vLLM for a few reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PagedAttention&lt;/strong&gt; — manages GPU memory more efficiently, which matters a lot on a single &lt;code&gt;g4dn.xlarge&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI-compatible API&lt;/strong&gt; — the chatbot can use the &lt;code&gt;openai&lt;/code&gt; Python SDK without any changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming support&lt;/strong&gt; — responses stream token by token, which makes the chatbot feel more responsive&lt;/li&gt;
&lt;/ul&gt;
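&lt;p&gt;Because the server speaks the OpenAI API, the client side stays boring. A sketch of the request the chatbot builds (the helper, the default URL, and the model name are placeholders; &lt;code&gt;VLLM_URL&lt;/code&gt; comes from the &lt;code&gt;.env&lt;/code&gt; file once the endpoint is live):&lt;/p&gt;

```python
import os

def build_chat_request(prompt: str, model: str = "my-served-model", stream: bool = True):
    """Assemble an OpenAI-compatible /v1/chat/completions request for vLLM."""
    base = os.environ.get("VLLM_URL", "http://localhost:8000").rstrip("/")
    url = base + "/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,  # token-by-token streaming for the chatbot UI
    }
    return url, payload
```

&lt;p&gt;The same endpoint also works with the &lt;code&gt;openai&lt;/code&gt; Python SDK by pointing its &lt;code&gt;base_url&lt;/code&gt; at &lt;code&gt;VLLM_URL&lt;/code&gt; plus &lt;code&gt;/v1&lt;/code&gt;.&lt;/p&gt;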

&lt;h2&gt;
  
  
  Why EKS?
&lt;/h2&gt;

&lt;p&gt;I could've just spun up an EC2 instance and SSH'd in. But that's not really building reliable infrastructure — that's just running a script on a server.&lt;/p&gt;

&lt;p&gt;EKS gives us a proper environment to run GPU workloads: node groups, taints and tolerations to make sure only the vLLM pod lands on the GPU node, and a LoadBalancer service to expose the endpoint.&lt;/p&gt;

&lt;h2&gt;
  
  
  Environment Setup
&lt;/h2&gt;

&lt;p&gt;Before getting into the code, let's set up a &lt;code&gt;.env&lt;/code&gt; file at the root of the project. We'll use this to manage our AWS configurations so we don't hardcode them into the repository.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# AWS Config&lt;/span&gt;
&lt;span class="nv"&gt;AWS_DEFAULT_ACCOUNT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;123456789012
&lt;span class="nv"&gt;AWS_DEFAULT_REGION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-east-1
&lt;span class="nv"&gt;AWS_ADMIN_USER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_aws_username
&lt;span class="nv"&gt;AWS_BUCKET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;eks-llm-model-bucket

&lt;span class="c"&gt;# EKS Config&lt;/span&gt;
&lt;span class="nv"&gt;CLUSTER_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;eks-llm

&lt;span class="c"&gt;# VLLM Config&lt;/span&gt;
&lt;span class="c"&gt;# VLLM_URL will be added later after the deployment is live&lt;/span&gt;
&lt;span class="c"&gt;# VLLM_URL=http://&amp;lt;nlb-endpoint&amp;gt;.elb.us-east-1.amazonaws.com&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
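&lt;p&gt;The CDK app reads these values through &lt;code&gt;os.environ&lt;/code&gt;, so the file has to be loaded before synth. A minimal hand-rolled loader (a sketch; &lt;code&gt;python-dotenv&lt;/code&gt;'s &lt;code&gt;load_dotenv()&lt;/code&gt; does the same job):&lt;/p&gt;

```python
import os

def load_env(path: str = ".env") -> None:
    """Read KEY=VALUE pairs into os.environ, skipping comments and blanks.

    Variables already set in the environment win (setdefault), matching
    the usual dotenv convention.
    """
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```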



&lt;h2&gt;
  
  
  The Code
&lt;/h2&gt;

&lt;h3&gt;
  
  
  EKS Stack
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;EksStack&lt;/code&gt; provisions everything at the infrastructure level: VPC, EKS cluster, node groups, and an S3 bucket for model storage.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;vpc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ec2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Vpc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EksVpc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_azs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;eks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Cluster&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EksCluster&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;eks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;KubernetesVersion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;V1_34&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;vpc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;default_capacity&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;kubectl_layer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;kubectl_layer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;default_capacity=0&lt;/code&gt; means no default node group — we define our own below.&lt;/p&gt;

&lt;p&gt;We have two node groups:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# 1. CPU, runs system pods (CoreDNS, kube-proxy, etc.)
&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_nodegroup_capacity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ManagedNodeGroup&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;desired_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;min_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instance_types&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ec2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;InstanceType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;t3.medium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
    &lt;span class="n"&gt;ami_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;eks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NodegroupAmiType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AL2023_X86_64_STANDARD&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. GPU, for running vLLM
&lt;/span&gt;&lt;span class="n"&gt;gpu_node_role&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Role&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GpuNodeRole&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;assumed_by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ServicePrincipal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ec2.amazonaws.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;managed_policies&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ManagedPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_aws_managed_policy_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AmazonEKSWorkerNodePolicy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ManagedPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_aws_managed_policy_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AmazonEC2ContainerRegistryReadOnly&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ManagedPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_aws_managed_policy_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AmazonEKS_CNI_Policy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_nodegroup_capacity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GpuNodeGroup&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;desired_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;min_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;disk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instance_types&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ec2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;InstanceType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;g4dn.xlarge&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
    &lt;span class="n"&gt;node_role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;gpu_node_role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ami_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;eks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NodegroupAmiType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AL2023_X86_64_NVIDIA&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;workload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;taints&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;eks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TaintSpec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nvidia.com/gpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;effect&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;eks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TaintEffect&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NO_SCHEDULE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;aws_auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_user_mapping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_user_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AdminUser&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AWS_ADMIN_USER&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
    &lt;span class="n"&gt;groups&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system:masters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Allow GPU nodes to read from the model bucket
&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;grant_read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gpu_node_role&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Setting &lt;code&gt;disk_size=100&lt;/code&gt; avoids disk-pressure pod evictions: the default 20GB root volume is too small for the vLLM container image plus the model cache. The taint &lt;code&gt;nvidia.com/gpu=true:NoSchedule&lt;/code&gt; on the GPU node group means no pod is scheduled there unless it explicitly tolerates the taint, which keeps system pods off the GPU node.&lt;/p&gt;
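&lt;p&gt;The scheduling contract can be sketched in a few lines of Python (an illustrative simplification of Kubernetes' real taint-matching rules, not EKS API code):&lt;/p&gt;

```python
# Illustrative sketch: how the scheduler's taint/toleration check
# keeps pods without a matching toleration off the GPU node.

def tolerates(taint: dict, tolerations: list) -> bool:
    """Return True if any toleration matches the taint (simplified)."""
    for t in tolerations:
        if t.get("key") != taint["key"]:
            continue
        op = t.get("operator", "Equal")
        if op == "Exists" or t.get("value") == taint["value"]:
            if t.get("effect") in (None, taint["effect"]):
                return True
    return False

# The taint set on the GPU node group above: nvidia.com/gpu=true:NoSchedule
gpu_taint = {"key": "nvidia.com/gpu", "value": "true", "effect": "NoSchedule"}

system_pod_tolerations = []  # typical system pod: no GPU toleration
vllm_pod_tolerations = [
    {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}
]

print(tolerates(gpu_taint, system_pod_tolerations))  # False
print(tolerates(gpu_taint, vllm_pod_tolerations))    # True
```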

&lt;p&gt;The S3 bucket is for model weights, and the GPU node role gets read access to it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# S3 bucket for model weights
&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_bucket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Bucket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ModelBucket&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;bucket_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AWS_BUCKET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;removal_policy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;RemovalPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RETAIN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;block_public_access&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BlockPublicAccess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BLOCK_ALL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives us two node groups with the following instance types:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Node&lt;/th&gt;
&lt;th&gt;vCPU&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;t3.medium&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;4Gi&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;g4dn.xlarge&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;16Gi&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  vLLM Stack
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;VllmStack&lt;/code&gt; takes the cluster from &lt;code&gt;EksStack&lt;/code&gt; and deploys vLLM on top of it.&lt;/p&gt;

&lt;p&gt;First, we install the NVIDIA device plugin via Helm. This is what makes EKS aware of the GPU on the node — without it, you can't request &lt;code&gt;nvidia.com/gpu&lt;/code&gt; as a resource in your pod spec.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;# Our LLM
&lt;/span&gt;
&lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_helm_chart&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NvidiaDevicePlugin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;chart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nvidia-device-plugin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;repository&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://nvidia.github.io/k8s-device-plugin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;namespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kube-system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nodeSelector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;workload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tolerations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nvidia.com/gpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;operator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Exists&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;effect&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NoSchedule&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note the toleration on the plugin itself — it needs to run on the GPU node to expose the GPU, so it has to tolerate the taint we set earlier.&lt;/p&gt;

&lt;p&gt;Then the vLLM Deployment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        &lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_manifest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VllmDeployment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;apiVersion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;apps/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kind&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Deployment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;metadata&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vllm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;namespace&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;spec&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;replicas&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;selector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;matchLabels&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;app&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vllm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;template&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;metadata&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;labels&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;app&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vllm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;spec&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tolerations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nvidia.com/gpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;operator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Exists&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;effect&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NoSchedule&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nodeSelector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;workload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;containers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vllm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vllm/vllm-openai:latest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;args&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--download-dir&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/model-cache&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--dtype&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;half&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--quantization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;awq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--max-model-len&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;4096&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="p"&gt;],&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;env&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AWS_DEFAULT_REGION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MODEL_BUCKET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;model_bucket_name&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VLLM_PORT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                            &lt;span class="p"&gt;],&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ports&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;containerPort&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8000&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;resources&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;limits&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nvidia.com/gpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;requests&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;12Gi&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                            &lt;span class="p"&gt;},&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;volumeMounts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model-cache&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mountPath&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/model-cache&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;readinessProbe&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;httpGet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/health&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8000&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;periodSeconds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="p"&gt;},&lt;/span&gt;
                        &lt;span class="p"&gt;}],&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;volumes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model-cache&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;emptyDir&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{}}],&lt;/span&gt;
                    &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things worth noting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;nodeSelector: workload=gpu&lt;/code&gt; pins the pod to the GPU node group&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;nvidia.com/gpu: 1&lt;/code&gt; requests exactly one GPU&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dtype: half&lt;/code&gt; and &lt;code&gt;quantization: awq&lt;/code&gt; drop the model size to ~5.7GB, so it comfortably fits in the 16GB of VRAM on a &lt;code&gt;g4dn.xlarge&lt;/code&gt; without OOM&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;max-model-len: 4096&lt;/code&gt; caps the context window to avoid OOM&lt;/li&gt;
&lt;/ul&gt;
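&lt;p&gt;A rough back-of-the-envelope shows why this fits (approximate figures, assuming Llama 3.1 8B's 32 layers, 8 KV heads, and head dimension of 128; the gap up to the ~5.7GB on disk comes from quantization scales and layers kept in fp16):&lt;/p&gt;

```python
# Rough VRAM estimate for the AWQ INT4 Llama 3.1 8B config above.
# All figures are approximations, not vLLM's actual memory accounting.

params = 8e9
weight_bytes = params * 0.5          # INT4 ~ 0.5 bytes per parameter

# KV cache per token: layers * kv_heads * head_dim * 2 (K and V) * 2 bytes (fp16)
layers, kv_heads, head_dim = 32, 8, 128
kv_per_token = layers * kv_heads * head_dim * 2 * 2
kv_cache_bytes = kv_per_token * 4096  # --max-model-len 4096

gib = 1024 ** 3
print(round(weight_bytes / gib, 1))    # ~3.7 GiB of weights
print(round(kv_cache_bytes / gib, 1))  # ~0.5 GiB of KV cache per full-length sequence
```

Both together leave headroom in the T4's 16GB, which is why the 4096-token cap matters: a longer context window grows the KV cache linearly.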

&lt;p&gt;Finally, a LoadBalancer service to expose the endpoint publicly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        &lt;span class="n"&gt;cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_manifest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VllmService&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;apiVersion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kind&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Service&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;metadata&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vllm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;namespace&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;annotations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;service.beta.kubernetes.io/aws-load-balancer-type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nlb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;spec&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LoadBalancer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;selector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;app&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vllm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ports&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;port&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;targetPort&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;protocol&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TCP&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="c1"&gt;# Internal cluster URL for the vLLM service
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vllm_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VLLM_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://vllm.default.svc.cluster.local:80&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="nc"&gt;CfnOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VllmUrl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vllm_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Internal vLLM service URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
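&lt;p&gt;Since vLLM exposes an OpenAI-compatible API, a client call against that URL is just an HTTP POST to &lt;code&gt;/v1/chat/completions&lt;/code&gt;. A minimal sketch (the URL and model ID come from the stack above; actually sending the request needs network access to the cluster):&lt;/p&gt;

```python
import json

# Sketch of a chat-completion request against vLLM's OpenAI-compatible API.
# VLLM_URL is the internal service URL from the stack output; from outside
# the cluster you would use the NLB hostname instead.
VLLM_URL = "http://vllm.default.svc.cluster.local:80"
MODEL_ID = "hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4"

def build_chat_request(prompt: str, max_tokens: int = 256):
    """Return (url, json_body) for a chat completion call."""
    body = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return f"{VLLM_URL}/v1/chat/completions", json.dumps(body)

url, body = build_chat_request("What is Amazon EKS?")
print(url)  # http://vllm.default.svc.cluster.local:80/v1/chat/completions
```

From outside the cluster, substitute the load balancer hostname that &lt;code&gt;kubectl get svc vllm&lt;/code&gt; reports.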



&lt;h2&gt;
  
  
  Deploy
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cdk bootstrap   &lt;span class="c"&gt;# first time only&lt;/span&gt;
cdk deploy &lt;span class="nt"&gt;--all&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After the deployment succeeds, you'll see the node in the cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get nodes
NAME                          STATUS   ROLES    AGE     VERSION
ip-10-0-xx-yy.ec2.internal   Ready    &amp;lt;none&amp;gt;   9m18s   v1.34.7-eks-40737a8
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;kubectl get nodes &lt;span class="nt"&gt;--show-labels&lt;/span&gt; | &lt;span class="nb"&gt;grep &lt;/span&gt;gpu

ip-10-0-xx-yy.ec2.internal   Ready    &amp;lt;none&amp;gt;   4m23s   v1.34.7-eks-40737a8   beta.kubernetes.io/arch&lt;span class="o"&gt;=&lt;/span&gt;amd64,
...
&lt;span class="nv"&gt;workload&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gpu
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wait for the vLLM pod to be ready (~5-10 minutes; the model is downloaded from Hugging Face on first start):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get pods &lt;span class="nt"&gt;-w&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;NAME                   READY   STATUS    RESTARTS   AGE
vllm-64c858884-pz4gz   0/1     Running   0          2m24s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl logs &lt;span class="nt"&gt;-f&lt;/span&gt; deployment/vllm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;WARNING 05-01 12:48:00 &lt;span class="o"&gt;[&lt;/span&gt;argparse_utils.py:257] With &lt;span class="sb"&gt;`&lt;/span&gt;vllm serve&lt;span class="sb"&gt;`&lt;/span&gt;, you should provide the model as a positional argument or &lt;span class="k"&gt;in &lt;/span&gt;a config file instead of via the &lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="nt"&gt;--model&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt; o
ption. The &lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="nt"&gt;--model&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt; option will be removed &lt;span class="k"&gt;in &lt;/span&gt;a future version.
&lt;span class="o"&gt;(&lt;/span&gt;APIServer &lt;span class="nv"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt; INFO 05-01 12:48:00 &lt;span class="o"&gt;[&lt;/span&gt;utils.py:299]
&lt;span class="o"&gt;(&lt;/span&gt;APIServer &lt;span class="nv"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt; INFO 05-01 12:48:00 &lt;span class="o"&gt;[&lt;/span&gt;utils.py:299]        █     █     █▄   ▄█
&lt;span class="o"&gt;(&lt;/span&gt;APIServer &lt;span class="nv"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt; INFO 05-01 12:48:00 &lt;span class="o"&gt;[&lt;/span&gt;utils.py:299]  ▄▄ ▄█ █     █     █ ▀▄▀ █  version 0.20.0
&lt;span class="o"&gt;(&lt;/span&gt;APIServer &lt;span class="nv"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt; INFO 05-01 12:48:00 &lt;span class="o"&gt;[&lt;/span&gt;utils.py:299]   █▄█▀ █     █     █     █  model   hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4
&lt;span class="o"&gt;(&lt;/span&gt;APIServer &lt;span class="nv"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt; INFO 05-01 12:48:00 &lt;span class="o"&gt;[&lt;/span&gt;utils.py:299]    ▀▀  ▀▀▀▀▀ ▀▀▀▀▀ ▀     ▀
&lt;span class="o"&gt;(&lt;/span&gt;APIServer &lt;span class="nv"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt; INFO 05-01 12:48:00 &lt;span class="o"&gt;[&lt;/span&gt;utils.py:299]
&lt;span class="o"&gt;(&lt;/span&gt;APIServer &lt;span class="nv"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt; INFO 05-01 12:48:00 &lt;span class="o"&gt;[&lt;/span&gt;utils.py:233] non-default args: &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s1"&gt;'model_tag'&lt;/span&gt;: &lt;span class="s1"&gt;'hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4'&lt;/span&gt;, &lt;span class="s1"&gt;'model'&lt;/span&gt;: &lt;span class="s1"&gt;'hugging-quants/Meta-L
lama-3.1-8B-Instruct-AWQ-INT4'&lt;/span&gt;, &lt;span class="s1"&gt;'dtype'&lt;/span&gt;: &lt;span class="s1"&gt;'half'&lt;/span&gt;, &lt;span class="s1"&gt;'max_model_len'&lt;/span&gt;: 4096, &lt;span class="s1"&gt;'quantization'&lt;/span&gt;: &lt;span class="s1"&gt;'awq'&lt;/span&gt;, &lt;span class="s1"&gt;'download_dir'&lt;/span&gt;: &lt;span class="s1"&gt;'/model-cache'&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Inference
&lt;/h2&gt;

&lt;p&gt;Once the pod is running, grab the NLB endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get svc vllm &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.status.loadBalancer.ingress[0].hostname}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, check that the model is loaded:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://&amp;lt;nlb-endpoint&amp;gt;/v1/models
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"list"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"owned_by"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"vllm"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, send it a prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://&amp;lt;nlb-endpoint&amp;gt;/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4",
    "messages": [
      {"role": "system", "content": "You are a helpful AI assistant."},
      {"role": "user", "content": "What is CAP theorem?"}
    ],
    "max_tokens": 150
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"chatcmpl-8609921b347e2718"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"chat.completion"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"created"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1777640350&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"choices"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"index"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The CAP theorem, also known as the Brewer's CAP theorem, is a fundamental concept in distributed systems. It was first proposed by Eric Brewer in 2000.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;**CAP stands for:**&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;1. **Consistency**: This refers to the ability of a system to ensure that all nodes in the system have the same view of the data. In other words, all nodes see the same data, and any updates are reflected uniformly across the system.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;2. **Availability**: This refers to the ability of a system to ensure that every request receives a (non-error) response, without guarantee that it contains the most recent version of the information. In other words, the system is always available, even if some nodes are down or"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The vLLM logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;(&lt;/span&gt;APIServer &lt;span class="nv"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt; INFO:     10.0.253.68:42022 - &lt;span class="s2"&gt;"GET /health HTTP/1.1"&lt;/span&gt; 200 OK
&lt;span class="o"&gt;(&lt;/span&gt;APIServer &lt;span class="nv"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt; INFO 05-01 12:59:19 &lt;span class="o"&gt;[&lt;/span&gt;loggers.py:271] Engine 000: Avg prompt throughput: 1.4 tokens/s, Avg generation throughput: 4.0 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.2%, Prefix cache hit rate: 28.8%
&lt;span class="o"&gt;(&lt;/span&gt;APIServer &lt;span class="nv"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt; INFO:     10.0.253.68:33400 - &lt;span class="s2"&gt;"GET /health HTTP/1.1"&lt;/span&gt; 200 OK
&lt;span class="o"&gt;(&lt;/span&gt;APIServer &lt;span class="nv"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt; INFO 05-01 12:59:29 &lt;span class="o"&gt;[&lt;/span&gt;loggers.py:271] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 4.5 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.3%, Prefix cache hit rate: 28.8%
&lt;span class="o"&gt;(&lt;/span&gt;APIServer &lt;span class="nv"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt; INFO 05-01 12:59:39 &lt;span class="o"&gt;[&lt;/span&gt;loggers.py:271] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 4.5 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.4%, Prefix cache hit rate: 28.8%
&lt;span class="o"&gt;(&lt;/span&gt;APIServer &lt;span class="nv"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt; INFO:     10.0.253.68:34814 - &lt;span class="s2"&gt;"GET /health HTTP/1.1"&lt;/span&gt; 200 OK
&lt;span class="o"&gt;(&lt;/span&gt;APIServer &lt;span class="nv"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;)&lt;/span&gt; INFO:     10.0.228.56:20496 - &lt;span class="s2"&gt;"POST /v1/chat/completions HTTP/1.1"&lt;/span&gt; 200 OK
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, if you get a response back, congrats: the model is live. 🎉&lt;/p&gt;
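&lt;p&gt;If you'd rather not shell out to &lt;code&gt;curl&lt;/code&gt;, the same request can be sent from Python with only the standard library. This is a quick sketch: the &lt;code&gt;VLLM_URL&lt;/code&gt; default below is a placeholder, so point it at your own NLB hostname:&lt;/p&gt;

```python
import json
import os
import urllib.request

# Placeholder default: set VLLM_URL to the NLB hostname printed by
# `kubectl get svc vllm` earlier.
BASE_URL = os.environ.get("VLLM_URL", "http://localhost:8000")
MODEL_ID = "hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4"


def build_chat_request(user_msg, max_tokens=150):
    """Build the same JSON body the curl example sends."""
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": "You are a helpful AI assistant."},
            {"role": "user", "content": user_msg},
        ],
        "max_tokens": max_tokens,
    }


def chat(user_msg):
    """POST to /v1/chat/completions and return the assistant's reply."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(build_chat_request(user_msg)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]
```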

&lt;p&gt;Working with the API endpoint directly is fine, but typing &lt;code&gt;curl&lt;/code&gt; commands is not exactly a great user experience. Let's build a chatbot UI on top of this.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F88mldyuew6wogp3adskf.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F88mldyuew6wogp3adskf.gif" alt="That's not enough" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Chatbot with Streamlit
&lt;/h2&gt;

&lt;p&gt;So let's build a simple chatbot using &lt;a href="https://github.com/streamlit/streamlit" rel="noopener noreferrer"&gt;Streamlit&lt;/a&gt; that talks directly to vLLM.&lt;/p&gt;

&lt;p&gt;The nice part? Since vLLM exposes an OpenAI-compatible API, we can just use the &lt;code&gt;openai&lt;/code&gt; Python SDK without any extra effort.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;

&lt;p&gt;Install the dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;streamlit openai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's create a simple UI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;src
&lt;span class="nb"&gt;touch &lt;/span&gt;src/app.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;streamlit&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;

&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Using the URL from AWS load balancer
&lt;/span&gt;&lt;span class="n"&gt;VLLM_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VLLM_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://xx-yy.elb.us-east-1.amazonaws.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;MODEL_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;VLLM_URL&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;none&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_page_config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Llama 3 Chatbot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;page_icon&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🦙&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🦙 Llama 3 Chatbot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;caption&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Powered by vLLM on EKS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat_input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How is you day? Say something...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
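&lt;p&gt;One detail worth calling out: the generator expression handed to &lt;code&gt;st.write_stream&lt;/code&gt; pulls the text delta out of each streamed chunk (mapping &lt;code&gt;None&lt;/code&gt; deltas to empty strings), and &lt;code&gt;write_stream&lt;/code&gt; returns the fully concatenated reply, which is what gets appended back into the session state. The snippet below reproduces that logic in plain Python with stand-in chunk objects, purely for illustration:&lt;/p&gt;

```python
from types import SimpleNamespace


def make_chunk(text):
    """Stand-in for an OpenAI streaming chunk: chunk.choices[0].delta.content."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def collect_stream(stream):
    """Mirror what st.write_stream does here: join per-chunk deltas,
    treating a None delta (e.g. the final chunk) as an empty string."""
    return "".join(chunk.choices[0].delta.content or "" for chunk in stream)
```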



&lt;p&gt;Run the UI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;streamlit run app.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  You can now view your Streamlit app &lt;span class="k"&gt;in &lt;/span&gt;your browser.

  Local URL: http://localhost:8501
  Network URL: http://ww.xx.yy.zz:8501
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open the URL in your browser, and you should see a simple chatbot interface. Type in a message, and watch the response stream back token by token.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ask1uuoeb4ph27o1zcb.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ask1uuoeb4ph27o1zcb.webp" alt="Chatbot response 1" width="800" height="692"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F64g7uy5w06vavn3fezfl.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F64g7uy5w06vavn3fezfl.webp" alt="Chatbot response 2" width="800" height="592"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Serving an LLM is a bit different from deploying a typical web app. Memory constraints are real: we had to use an AWQ-quantized model just to make it fit on a single &lt;code&gt;g4dn.xlarge&lt;/code&gt; instance without hitting OOM. But combining vLLM for inference with AWS CDK to spin up the EKS infrastructure makes the whole setup pretty straightforward.&lt;/p&gt;

&lt;p&gt;Don't forget to run &lt;code&gt;cdk destroy --all&lt;/code&gt; when you're done! Leaving an EKS cluster and a &lt;code&gt;g4dn.xlarge&lt;/code&gt; node running 24/7 will result in a very hefty AWS bill.&lt;/p&gt;

&lt;p&gt;Aight. Thanks for reading this post, hope you found something useful 🚀&lt;/p&gt;

&lt;p&gt;References:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/machine-learning/serving-llms-using-vllm-and-amazon-ec2-instances-with-aws-ai-chips" rel="noopener noreferrer"&gt;AWS Blog: Serving LLMs using vLLM and Amazon EC2 instances with AWS AI chips&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/machine-learning/deploy-llms-on-amazon-eks-using-vllm-deep-learning-containers" rel="noopener noreferrer"&gt;AWS Blog: Deploy LLMs on Amazon EKS using vLLM Deep Learning Containers&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>kubernetes</category>
      <category>llm</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Enterprise AWS CDK: Architecting a Secure and Scalable Serverless API</title>
      <dc:creator>Dickson</dc:creator>
      <pubDate>Fri, 01 May 2026 07:14:37 +0000</pubDate>
      <link>https://forem.com/aws-builders/enterprise-aws-cdk-architecting-a-secure-and-scalable-serverless-api-8mf</link>
      <guid>https://forem.com/aws-builders/enterprise-aws-cdk-architecting-a-secure-and-scalable-serverless-api-8mf</guid>
      <description>&lt;p&gt;If you have spent any time deploying resources in AWS, you know that clicking through the AWS Management Console is fine for experimenting, but terrible for repeatable, production-grade systems. Historically, the answer to this was AWS CloudFormation — writing extensive JSON or YAML templates to declare your infrastructure. While CloudFormation is robust, writing thousands of lines of YAML isn't exactly a developer's dream.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiwdspapjsxu6ifew65l3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiwdspapjsxu6ifew65l3.png" alt="CDK"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The AWS Cloud Development Kit (CDK) is an open-source software development framework that lets you define your cloud application infrastructure using familiar programming languages like TypeScript, Python, Java, or C#. It acts as a powerful abstraction layer over CloudFormation. Instead of writing declarative YAML, you write imperative code to generate those templates. This means you get to use loops, conditionals, object-oriented principles, and your IDE's auto-completion to build your cloud architecture.&lt;/p&gt;

&lt;p&gt;However, deploying a simple API Gateway connected to a Lambda function and a database is easy in a CDK tutorial, but difficult to scale across an enterprise. In large organizations, we face strict compliance requirements, security reviews, and the need for high developer velocity. A single monolithic CDK stack simply won't survive contact with multiple engineering teams.&lt;/p&gt;

&lt;p&gt;In this article, we will walk through setting up a CDK project and explore the architectural decisions necessary to make a Serverless API (backed by Amazon Aurora PostgreSQL) secure, maintainable, and enterprise-ready.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started: Setting Up the CDK Project
&lt;/h2&gt;

&lt;p&gt;Before diving into enterprise patterns, let's look at how to initialize a fresh CDK project. You will need Node.js and the AWS CLI installed and configured.&lt;/p&gt;

&lt;p&gt;First, install the CDK toolkit globally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; aws-cdk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, create a new directory for your project and initialize a TypeScript CDK app:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;backend
&lt;span class="nb"&gt;cd &lt;/span&gt;backend
cdk init app &lt;span class="nt"&gt;--language&lt;/span&gt; typescript
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, if this is your first time using CDK in your AWS account/region, you need to "bootstrap" it. This provisions the necessary S3 buckets and IAM roles CDK needs to deploy your apps:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cdk bootstrap aws://ACCOUNT-NUMBER/REGION
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After initialization, your core project structure will look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;backend/
├── bin/
│   └── backend.ts          # The entry point of your CDK application
├── lib/
│   └── backend-stack.ts    # Where your infrastructure stack is defined
├── cdk.json                # Configuration file telling CDK how to execute your app
├── package.json
└── tsconfig.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While this structure is great for a starter project, we need to evolve it to support an enterprise architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture Overview
&lt;/h2&gt;

&lt;p&gt;We are building a typical modern backend:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Amazon API Gateway&lt;/strong&gt; (REST API) as the front door.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Lambda&lt;/strong&gt; functions (Node.js) to process business logic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon Aurora Serverless v2&lt;/strong&gt; (PostgreSQL) for resilient storage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon RDS Proxy&lt;/strong&gt; to manage database connections.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Secrets Manager&lt;/strong&gt; to handle credentials securely.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let’s break down the CDK decisions that elevate this from a weekend project to an enterprise architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decision 1: Modularity through L3 Domain Constructs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Enterprise Problem&lt;/strong&gt;: If network engineers, database administrators, and application developers all commit to the same &lt;code&gt;backend-stack.ts&lt;/code&gt; file, you will suffer from merge conflicts, accidental blast-radius damage, and slow deployments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Architectural Solution&lt;/strong&gt;: We must decouple our infrastructure into Level 3 (L3) Domain Constructs. Instead of one massive file, we define logical boundaries within a new &lt;code&gt;constructs&lt;/code&gt; folder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;├── lib/
│   ├── backend-stack.ts    # Now acts only as the Orchestrator
│   └── constructs/         # Domain-Driven L3 Constructs
│       ├── api.ts          # API Gateway &amp;amp; Compute
│       ├── network.ts      # VPC &amp;amp; Routing
│       ├── secrets.ts      # Secrets Manager Integration
│       └── storage.ts      # Databases &amp;amp; Proxies
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We define explicit TypeScript contracts (Interfaces) to pass dependencies between these domains. The &lt;code&gt;Api&lt;/code&gt; construct doesn't need to know &lt;em&gt;how&lt;/em&gt; the database was built; it only needs the Proxy Endpoint and the Secret to connect.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// lib/constructs/api.ts&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;ApiProps&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ec2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;IVpc&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;databaseProxy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;rds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DatabaseProxy&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;databaseSecret&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;secretsmanager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ISecret&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;authSecrets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;secretsmanager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ISecret&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Api&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Construct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Construct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ApiProps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="c1"&gt;// ... API Logic ...&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This inversion of control allows the Platform team to update the database configuration without ever touching the API construct code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decision 2: A Secure-by-Default Network Topology
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Enterprise Problem&lt;/strong&gt;: Security cannot be an afterthought. Leaving a database in a public subnet or manually managing security group IP addresses is a critical audit failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Architectural Solution&lt;/strong&gt;: We enforce a strict, 3-tier VPC architecture and utilize IAM for all internal authentication.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Isolated Storage&lt;/strong&gt;: The Aurora cluster is deployed exclusively into &lt;code&gt;SubnetType.PRIVATE_ISOLATED&lt;/code&gt;. It has absolutely no route to the internet.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Private Compute&lt;/strong&gt;: Lambda functions are deployed into &lt;code&gt;SubnetType.PRIVATE_WITH_EGRESS&lt;/code&gt; so they can reach external APIs if needed, but cannot be invoked directly from the internet (only via API Gateway).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connection Pooling &amp;amp; IAM Auth&lt;/strong&gt;: We deploy an RDS Proxy. Instead of each Lambda function opening its own direct connection to the database with hardcoded credentials, it connects through the proxy, which pools and reuses connections across invocations.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We codify this security posture by granting access through CDK's least-privilege grant methods:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Inside the API Construct wiring&lt;/span&gt;

&lt;span class="c1"&gt;// 1. Allow Network Traffic from Lambda to Proxy&lt;/span&gt;
&lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;databaseProxy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;connections&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;allowDefaultPortFrom&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// 2. Grant IAM permission to read the DB credentials&lt;/span&gt;
&lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;databaseSecret&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;grantRead&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Security teams can easily review these explicit grants rather than untangling complex Security Group rules.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decision 3: "Convention over Configuration" for API Routing
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Enterprise Problem&lt;/strong&gt;: Platform engineers become a bottleneck if application developers have to ask them to update IaC every time a new API endpoint (e.g., &lt;code&gt;POST /v1/users&lt;/code&gt;) is created.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Architectural Solution&lt;/strong&gt;: We build dynamic provisioning into the CDK code. Instead of manually instantiating every &lt;code&gt;NodejsFunction&lt;/code&gt; and &lt;code&gt;LambdaIntegration&lt;/code&gt;, we program the CDK to read the application folder structure during synthesis.&lt;/p&gt;

&lt;p&gt;Imagine a project structure like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;├── src/
│   └── lambda/
│       └── api/
│           ├── v1/
│           │   ├── users/
│           │   │   ├── get.ts     # GET /v1/users
│           │   │   └── post.ts    # POST /v1/users
│           │   └── status/
│           │       └── get.ts     # GET /v1/status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The CDK &lt;code&gt;Api&lt;/code&gt; construct can read this directory during synthesis. It maps each directory path (&lt;code&gt;v1/users&lt;/code&gt;) to an API resource and each file name (&lt;code&gt;get.ts&lt;/code&gt;) to an HTTP method, then automatically provisions the corresponding Lambda function and wires it into API Gateway.&lt;/p&gt;

&lt;p&gt;This pattern dramatically accelerates developer velocity. Application developers can build and deploy new features using standard Node.js practices without ever needing to learn CDK or touch the infrastructure repository.&lt;/p&gt;
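&lt;p&gt;The directory scan itself needs nothing beyond Node's standard library. Here is a minimal sketch of that convention-over-configuration logic (the &lt;code&gt;discoverRoutes&lt;/code&gt; name and the &lt;code&gt;RouteSpec&lt;/code&gt; shape are illustrative, not from the article's codebase):&lt;br&gt;
&lt;/p&gt;

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

export interface RouteSpec {
  httpMethod: string;   // e.g. "GET", derived from the file name
  resourcePath: string; // e.g. "v1/users", derived from the directory path
  entry: string;        // path to the handler source file
}

// Walk the api directory recursively; each method-named .ts file
// (get.ts, post.ts, ...) becomes one API Gateway route.
export function discoverRoutes(apiRoot: string): RouteSpec[] {
  const routes: RouteSpec[] = [];
  const walk = (dir: string): void => {
    for (const name of fs.readdirSync(dir)) {
      const full = path.join(dir, name);
      if (fs.statSync(full).isDirectory()) {
        walk(full);
      } else if (name.endsWith(".ts")) {
        routes.push({
          httpMethod: path.basename(name, ".ts").toUpperCase(),
          resourcePath: path.relative(apiRoot, dir).split(path.sep).join("/"),
          entry: full,
        });
      }
    }
  };
  walk(apiRoot);
  return routes;
}
```

&lt;p&gt;The construct then iterates over these specs, creating one &lt;code&gt;NodejsFunction&lt;/code&gt; per entry and calling &lt;code&gt;addResource&lt;/code&gt;/&lt;code&gt;addMethod&lt;/code&gt; on the REST API accordingly.&lt;/p&gt;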

&lt;h2&gt;
  
  
  Decision 4: Infrastructure-Aware Database Migrations
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Enterprise Problem&lt;/strong&gt;: You deployed the new API and the database, but the application crashes because the SQL tables haven't been created yet. Relying on manual scripts or separate CI/CD steps for database migrations leads to configuration drift and failed deployments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Architectural Solution&lt;/strong&gt;: We integrate the schema migration (using tools like Drizzle or Prisma) directly into the CDK lifecycle using AWS Custom Resources.&lt;/p&gt;

&lt;p&gt;We define a specific Lambda function (&lt;code&gt;DatabaseMigrator&lt;/code&gt;) that holds our SQL schema files. We then use a &lt;code&gt;custom-resources.Provider&lt;/code&gt; to trigger this Lambda during the CloudFormation deployment process.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Inside the main Stack Orchestrator (lib/backend-stack.ts)&lt;/span&gt;

&lt;span class="c1"&gt;// 1. The Migration Trigger&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;databaseMigrationTrigger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CustomResource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;MigrationTrigger&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;serviceToken&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;databaseMigratorProvider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;serviceToken&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;forceUpdate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="c1"&gt;// Ensure it runs every deploy&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// 2. The Dependency Lock&lt;/span&gt;
&lt;span class="nx"&gt;databaseMigrationTrigger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addDependency&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;databaseCluster&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By enforcing the dependency (&lt;code&gt;addDependency&lt;/code&gt;), we guarantee the database is fully available before the migration runs. The deployment becomes atomic: if the infrastructure deploys but the migration fails, CloudFormation marks the deployment as failed and rolls it back. Your infrastructure state and your database schema state therefore stay in sync.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decision 5: Secure Secrets Management
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Enterprise Problem&lt;/strong&gt;: Developers frequently make the mistake of hardcoding API keys, JWT secrets, or third-party tokens as plain-text environment variables in CDK code. When synthesized, these secrets become plainly visible in the generated CloudFormation templates under &lt;code&gt;cdk.out&lt;/code&gt;, a serious security vulnerability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Architectural Solution&lt;/strong&gt;: Never pass plaintext secrets into your CDK code. Instead, manually provision your secrets in AWS Secrets Manager (or use automated pipelines to create them), and then have your CDK code &lt;em&gt;reference&lt;/em&gt; them by Name or ARN.&lt;/p&gt;

&lt;p&gt;In our &lt;code&gt;Secrets&lt;/code&gt; construct, we load an existing secret:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// lib/constructs/secrets.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;secretsmanager&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;aws-cdk-lib/aws-secretsmanager&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Secrets&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Construct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;authSecrets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;secretsmanager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ISecret&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Construct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// We only reference the secret name, not the value!&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;authSecrets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;secretsmanager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Secret&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromSecretNameV2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AuthSecrets&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Backend/AuthSecrets&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of injecting the actual secret values into our Lambda environment variables, we pass the ARN (Amazon Resource Name) of the secret:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Inside the API construct provisioning the Lambda&lt;/span&gt;
&lt;span class="nx"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;AUTH_SECRETS_ARN&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;authSecrets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;secretArn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inside the Lambda function execution environment (at runtime), the application uses the AWS SDK to fetch the secret using the ARN. This guarantees that sensitive values are never logged, never stored in Git, and never exposed in the generated CloudFormation JSON files.&lt;/p&gt;
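&lt;p&gt;Because Lambda execution environments are reused between invocations, the handler should resolve the ARN once and cache the result. A sketch of that caching pattern, with the fetch injected as a callback so the logic stands alone (in a real handler the callback would call Secrets Manager's &lt;code&gt;GetSecretValue&lt;/code&gt; via the AWS SDK; the function names are illustrative):&lt;br&gt;
&lt;/p&gt;

```typescript
// Cache the decrypted secret string for the lifetime of the Lambda
// execution environment: cold starts pay one Secrets Manager round
// trip, warm invocations read from memory.
export function makeSecretLoader(
  secretArn: string,
  fetchSecretString: (arn: string) => string,
) {
  let cached: string | undefined;
  return (): string => {
    if (cached === undefined) {
      cached = fetchSecretString(secretArn);
    }
    return cached;
  };
}
```

&lt;p&gt;The handler module creates the loader once at module scope, reading the ARN from &lt;code&gt;process.env.AUTH_SECRETS_ARN&lt;/code&gt;, and calls it inside the handler body.&lt;/p&gt;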

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Building serverless applications on AWS is relatively straightforward, but scaling that process across an enterprise requires intent.&lt;/p&gt;

&lt;p&gt;By abandoning monolithic stacks in favor of domain constructs, enforcing strict network topologies, automating developer workflows via dynamic routing, integrating custom resource migrations, and utilizing dynamic secret referencing, you transform CDK from a simple scripting tool into a robust, enterprise-grade platform engineering capability.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/cdk/v2/guide/home.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/cdk/v2/guide/home.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>cdk</category>
      <category>architecture</category>
    </item>
    <item>
      <title>CloudWatch RUM vs. Ad blockers: How to fix possible missing telemetry</title>
      <dc:creator>Jérôme GUYON</dc:creator>
      <pubDate>Thu, 30 Apr 2026 16:56:50 +0000</pubDate>
      <link>https://forem.com/aws-builders/cloudwatch-rum-vs-ad-blockers-how-to-fix-possible-missing-telemetry-54j5</link>
      <guid>https://forem.com/aws-builders/cloudwatch-rum-vs-ad-blockers-how-to-fix-possible-missing-telemetry-54j5</guid>
      <description>&lt;p&gt;A few weeks ago, I was reviewing the &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-RUM.html" rel="noopener noreferrer"&gt;Amazon CloudWatch RUM&lt;/a&gt; dashboard for a web application I maintain. Page views were suspiciously low. After some digging, I opened the browser's DevTools on my machine and there it was: uBlock Origin was quietly blocking every request to &lt;code&gt;dataplane.rum.eu-west-1.amazonaws.com&lt;/code&gt;. &lt;strong&gt;Our real user monitoring was blind to a non-negligible portion of our actual traffic.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;CloudWatch RUM is one of those AWS services that doesn't get the attention it deserves. But if you care about understanding how &lt;em&gt;real&lt;/em&gt; users experience your application — page load times, JavaScript errors, HTTP failures, Web Vitals — it's genuinely valuable. Here's what the dashboard looks like out of the box:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxytuf0selcjd0f1h6i5i.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxytuf0selcjd0f1h6i5i.gif" alt="RUM Dashboard" width="800" height="578"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The problem is that ad blockers treat its data plane endpoint the same way they treat any third-party tracking domain: a request flying off to &lt;code&gt;dataplane.rum.*.amazonaws.com&lt;/code&gt; looks exactly like telemetry that users might want to block.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The architecture fix is simple&lt;/strong&gt;: your CloudFront distribution already serves your frontend. Add one behavior — &lt;code&gt;/rum/*&lt;/code&gt; — that proxies to the RUM data plane. On the client side, point the &lt;a href="https://github.com/aws-observability/aws-rum-web" rel="noopener noreferrer"&gt;aws-rum-web&lt;/a&gt; SDK to &lt;code&gt;https://yourdomain.com/rum/&lt;/code&gt; instead of the default AWS endpoint. I use AWS CDK here, but the same approach works with CloudFormation, Terraform, or the console.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8tbii90og763tadbae6k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8tbii90og763tadbae6k.png" alt="RUM + Cloudfront architecture" width="800" height="444"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Create the CloudWatch RUM app monitor and its Cognito identity pool.&lt;/strong&gt; RUM needs a Cognito identity pool with unauthenticated access to authorize browsers to send telemetry.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;cognito&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-cognito&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;iam&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-iam&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;rum&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-rum&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Create an identity pool for RUM (unauthenticated access)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rumIdentityPool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;cognito&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CfnIdentityPool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;RumIdentityPool&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;allowUnauthenticatedIdentities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Create the IAM role for unauthenticated users&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;guestRole&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Role&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;RumGuestRole&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;assumedBy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;WebIdentityPrincipal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cognito-identity.amazonaws.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;StringEquals&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cognito-identity.amazonaws.com:aud&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;rumIdentityPool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ForAnyValue:StringLike&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cognito-identity.amazonaws.com:amr&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;unauthenticated&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Attach the identity pool to the role&lt;/span&gt;
&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;cognito&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CfnIdentityPoolRoleAttachment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;RumRoleAttachment&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;identityPoolId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;rumIdentityPool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;roles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;unauthenticated&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;guestRole&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;roleArn&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Create the RUM app monitor&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rumAppMonitor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;rum&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CfnAppMonitor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;RumAppMonitor&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;myapp.example.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;myapp-rum&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;appMonitorConfiguration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;allowCookies&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;// Allow X-Ray tracing&lt;/span&gt;
    &lt;span class="na"&gt;enableXRay&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;// Track 100% of sessions&lt;/span&gt;
    &lt;span class="na"&gt;sessionSampleRate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;telemetries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;performance&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;errors&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;identityPoolId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;rumIdentityPool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Grant the guest role permission to send RUM events&lt;/span&gt;
&lt;span class="nx"&gt;guestRole&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addToPolicy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PolicyStatement&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;rum:PutRumEvents&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="s2"&gt;`arn:aws:rum:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;region&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;account&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:appmonitor/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;rumAppMonitor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2: Add the &lt;code&gt;/rum/*&lt;/code&gt; behavior to your CloudFront distribution.&lt;/strong&gt; This is the key part. I create an additional behavior that forwards requests matching &lt;code&gt;/rum/*&lt;/code&gt; to the RUM data plane origin.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;cf&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-cloudfront&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;origins&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-cloudfront-origins&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Build the additional behaviors map&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;additionalBehaviors&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;cf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;BehaviorOptions&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;

&lt;span class="c1"&gt;// Proxy RUM traffic through CloudFront&lt;/span&gt;
&lt;span class="nx"&gt;additionalBehaviors&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/rum/*&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;origin&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;origins&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;HttpOrigin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;`dataplane.rum.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;region&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.amazonaws.com`&lt;/span&gt;
  &lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;viewerProtocolPolicy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ViewerProtocolPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;HTTPS_ONLY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;cachePolicy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CachePolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CACHING_DISABLED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;allowedMethods&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;AllowedMethods&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ALLOW_ALL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;originRequestPolicy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OriginRequestPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ALL_VIEWER_EXCEPT_HOST_HEADER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Create the distribution (your existing one — just add the behavior)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;distribution&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;cf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Distribution&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Distribution&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;defaultBehavior&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;origin&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;origins&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;S3BucketOrigin&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withOriginAccessControl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;websiteBucket&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;viewerProtocolPolicy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ViewerProtocolPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;REDIRECT_TO_HTTPS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="nx"&gt;additionalBehaviors&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;domainNames&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;myapp.example.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;certificate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;myCertificate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I choose &lt;code&gt;ALL_VIEWER_EXCEPT_HOST_HEADER&lt;/code&gt; because the RUM data plane expects the &lt;code&gt;Host&lt;/code&gt; header to match its own domain (&lt;code&gt;dataplane.rum.eu-west-1.amazonaws.com&lt;/code&gt;), not yours. If you forward the original &lt;code&gt;Host&lt;/code&gt;, the request will fail with a 403.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Point the RUM web client to your proxied endpoint.&lt;/strong&gt; Install the &lt;a href="https://www.npmjs.com/package/aws-rum-web" rel="noopener noreferrer"&gt;aws-rum-web&lt;/a&gt; package and configure the &lt;code&gt;endpoint&lt;/code&gt; to use your domain instead of the default AWS URL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install the RUM web client&lt;/span&gt;
npm &lt;span class="nb"&gt;install &lt;/span&gt;aws-rum-web
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;AwsRum&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-rum-web&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rumClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AwsRum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;your-app-monitor-id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;// from the CfnAppMonitor&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;1.0.0&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                     &lt;span class="c1"&gt;// your app version&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;eu-west-1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                 &lt;span class="c1"&gt;// region&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;sessionSampleRate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;identityPoolId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;eu-west-1:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;// This is the magic line — point to your own domain&lt;/span&gt;
    &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://myapp.example.com/rum/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;telemetries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;performance&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;errors&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;allowCookies&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;enableXRay&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Et voilà! The browser now sends RUM telemetry to &lt;code&gt;https://myapp.example.com/rum/&lt;/code&gt;, which CloudFront proxies to the actual RUM data plane. Ad blockers see a first-party request and leave it alone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Things to know&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ad blocker filter lists&lt;/strong&gt; — Popular lists like EasyPrivacy and uBlock filters include patterns matching &lt;code&gt;dataplane.rum.*.amazonaws.com&lt;/code&gt; and the RUM CDN script URL (&lt;code&gt;client.rum.*.amazonaws.com&lt;/code&gt;). By proxying through your own domain, you bypass both. If you use the NPM installation method (recommended), the script itself is bundled in your app — only the data plane calls need proxying.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pricing&lt;/strong&gt; — $1 per 100,000 RUM events. A typical visit generates ~20 events. For 500K monthly visits: ~$100/month. CloudFront proxy overhead is negligible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session sample rate&lt;/strong&gt; — In production, consider setting &lt;code&gt;sessionSampleRate&lt;/code&gt; to something lower than &lt;code&gt;1&lt;/code&gt; (e.g., &lt;code&gt;0.1&lt;/code&gt; for 10% sampling) to control costs while still getting statistically meaningful data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;X-Ray integration&lt;/strong&gt; — With &lt;code&gt;enableXRay: true&lt;/code&gt;, RUM traces connect to your backend X-Ray traces, giving you end-to-end visibility from the browser click to the database query. &lt;/li&gt;
&lt;/ul&gt;
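&lt;p&gt;To put the pricing bullet in concrete terms, here is a tiny back-of-the-envelope helper (illustrative only — the $1 per 100,000 events rate and the ~20 events per visit figure come from above; the function itself is mine, not an AWS API):&lt;/p&gt;

```typescript
// Rough monthly CloudWatch RUM cost estimate (illustrative helper):
// events = visits * eventsPerVisit * sampleRate, billed at $1 per 100,000 events.
function estimateRumCost(
  monthlyVisits: number,
  eventsPerVisit: number,
  sampleRate: number,
): number {
  const events = monthlyVisits * eventsPerVisit * sampleRate;
  return events / 100_000;
}

// 500K visits at ~20 events each with 100% sampling: about $100/month.
console.log(estimateRumCost(500_000, 20, 1.0));
// Dropping sessionSampleRate to 0.1 cuts that to about $10/month.
console.log(estimateRumCost(500_000, 20, 0.1));
```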

&lt;p&gt;CloudWatch RUM is one of those "set it and forget it" services that quietly delivers real value — but only if it actually receives data. If you're already using it, proxy it through your own domain or you're likely missing a significant chunk of your user base. And if you're not using it yet, I'd strongly suggest you &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-RUM.html" rel="noopener noreferrer"&gt;have a look&lt;/a&gt; — understanding how real users experience your app is worth the small setup effort.&lt;/p&gt;

&lt;p&gt;— Jerome&lt;/p&gt;

</description>
      <category>analytics</category>
      <category>aws</category>
      <category>monitoring</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Cost-Efficient Serverless Workflows with Express Step Functions</title>
      <dc:creator>Matt Morgan</dc:creator>
      <pubDate>Thu, 30 Apr 2026 12:01:42 +0000</pubDate>
      <link>https://forem.com/aws-builders/cost-efficient-serverless-workflows-with-express-step-functions-e54</link>
      <guid>https://forem.com/aws-builders/cost-efficient-serverless-workflows-with-express-step-functions-e54</guid>
      <description>&lt;p&gt;Lambda and API Gateway are the bread and butter of the AWS serverless ecosystem. Lambda offers a compelling programming model of inputs and outputs. Lambda's name is taken from the concept of simple anonymous functions and implies simplicity and ease of use. Lambda delivers on this promise beautifully when our requirements are simple: "enqueue this message" or "fetch the item with this key from the database".&lt;/p&gt;

&lt;p&gt;In the real world, our requirements aren't always so simple. Sometimes we must do multiple things in the scope of a synchronous request. Sometimes branching logic is necessary. When we drift from the "just a function" model of programming in Lambda, we can start to see challenges with cost, performance, and observability.&lt;/p&gt;

&lt;p&gt;Synchronous Express Workflows for AWS Step Functions was &lt;a href="https://aws.amazon.com/blogs/compute/new-synchronous-express-workflows-for-aws-step-functions/" rel="noopener noreferrer"&gt;announced&lt;/a&gt; back in 2020. I was instantly intrigued by the solution, but it wasn't until last year that I had a chance to really try them at scale. Based on my experience over the past year, this is a great way to build &lt;a href="https://docs.aws.amazon.com/wellarchitected/latest/framework/welcome.html" rel="noopener noreferrer"&gt;Well-Architected&lt;/a&gt; microservices.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This article includes &lt;a href="https://github.com/elthrasher/cost-efficient-microservices" rel="noopener noreferrer"&gt;source code for a sample project&lt;/a&gt; that demonstrates how we can use Express Workflows to create performant and economical microservices.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let's consider a workflow for receiving orders on an e-commerce website. We are going to receive a web request and then do the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Transform and validate the incoming request&lt;/li&gt;
&lt;li&gt;Validate the order (validate products, compute totals, etc.)&lt;/li&gt;
&lt;li&gt;Reserve each inventory item&lt;/li&gt;
&lt;li&gt;Process the payment with the chosen processor (Stripe, PayPal, or Apple Pay in the example)&lt;/li&gt;
&lt;li&gt;Save the order to the database&lt;/li&gt;
&lt;li&gt;Kick off post-order processing (notification, logging, metrics)&lt;/li&gt;
&lt;li&gt;Send a response&lt;/li&gt;
&lt;/ol&gt;
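&lt;p&gt;To see why this drifts away from the "just a function" model, here is roughly what the happy path of the steps above looks like when collapsed into a single handler (an illustrative sketch only — every helper below is a hypothetical stand-in for a real integration, not code from the sample project):&lt;/p&gt;

```typescript
// Illustrative sketch: the orchestration a single Lambda handler would have
// to carry without Step Functions. All helper functions are stand-ins.
type Order = { items: string[]; payment: string };

async function validateOrder(o: Order): Promise<void> {
  if (o.items.length === 0) throw new Error("EmptyOrder");
}
async function reserveItem(_item: string): Promise<void> { /* reserve inventory */ }
async function releaseAll(_items: string[]): Promise<void> { /* compensation */ }
async function processPayment(_o: Order): Promise<void> { /* Stripe, PayPal, or Apple Pay */ }
async function saveOrder(_o: Order): Promise<void> { /* write order to DynamoDB */ }
async function postOrderTasks(_o: Order): Promise<void> { /* notify, log, emit metrics */ }

async function handler(order: Order): Promise<{ status: string }> {
  await validateOrder(order);                       // steps 1-2
  await Promise.all(order.items.map(reserveItem));  // step 3, in parallel
  try {
    await processPayment(order);                    // step 4
  } catch (err) {
    await releaseAll(order.items);                  // compensate on failure
    throw err;
  }
  await saveOrder(order);                           // step 5
  await postOrderTasks(order);                      // step 6
  return { status: "accepted" };                    // step 7
}
```

&lt;p&gt;Every branch, retry, and compensation path in a handler like this is code we must write, test, and observe ourselves; the state machine expresses the same flow declaratively, with retries and catches as configuration.&lt;/p&gt;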

&lt;p&gt;We also need to handle errors, guarantee consistency, and respond promptly. Here is how that looks in the Step Functions console.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5nv4ukyu6dqzgnnqowr7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5nv4ukyu6dqzgnnqowr7.png" alt="Express Workflow" width="800" height="483"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We've gained some efficiency here by running a couple of steps in parallel, and we've also handled a variety of error states.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you are new to Step Functions, I recommend building in the console. &lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/workflow-studio.html" rel="noopener noreferrer"&gt;Step Functions Workflow Studio&lt;/a&gt; gives you a drag-and-drop interface and the ability to export the result of your work to an IaC solution.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Project Layout
&lt;/h2&gt;

&lt;p&gt;Our sample project manages infrastructure with &lt;a href="https://aws.amazon.com/cdk/" rel="noopener noreferrer"&gt;AWS CDK&lt;/a&gt;. Starting with a basic CDK project, we build out some functions and our stack like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;serverless-order-processor/
├── bin/
│   └── app.ts                          # CDK app entry
├── lib/
│   ├── order-processor-stack.ts        # CDK stack (infra + state machine)
│   └── order-workflow.ts               # Step Functions definition (CDK constructs)
├── functions/
│   ├── validate-order.ts               # Check products exist, prices match
│   ├── reserve-inventory.ts            # Reserve single item (used by Map state)
│   ├── release-inventory.ts            # Compensation: undo all reservations
│   ├── process-payment.ts              # Route to processor by config
│   ├── save-order.ts                   # Write final order to DynamoDB
│   ├── get-order.ts                    # GET /orders/{id}
│   └── list-products.ts                # GET /products
├── scripts/
│   └── seed.ts                         # Seed products + inventory
├── cdk.json
├── package.json
├── tsconfig.json
└── README.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Our stack creates a DynamoDB table to store our products, inventory, and orders. It bundles and provisions our Lambda functions. It describes our state machine, synthesizes an ASL (Amazon States Language) definition, binds the Lambda functions to the state machine, and binds the state machine to an API Gateway that we use to synchronously invoke the workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Workflow Construct
&lt;/h2&gt;

&lt;p&gt;There are different patterns for writing CDK code. I prefer to create &lt;a href="https://docs.aws.amazon.com/prescriptive-guidance/latest/aws-cdk-layers/layer-3.html" rel="noopener noreferrer"&gt;L3 constructs&lt;/a&gt; to contain complex business patterns. That looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OrderWorkflow&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Construct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;stateMachine&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;StateMachine&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Construct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;OrderWorkflowProps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// steps&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;definition&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;transformRequest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;validateRequiredFields&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// --- State Machine ---&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stateMachine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;StateMachine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;OrderStateMachine&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;definitionBody&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;DefinitionBody&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromChainable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;definition&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="na"&gt;stateMachineName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;order-workflow&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;stateMachineType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;StateMachineType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;EXPRESS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="c1"&gt;// more props&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This keeps my &lt;a href="https://github.com/elthrasher/cost-efficient-microservices/blob/main/lib/order-processor-stack.ts#L115" rel="noopener noreferrer"&gt;main stack&lt;/a&gt; much cleaner than putting all that code inline.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;    &lt;span class="c1"&gt;// --- Step Functions workflow ---&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;workflow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OrderWorkflow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;OrderWorkflow&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;validateOrderFn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;reserveInventoryFn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;releaseInventoryFn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;processPaymentFn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;saveOrderFn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Most steps are implemented with Lambda functions, though error handling and data transformations can be done with Pass states and JSONata. For more on JSONata, check out &lt;a href="https://dev.to/aws-builders/create-stateful-serverless-workflows-with-aws-step-functions-and-jsonata-2oe3"&gt;Create Stateful Serverless Workflows with AWS Step Functions and JSONata&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Each step needs a catch block that handles specific errors that step may throw. We need to decide whether to retry the step or fail, based on the error it threw. If the step indicates we can't proceed with the order due to insufficient inventory, it doesn't make sense to retry. But if a step fails due to a service error, throttling, or a partner's technical issues, a retry may be very helpful.&lt;/p&gt;
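&lt;p&gt;The retry-or-fail decision can be sketched as a simple classifier (illustrative only — in the real workflow this logic lives in each step's Retry and Catch configuration, not in application code, and the non-AWS error names below are hypothetical):&lt;/p&gt;

```typescript
// Illustrative: which error names are worth retrying. In the actual state
// machine this is expressed as ASL Retry/Catch rules per step.
const RETRYABLE_ERRORS = new Set([
  "Lambda.ServiceException",          // transient service error
  "Lambda.TooManyRequestsException",  // throttling
  "PaymentProviderUnavailable",       // hypothetical partner outage
]);

function shouldRetry(errorName: string): boolean {
  return RETRYABLE_ERRORS.has(errorName);
}
```

&lt;p&gt;A business outcome like insufficient inventory is deliberately not in the set, so the workflow fails fast instead of retrying a request that can never succeed.&lt;/p&gt;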

&lt;h2&gt;
  
  
  API Gateway Integration
&lt;/h2&gt;

&lt;p&gt;API Gateway provides a direct integration pattern that invokes our state machine. This is simplified and abstracted using the CDK construct &lt;code&gt;StepFunctionsIntegration&lt;/code&gt;. This construct is useful, but I prefer to modify it slightly. Let's look at the sample project code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sfnIntegration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;StepFunctionsIntegration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startExecution&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="nx"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stateMachine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;integrationResponses&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;requestTemplates&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;requestTemplate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="na"&gt;useDefaultMethodResponses&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nx"&gt;ordersResource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addMethod&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sfnIntegration&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;methodResponses&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;200&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;400&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;500&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We create the integration and then bind it to the POST method of the orders resource. I changed a couple of things to better suit our use case: I set &lt;code&gt;useDefaultMethodResponses&lt;/code&gt; to false and supplied my own response templates. The reason is that the default mapping returns a 500 if the state machine execution throws an error and a 200 if it doesn't, whereas I wanted validation errors to return a 400. To do this, we use a &lt;a href="https://velocity.apache.org/engine/1.7/user-guide.html" rel="noopener noreferrer"&gt;Velocity Template&lt;/a&gt; to detect an error key in the response and override the status to 400 when it's present.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;successResponseTemplate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="s2"&gt;`#set($sfnOutput = $input.path('$.output'))`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;`#if($sfnOutput.toString().contains('"status":"error"'))`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;`#set($context.responseOverride.status = 400)`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;`#end`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;`$sfnOutput`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Named Executions
&lt;/h3&gt;

&lt;p&gt;I also wanted to take advantage of the ability to name an execution, so I provided a custom request template.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;requestTemplate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="s2"&gt;`#set($customerId = $util.parseJson($input.body).get('customerId'))`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;`{`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;`  "input": "$util.escapeJavaScript($input.body).replaceAll("&lt;/span&gt;&lt;span class="se"&gt;\\\\&lt;/span&gt;&lt;span class="s2"&gt;'", "'")",`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;`  "name": "$util.escapeJavaScript($customerId)",`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;`  "stateMachineArn": "&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stateMachine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stateMachineArn&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;`}`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a modified and simplified version of the &lt;a href="https://github.com/aws/aws-cdk/blob/main/packages/aws-cdk-lib/aws-apigateway/lib/integrations/stepfunctions.vtl" rel="noopener noreferrer"&gt;request template that ships with AWS CDK&lt;/a&gt;. If you plan to do something like this, I suggest going back to the source to make sure you don't miss anything.&lt;/p&gt;

&lt;p&gt;The way the named execution works is that we know any valid request will include a customerId, so we pull it from the JSON and set it to the &lt;code&gt;name&lt;/code&gt; attribute in the request payload. Step Functions automatically appends a hash, so we don't need to worry about uniqueness. As a result, we can easily find our customer transactions in the Step Functions console!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4u49aa6s725olqaz2p0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4u49aa6s725olqaz2p0.png" alt="Named executions in the Step Functions console" width="733" height="438"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Observability
&lt;/h2&gt;

&lt;p&gt;It's essentially a must to enable &lt;a href="https://github.com/elthrasher/cost-efficient-microservices/blob/main/lib/order-workflow.ts#L434" rel="noopener noreferrer"&gt;CloudWatch logging&lt;/a&gt;. The named execution lets us find an exact execution that may have gone awry or be of interest. Then we can see exactly what happened, inspect logs, and improve our flow. The state machine execution will show JSONata variables at every step, as well as all inputs and outputs for each step. It's hard to imagine getting this level of fidelity in a trace without using Step Functions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost
&lt;/h2&gt;

&lt;p&gt;We need to consider trade-offs. Sure, this might help manage complexity, but doesn't it cost more? What about adding latency with state transitions or cold starts?&lt;/p&gt;

&lt;p&gt;Let's start with cost. These services are very, very cheap if used correctly. It's often observed that the most expensive part of a serverless stack is the logging, and I can attest to that. These prices are in USD and for us-east-1&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API Gateway Rest API charges $3.50 for 1 million requests.&lt;/li&gt;
&lt;li&gt;1 million Express workflow executions with an average duration of 3 seconds and 64MB memory bills slightly higher at $4.13&lt;/li&gt;
&lt;li&gt;5 million Lambda executions averaging 500ms duration with 128 MB is just $5.17 (excluding free tier).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The total for this part of the stack is $12.80/month (plus charges for the database and other services, which are beyond our scope here). That is an incredible price for 1 million executions. Cost scaling is mostly linear. If we scale this to 10 million requests, our bill is $127.93. We start to see better pricing tiers as we move to 100 million requests, with a monthly bill of $ 1,150.05. 100 million requests would indicate an average of 38 checkout conversions every second for the entire month, quite a brisk business! I'm not kidding that logging will be the big expense at that volume. I'm not attempting the math here because it's highly dependent on your use case, but suffice it to say you'll want to keep an eye on it, make sure you're only logging what is necessary, and set reasonable data retention on your log groups.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance
&lt;/h2&gt;

&lt;p&gt;Now we've demonstrated that this architecture is great for managing complexity and that it's cost-effective. What about performance? Doesn't it stand to reason that passing state between multiple functions would be slower than encapsulating all the logic within a single function?&lt;/p&gt;

&lt;p&gt;Not necessarily. First, Express Step Functions are extremely fast. In my testing, I'm through the first pass step and into my ValidateOrder Lambda function within milliseconds. Small functions with minimal dependencies will load and execute very quickly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Parallel and Map States
&lt;/h3&gt;

&lt;p&gt;The real value prop here is the ability to execute multiple functions in parallel. Imagine I'm checking out a cart with four different items. Most implementations will reserve the inventory sequentially. We could use Node.js or Go to wait on multiple requests at once, but there are downsides to doing that in a single function. We might need extra memory in anticipation of a large order. We have to add logic to handle the case where the order can only be partially fulfilled, which now mixes concerns. Our Express Workflow can run the same simple function in a &lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/state-map.html" rel="noopener noreferrer"&gt;Map state&lt;/a&gt;, then handle the combined results. We can even limit downstream impacts by setting a &lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/state-map.html" rel="noopener noreferrer"&gt;max concurrency&lt;/a&gt; limit on the map state, so a very large order doesn't attempt to adjust inventory for 100+ items in parallel.&lt;/p&gt;

&lt;p&gt;What about executing unlike things in parallel? Express workflows can handle that as well. Zooming in here, we can see that we're persisting the order to the database while we send the confirmation and update our metrics. Most order-processing systems will handle those sequentially.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fof41ox6qsad1bv1lqa37.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fof41ox6qsad1bv1lqa37.png" alt="Express Workflow Parallel Step" width="800" height="215"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Cold Starts
&lt;/h3&gt;

&lt;p&gt;What about all the cold starts? Won't the extra functions cause extra startup latency? Well, first of all, &lt;a href="https://dev.to/aws/cold-starts-are-dead-5fod"&gt;read this&lt;/a&gt;. If that doesn't convince you, my experience working in serverless for the better part of a decade now is that I don't worry about them at all. Yes, sometimes a function will start, costing you 200ms. At scale, this isn't much, because that function container may be invoked tens or even hundreds of thousands of times during its lifecycle, and that 200ms tax is paid only once across all those invocations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scaling and Service Limits
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step Functions
&lt;/h3&gt;

&lt;p&gt;A casual reading of the &lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/service-quotas.html#service-limits-api-action-throttling-general" rel="noopener noreferrer"&gt;docs&lt;/a&gt; suggests that express workflows may begin to throttle at 6000 RPS. This isn't the case - or rather, this is only the case for asynchronous invocations. For synchronous invocations, there's no rate limit at all! This is very rare for an AWS service, and it comes with an advisory.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Synchronous Express execution API calls don't contribute to existing account capacity limits. Step Functions provides capacity on demand and automatically scales with sustained workload. Surges in workload may be throttled until capacity is available.&lt;br&gt;
If you experience throttling, try again after some time. For information about Synchronous Express workflows, see &lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/choosing-workflow-type.html#concepts-express-synchronous" rel="noopener noreferrer"&gt;Synchronous and Asynchronous Express Workflows in Step Functions&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I have never seen a synchronous execution throttle, but I have seen services backed by Express Workflows scale very quickly. If you plan to operate this service on a very large scale, it's a good idea to speak with your account team.&lt;/p&gt;

&lt;h3&gt;
  
  
  API Gateway
&lt;/h3&gt;

&lt;p&gt;With no hard limit on Step Functions executions, we need to look at API Gateway quotas. API Gateway can handle 10,000 RPS (sustained) per region. This limit can be increased and applies at the account and region levels.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lambda
&lt;/h3&gt;

&lt;p&gt;Lambda has a maximum concurrency rate of 1,000 concurrent executions at the account level. This can also be increased. This is your most likely source of throttles in this kind of architecture. Since we're invoking multiple functions per workflow, sometimes in parallel, we can quickly reach that 1,000 limit. Fortunately, this limit can be adjusted in the AWS Console. It's a very good idea to project your concurrency needs and set your quota appropriately. If you set it too high, a bug could result in an expensive bill. If you set it too low, you may experience throttling.&lt;/p&gt;

&lt;p&gt;It's also a good idea to set &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-concurrency.html" rel="noopener noreferrer"&gt;reserved concurrency&lt;/a&gt; on every Lambda function. This is different than provisioned concurrency (prepay to keep a function "warm"). Instead, reserved concurrency protects your function from a "noisy neighbor" eating up all of your concurrent executions.&lt;/p&gt;

&lt;p&gt;Finally, you can handle throttles with retries in Step Functions. Even synchronous executions should implement retries in case of throttle or service failure. If your service is operating at extremely high throughput, you might throttle, wait a few milliseconds, then receive a successful invocation. This is a much better outcome than failing with a 429 error!&lt;/p&gt;

&lt;p&gt;To see this in action, I created a &lt;a href="https://github.com/elthrasher/cost-efficient-microservices/blob/main/scripts/load-test.sh" rel="noopener noreferrer"&gt;simple load test script&lt;/a&gt; using &lt;a href="https://k6.io/" rel="noopener noreferrer"&gt;k6&lt;/a&gt;. Here's a sample run from my laptop.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
         /&lt;span class="se"&gt;\ &lt;/span&gt;     Grafana   /‾‾/
    /&lt;span class="se"&gt;\ &lt;/span&gt; /  &lt;span class="se"&gt;\ &lt;/span&gt;    |&lt;span class="se"&gt;\ &lt;/span&gt; __   /  /
   /  &lt;span class="se"&gt;\/&lt;/span&gt;    &lt;span class="se"&gt;\ &lt;/span&gt;   | |/ /  /   ‾‾&lt;span class="se"&gt;\&lt;/span&gt;
  /          &lt;span class="se"&gt;\ &lt;/span&gt;  |   &lt;span class="o"&gt;(&lt;/span&gt;  |  &lt;span class="o"&gt;(&lt;/span&gt;‾&lt;span class="o"&gt;)&lt;/span&gt;  |
 / __________ &lt;span class="se"&gt;\ &lt;/span&gt; |_|&lt;span class="se"&gt;\_\ &lt;/span&gt; &lt;span class="se"&gt;\_&lt;/span&gt;____/


     execution: &lt;span class="nb"&gt;local
        &lt;/span&gt;script: scripts/load-test.js
        output: -

     scenarios: &lt;span class="o"&gt;(&lt;/span&gt;100.00%&lt;span class="o"&gt;)&lt;/span&gt; 1 scenario, 200 max VUs, 1m5s max duration &lt;span class="o"&gt;(&lt;/span&gt;incl. graceful stop&lt;span class="o"&gt;)&lt;/span&gt;:
              &lt;span class="k"&gt;*&lt;/span&gt; default: Up to 200 looping VUs &lt;span class="k"&gt;for &lt;/span&gt;35s over 4 stages &lt;span class="o"&gt;(&lt;/span&gt;gracefulRampDown: 30s, gracefulStop: 30s&lt;span class="o"&gt;)&lt;/span&gt;


━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📊 Load Test Summary
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Total requests:  9442
  Failure rate:    0.0%
  Latency avg:     419ms
  Latency med:     408ms
  Latency p90:     499ms
  Latency p95:     555ms
  Latency max:     3673ms
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

running &lt;span class="o"&gt;(&lt;/span&gt;0m35.5s&lt;span class="o"&gt;)&lt;/span&gt;, 000/200 VUs, 9442 &lt;span class="nb"&gt;complete &lt;/span&gt;and 0 interrupted iterations
default ✓ &lt;span class="o"&gt;[======================================]&lt;/span&gt; 000/200 VUs  35s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I'm not able to generate much more load than that on a MacBook Pro, but this does illustrate how easily this architecture handles traffic spikes. "Scalability" here is a bit of a misnomer, as my service isn't scaling up to handle the load. Instead, there is available capacity to meet my needs at all times!&lt;/p&gt;

&lt;h2&gt;
  
  
  More on Step Functions
&lt;/h2&gt;

&lt;p&gt;If you found this article helpful, check out my other writing on Step Functions!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/aws-builders/create-stateful-serverless-workflows-with-aws-step-functions-and-jsonata-2oe3"&gt;https://dev.to/aws-builders/create-stateful-serverless-workflows-with-aws-step-functions-and-jsonata-2oe3&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/aws-builders/avoiding-the-serverless-workflow-antipattern-2ba1"&gt;https://dev.to/aws-builders/avoiding-the-serverless-workflow-antipattern-2ba1&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/elthrasher/exploring-aws-cdk-step-functions-1d1e"&gt;https://dev.to/elthrasher/exploring-aws-cdk-step-functions-1d1e&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Cover by Glynis Morgan&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>microservices</category>
      <category>stepfunctions</category>
    </item>
    <item>
      <title>Lambda Multi-tenanted Isolation</title>
      <dc:creator>Gary Mclean</dc:creator>
      <pubDate>Thu, 30 Apr 2026 10:36:48 +0000</pubDate>
      <link>https://forem.com/aws-builders/lambda-multi-tenanted-isolation-1ban</link>
      <guid>https://forem.com/aws-builders/lambda-multi-tenanted-isolation-1ban</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In any application or system, we must have safeguards in place to prevent cross-customer data exposure. Our software is developed using a range of approaches, from human-written code to AI-assisted generation and regardless of how code is produced, the risk of unintended data exposure remains a critical concern.&lt;/p&gt;

&lt;p&gt;Developers, Engineering Managers, and Security teams should be aware of potential data exposures and the additional controls which can be put in place as preventive measures.&lt;/p&gt;

&lt;p&gt;Once data is exposed or lost, it cannot be undone. Consequently, a data breach may result in serious impacts, including financial loss and reputational harm.&lt;/p&gt;

&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;Ensuring data security is a fundamental requirement for applications, both internal and externally exposed. A traditional three-tier architecture is composed of a presentation layer (web tier), an application layer (application tier), and a database layer (database tier).&lt;/p&gt;

&lt;p&gt;Any code execution within the application layer more often than not retrieves data from the database layer and returns it to the consumer. The database layer will be the most secure area, with data access tightly controlled and very limited in scope.&lt;/p&gt;

&lt;p&gt;Understanding this architecture is important context before exploring how serverless compute and AWS Lambda specifically introduces a unique set of considerations around how application code is executed and how memory is managed between invocations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multi-tenant
&lt;/h2&gt;

&lt;p&gt;A multi-tenant application is a software architecture commmonly used in Software as a Service (SaaS) where a single applcation instance services multiple customers while keeping each customers data logically isolated.&lt;/p&gt;

&lt;p&gt;Multiple tenants use the same application layer and access to data is controlled using identifiable information obtained during authentication. A unique identifier such as such as a Company ID, Tenant ID or another identifier would be used to aid in only retrieving data in scope for that user.&lt;/p&gt;

&lt;p&gt;Even though logically, data is isolated at source, the same application code in the same instance can be executed repeatedly. APIs generally do not restart or spin up independent environments as this would become expense. While using the same execution environment, generally the same same memory address space will be reused to store STATIC and variable data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Challange
&lt;/h2&gt;

&lt;p&gt;Many SaaS companies host their offerings across Cloud providers such as AWS, utilising serverless compute like as Lambda. There are many articles and documentation that deep dive into Lambda, though at a high level, Lambda is a service which allows code to run without the need to manage servers.&lt;/p&gt;

&lt;p&gt;A Lambda execution environment lifecycle can be grouped into 3 phases&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqt15hftm51rthwcy1djq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqt15hftm51rthwcy1djq.png" alt="Lambda Lifecycle phases" width="798" height="93"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Invoke phase is where the core business logic executes, code queries data from the database, performs actions against it and returns data to the consumer. &lt;br&gt;
Post invoke phase, the Lambda execution environment may or may not shutdown, or remain running waiting for the next invocation.&lt;/p&gt;

&lt;p&gt;When a Lambda function is initialised for the first time, its execution environment is fresh; variables are empty and no prior state exists. However, AWS Lambda reuses warm execution environments for subsequent invocations as a performance optimisation. This means that residual data from a previous invocation; such as values held in temporary variables or files written to the ephemeral file system (/tmp); may still be present when the next invocation begins. Without proper hygiene practices in place, this leftover data carries a significant risk of being inadvertently exposed to the next request or customer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multi-tenant function?
&lt;/h2&gt;

&lt;p&gt;Lambda runs your code inside execution environments.&lt;br&gt;
Small, secure Firecracker microVMs that handle an invocation and then sit warm, waiting for the next one. That's efficient until you realise those environments get reused across invocations. Your function serves a request from Tenant A, caches some config or credentials in memory, and then the next request comes in from Tenant B, potentially landing in the same environment, with access to whatever Tenant A left behind. &lt;/p&gt;

&lt;p&gt;If your code is perfect, that's fine. In practice, it isn't. One oversight in your data handling and you have a cross-tenant data exposure incident.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mitigation
&lt;/h2&gt;

&lt;p&gt;It is worth noting that under certain conditions, residual data may never actually reach another invocation. Factors such as environment load, the rate at which execution environments are initialised and torn down, and whether Provisioned Concurrency is configured can all influence how long a warm environment persists. In high-churn scenarios where environments are frequently recycled, leftover data may be naturally cleared before it has the opportunity to be exposed. However, this should never be relied upon as a security control; it is an unpredictable side effect of infrastructure behaviour, not a guarantee.&lt;/p&gt;

&lt;h2&gt;
  
  
  Approaches
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Variable initialisation or more specifically explicit initialisation
&lt;/h3&gt;

&lt;p&gt;The first approach is to run a Lambda function per business process, such as an APi Resource which returns data per customer based of their identity supplied during authentication. Ensure code correctly cleanses the environment at the start of invocation where the practice of deliberately setting variables to a known, clean state before any logic executes, rather than assuming they are empty.&lt;/p&gt;

&lt;h3&gt;
  
  
  Per-tenant Lambda function 1-2-1
&lt;/h3&gt;

&lt;p&gt;The highest degree of isolation would be to create a Lambda function per tenant. Each tenant would have its own dedicated function assigned exclusively to them for code execution. While this approach maximises data cleanliness, it is difficult to maintain at scale; API limits when updating many functions simultaneously, complex CI/CD pipelines, monitoring and alerting sprawl across a large number of Log Groups, and considerably longer deployment times all become significant operational burdens. For most organisations, the overhead of managing this model outweighs the isolation benefits it provides.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lambda tenant isolation mode
&lt;/h3&gt;

&lt;p&gt;Tenant isolation mode exists for a specific scenario: you're running a single Lambda function that serves multiple end-users or tenants, and you need hard guarantees that their execution environments never bleed into one another.&lt;/p&gt;

&lt;p&gt;Two situations make this non-negotiable.&lt;br&gt;
First, if your tenants execute their own code. Isolated environments limit the blast radius when that code misbehaves, whether through bugs or something more deliberate. Second, if you're processing sensitive, tenant-specific data. Shared environments create exposure risk; isolation removes it.&lt;/p&gt;

&lt;p&gt;With tenant isolation mode enabled, you pass a tenant identifier with each function invocation. Lambda uses that identifier to route requests to underlying execution environments, ensuring that an environment associated with one tenant is never used to serve requests from another.&lt;/p&gt;

&lt;h4&gt;
  
  
  Limitations
&lt;/h4&gt;

&lt;p&gt;Tenant isolation mode is not supported with functions that use function URLs, provisioned concurrency, or SnapStart. You can send requests to a tenant-isolated function using synchronous invocations, asynchronous invocations, or by using Amazon API Gateway as an event-trigger.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottom line
&lt;/h2&gt;

&lt;p&gt;Tenant isolation mode eliminates the need for custom isolation logic or separate per-tenant functions, letting you focus on business logic while AWS handles the complexities of tenant-aware compute environment isolation. For SaaS builders running sensitive workloads or executing user-supplied code, that's a significant operational and security improvement, it was a long time coming.&lt;/p&gt;

</description>
      <category>lambda</category>
      <category>security</category>
      <category>aws</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Building a Serverless DynamoDB MCP: Making Your AI Talk to Your Database</title>
      <dc:creator>Yeshwanth L M</dc:creator>
      <pubDate>Thu, 30 Apr 2026 10:16:53 +0000</pubDate>
      <link>https://forem.com/aws-builders/building-a-serverless-dynamodb-mcp-making-your-ai-talk-to-your-database-3jne</link>
      <guid>https://forem.com/aws-builders/building-a-serverless-dynamodb-mcp-making-your-ai-talk-to-your-database-3jne</guid>
      <description>&lt;h1&gt;
  
  
  Building a Serverless DynamoDB MCP: Making Your AI Talk to Your Database
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fim9m2eji32nl45ls38hg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fim9m2eji32nl45ls38hg.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Have you ever wished you could just &lt;em&gt;ask&lt;/em&gt; your AI assistant to query your database? Something like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Hey Kiro, show me all active users from my DynamoDB table"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;or&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Add a new user named Alice with email &lt;a href="mailto:alice@example.com"&gt;alice@example.com&lt;/a&gt; to the Users table"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Well, that's exactly what we're building today! 🚀&lt;/p&gt;

&lt;h2&gt;
  
  
  The Big Picture: What Are We Building?
&lt;/h2&gt;

&lt;p&gt;We're creating a &lt;strong&gt;serverless MCP (Model Context Protocol) backend&lt;/strong&gt; on AWS that enables AI assistants like Kiro to interact with DynamoDB tables conversationally. Think of it as giving Kiro a direct, secure phone line to your DynamoDB database.&lt;/p&gt;

&lt;p&gt;Here's what makes this special:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;10 DynamoDB operations&lt;/strong&gt; exposed as natural language tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Completely serverless&lt;/strong&gt; - runs on AWS Lambda&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secure by default&lt;/strong&gt; - AWS IAM authentication with SigV4 signing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero local dependencies&lt;/strong&gt; - all the heavy lifting happens in the cloud&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-configuring&lt;/strong&gt; - tools are discovered dynamically&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Wait, What's MCP?
&lt;/h2&gt;

&lt;p&gt;Before we dive in, let's talk about MCP (Model Context Protocol). &lt;/p&gt;

&lt;p&gt;Think of MCP as a standardized way for AI assistants to use external tools. It's like giving your AI a toolbox where each tool does something specific - query a database, fetch weather data, send emails, etc.&lt;/p&gt;

&lt;p&gt;The protocol works like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;AI assistant connects to an MCP server&lt;/li&gt;
&lt;li&gt;Server tells AI what tools are available&lt;/li&gt;
&lt;li&gt;AI can call these tools when needed&lt;/li&gt;
&lt;li&gt;Server executes the tool and returns results&lt;/li&gt;
&lt;li&gt;AI uses the results to help the user&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The beauty?&lt;/strong&gt; The AI doesn't need to know &lt;em&gt;how&lt;/em&gt; the tools work internally. It just needs to know &lt;em&gt;what&lt;/em&gt; they do and &lt;em&gt;how&lt;/em&gt; to call them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Build This Serverless?
&lt;/h2&gt;

&lt;p&gt;You might ask: "Why not just run a local server on my machine?"&lt;/p&gt;

&lt;p&gt;Great question! Here's why serverless wins:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;Centralized Management&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;One deployment serves all your team members. Update once, everyone benefits. No "it works on my machine" problems.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;Security at Scale&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;IAM-based authentication (no API keys to rotate)&lt;/li&gt;
&lt;li&gt;Each Lambda has scoped permissions&lt;/li&gt;
&lt;li&gt;Audit logs for every database operation&lt;/li&gt;
&lt;li&gt;Secrets managed by AWS Secrets Manager&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;Cost Efficiency&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Pay only when you use it. Lambda charges per request, not per hour. Most hobby projects? Practically free under AWS free tier.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. &lt;strong&gt;Automatic Scaling&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Whether it's you at 2 AM or your whole team during peak hours, it just works.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. &lt;strong&gt;No Infrastructure Headaches&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;No servers to patch, no runtime versions to manage, no "why is Python 3.8 broken on my Mac?"&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture: How It All Fits Together
&lt;/h2&gt;

&lt;p&gt;Let me paint you a picture of how this works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────┐
│  You: "Show me all  │
│  users from Users   │
│  table"             │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Claude Desktop     │ ← Your AI assistant
│  (MCP Client)       │
└──────────┬──────────┘
           │ stdio / JSON-RPC
           ▼
┌─────────────────────┐
│  Local Proxy        │ ← Signs requests with your AWS credentials
│  (proxy.sh)         │
└──────────┬──────────┘
           │ HTTPS + AWS IAM Auth
           ▼
┌─────────────────────┐
│  API Gateway        │ ← Entry point to AWS
│  (HTTP API)         │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Lambda Functions   │ ← 11 functions (one per operation, plus /tools)
│  - get-item         │
│  - put-item         │
│  - query            │
│  - scan             │
│  - etc...           │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  DynamoDB Tables    │ ← Your actual data
└─────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Flow, Step by Step:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;You ask Kiro&lt;/strong&gt; something about your database&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kiro recognizes&lt;/strong&gt; it needs to use a DynamoDB tool&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local proxy intercepts&lt;/strong&gt; the request and signs it with AWS SigV4&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Gateway validates&lt;/strong&gt; the signature (IAM authentication)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lambda function executes&lt;/strong&gt; the DynamoDB operation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result comes back&lt;/strong&gt; as human-readable text&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kiro uses the result&lt;/strong&gt; to answer your question&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The genius here? &lt;strong&gt;Kiro has no idea&lt;/strong&gt; it's talking to AWS. It thinks it's using a local tool. All the cloud complexity is hidden.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Key Design Decisions
&lt;/h2&gt;

&lt;p&gt;Let me walk you through the "why" behind each major decision:&lt;/p&gt;

&lt;h3&gt;
  
  
  Decision 1: Why Plain-Text Responses?
&lt;/h3&gt;

&lt;p&gt;DynamoDB returns data in this format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Item"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"S"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user001"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"S"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Alice Johnson"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"age"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"N"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"28"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ugly, right? Those &lt;code&gt;{"S": ...}&lt;/code&gt; and &lt;code&gt;{"N": ...}&lt;/code&gt; wrappers are DynamoDB's type system.&lt;/p&gt;

&lt;p&gt;Our Lambda functions convert this to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;Item from table 'Users'&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;userId&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;user001&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Alice Johnson&lt;/span&gt;
  &lt;span class="na"&gt;age&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;28&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why?&lt;/strong&gt; Because Kiro can narrate this naturally to you. No JSON parsing needed. It's optimized for conversation, not computation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Decision 2: Why One Lambda Per Operation?
&lt;/h3&gt;

&lt;p&gt;We could've built one mega-Lambda that handles everything. But we didn't. Here's why:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Principle of Least Privilege&lt;/strong&gt;: Each Lambda gets &lt;em&gt;only&lt;/em&gt; the permissions it needs.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;get-item&lt;/code&gt; Lambda → &lt;code&gt;dynamodb:GetItem&lt;/code&gt; permission only&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;put-item&lt;/code&gt; Lambda → &lt;code&gt;dynamodb:PutItem&lt;/code&gt; permission only&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;delete-item&lt;/code&gt; Lambda → &lt;code&gt;dynamodb:DeleteItem&lt;/code&gt; permission only&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If one Lambda gets compromised? Damage is limited.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Clear Separation&lt;/strong&gt;: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each Terraform file = One Lambda&lt;/li&gt;
&lt;li&gt;Easy to understand, easy to modify&lt;/li&gt;
&lt;li&gt;Want to remove scan operation? Delete one file.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cost Optimization&lt;/strong&gt;:&lt;br&gt;
Lambda bills per request and per millisecond of execution. Small, single-purpose functions load less code, so cold starts are faster and invocations finish sooner.&lt;/p&gt;

&lt;h3&gt;
  
  
  Decision 3: Why Self-Configuring Tools?
&lt;/h3&gt;

&lt;p&gt;The proxy script doesn't have any hardcoded tool definitions. On startup, it calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;GET /tools
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And receives:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dynamodb_get_item"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Retrieve a single item from DynamoDB..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"inputSchema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"route"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/dynamodb/get-item"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The magic?&lt;/strong&gt; Add a new tool to &lt;code&gt;dynamodb_ops.py&lt;/code&gt;, deploy, and the proxy &lt;em&gt;automatically&lt;/em&gt; discovers it. No client-side updates needed.&lt;/p&gt;

&lt;p&gt;This follows the Unix philosophy: &lt;strong&gt;"mechanism, not policy."&lt;/strong&gt; The proxy provides the mechanism (SigV4 signing, JSON-RPC), but the backend defines the policy (what tools exist).&lt;/p&gt;
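&lt;p&gt;A minimal sketch of that discovery step in Python (the real proxy is a shell script; the function names and the injectable &lt;code&gt;fetch&lt;/code&gt; parameter here are illustrative):&lt;/p&gt;

```python
import json
import urllib.request


def discover_tools(base_url, fetch=None):
    """Ask the backend which tools exist - nothing is hardcoded client-side."""
    fetch = fetch or (lambda url: urllib.request.urlopen(url).read())
    return json.loads(fetch(f"{base_url}/tools"))


def to_mcp_tool(defn):
    """Keep only the fields an MCP tools/list reply expects.

    The 'route' field stays proxy-side: it tells the proxy which API
    Gateway path to POST to, but the AI client never sees it.
    """
    return {k: defn[k] for k in ("name", "description", "inputSchema")}
```

Deploy a new Lambda with a new entry in the `/tools` response, and on its next startup the proxy picks it up with zero local changes.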

&lt;h3&gt;
  
  
  Decision 4: Why AWS IAM Instead of API Keys?
&lt;/h3&gt;

&lt;p&gt;Traditional approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"super-secret-key-123"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Our approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Uses your AWS credentials&lt;/span&gt;
&lt;span class="c"&gt;# Same ones you use for AWS CLI&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ No keys to rotate every 90 days&lt;/li&gt;
&lt;li&gt;✅ Integrates with your existing AWS setup&lt;/li&gt;
&lt;li&gt;✅ CloudTrail logs every request&lt;/li&gt;
&lt;li&gt;✅ Can revoke access instantly via IAM&lt;/li&gt;
&lt;li&gt;✅ Supports MFA, temporary credentials, SSO&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The proxy signs every request&lt;/strong&gt; with AWS Signature Version 4. API Gateway validates the signature before Lambda even runs. It's the same SigV4 scheme the AWS CLI and SDKs use for every API call.&lt;/p&gt;
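&lt;p&gt;If you're curious what the signing actually involves, here's a stripped-down SigV4 implementation in Python (the real proxy does this in Bash; signing only the &lt;code&gt;host&lt;/code&gt; and &lt;code&gt;x-amz-date&lt;/code&gt; headers is a simplification of the full spec):&lt;/p&gt;

```python
import datetime
import hashlib
import hmac


def sigv4_headers(method, host, path, body, access_key, secret_key,
                  region="us-east-1", service="execute-api"):
    """Build a minimal SigV4 Authorization header for an API Gateway call."""
    now = datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date = now.strftime("%Y%m%d")
    payload_hash = hashlib.sha256(body).hexdigest()
    # 1. Canonical request: method, path, query, headers, signed list, payload hash
    canonical = "\n".join([method, path, "",
                           f"host:{host}", f"x-amz-date:{amz_date}", "",
                           "host;x-amz-date", payload_hash])
    # 2. String to sign, scoped to date/region/service
    scope = f"{date}/{region}/{service}/aws4_request"
    to_sign = "\n".join(["AWS4-HMAC-SHA256", amz_date, scope,
                         hashlib.sha256(canonical.encode()).hexdigest()])
    # 3. Derive the signing key by chained HMACs, then sign
    key = ("AWS4" + secret_key).encode()
    for part in (date, region, service, "aws4_request"):
        key = hmac.new(key, part.encode(), hashlib.sha256).digest()
    signature = hmac.new(key, to_sign.encode(), hashlib.sha256).hexdigest()
    return {
        "Host": host,
        "X-Amz-Date": amz_date,
        "Authorization": (f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
                          f"SignedHeaders=host;x-amz-date, Signature={signature}"),
    }
```

API Gateway recomputes the same signature server-side; any mismatch (wrong key, tampered body, stale timestamp) is rejected before your Lambda is invoked.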

&lt;h2&gt;
  
  
  The Code: Let's Break It Down
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Lambda Handler (Simplified)
&lt;/h3&gt;

&lt;p&gt;Here's what a Lambda function looks like (simplified for clarity):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_item_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Retrieve a single item from DynamoDB by primary key.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# Parse the request
&lt;/span&gt;    &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;table_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;table_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Convert simple format to DynamoDB format
&lt;/span&gt;    &lt;span class="n"&gt;dynamodb_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;dynamodb_key&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;S&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
            &lt;span class="n"&gt;dynamodb_key&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;N&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;

    &lt;span class="c1"&gt;# Call DynamoDB
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dynamodb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_item&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;TableName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dynamodb_key&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Format response as human-readable text
&lt;/span&gt;    &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Item&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
    &lt;span class="n"&gt;formatted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;format_item&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;statusCode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;headers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text/plain&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Item from table &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;formatted&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Three key parts:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Parse input&lt;/strong&gt; - Extract table name and key&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Convert formats&lt;/strong&gt; - Simple JSON → DynamoDB types&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Return readable text&lt;/strong&gt; - Not raw JSON&lt;/li&gt;
&lt;/ol&gt;
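&lt;p&gt;The &lt;code&gt;format_item&lt;/code&gt; helper isn't shown above. A plausible sketch, assuming we only need the common attribute types (&lt;code&gt;unwrap&lt;/code&gt; is an illustrative name, not the actual code):&lt;/p&gt;

```python
def unwrap(av):
    """Convert one DynamoDB AttributeValue into a plain Python value."""
    if "S" in av:
        return av["S"]
    if "N" in av:
        n = av["N"]  # DynamoDB sends numbers as strings
        return int(n) if n.lstrip("-").isdigit() else float(n)
    if "BOOL" in av:
        return av["BOOL"]
    if "L" in av:
        return [unwrap(v) for v in av["L"]]
    if "M" in av:
        return {k: unwrap(v) for k, v in av["M"].items()}
    return av  # NULL, binary, sets, etc. left as-is in this sketch


def format_item(item):
    """Render a DynamoDB item as indented 'key: value' lines."""
    return "\n".join(f"  {k}: {unwrap(v)}" for k, v in item.items())


item = {"userId": {"S": "user001"}, "name": {"S": "Alice Johnson"},
        "age": {"N": "28"}}
print(format_item(item))
```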

&lt;h3&gt;
  
  
  The Proxy Script (The Secret Sauce)
&lt;/h3&gt;

&lt;p&gt;The proxy does three critical things:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Tool Discovery:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# On startup&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; GET https://api.execute-api.us-east-1.amazonaws.com/tools
&lt;span class="c"&gt;# Saves tool definitions locally&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. SigV4 Signing:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# For each request&lt;/span&gt;
&lt;span class="nv"&gt;signature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;calculate_aws_signature &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
curl &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: AWS4-HMAC-SHA256 Credential=..."&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
     https://api.execute-api.us-east-1.amazonaws.com/dynamodb/get-item
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. JSON-RPC Translation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Receives from Kiro:&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"jsonrpc"&lt;/span&gt;: &lt;span class="s2"&gt;"2.0"&lt;/span&gt;, &lt;span class="s2"&gt;"method"&lt;/span&gt;: &lt;span class="s2"&gt;"tools/call"&lt;/span&gt;, &lt;span class="s2"&gt;"params"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;...&lt;span class="o"&gt;}}&lt;/span&gt;

&lt;span class="c"&gt;# Translates to HTTP:&lt;/span&gt;
POST /dynamodb/get-item
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"table_name"&lt;/span&gt;: &lt;span class="s2"&gt;"Users"&lt;/span&gt;, &lt;span class="s2"&gt;"key"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"userId"&lt;/span&gt;: &lt;span class="s2"&gt;"123"&lt;/span&gt;&lt;span class="o"&gt;}}&lt;/span&gt;

&lt;span class="c"&gt;# Returns to Kiro:&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"jsonrpc"&lt;/span&gt;: &lt;span class="s2"&gt;"2.0"&lt;/span&gt;, &lt;span class="s2"&gt;"result"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"content"&lt;/span&gt;: &lt;span class="o"&gt;[{&lt;/span&gt;&lt;span class="s2"&gt;"type"&lt;/span&gt;: &lt;span class="s2"&gt;"text"&lt;/span&gt;, &lt;span class="s2"&gt;"text"&lt;/span&gt;: &lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="o"&gt;}]}}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's a &lt;strong&gt;protocol adapter&lt;/strong&gt; - speaks MCP to Kiro, speaks HTTP to AWS.&lt;/p&gt;
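&lt;p&gt;In Python terms, that translation step might look like this (the real proxy is a Bash script; function and parameter names here are illustrative):&lt;/p&gt;

```python
import json


def handle_tools_call(line, routes, post):
    """Translate one MCP 'tools/call' request into an HTTP POST (sketch).

    `routes` maps tool name to API Gateway path; `post(path, body)` is
    assumed to perform the SigV4-signed request and return the Lambda's
    plain-text response.
    """
    req = json.loads(line)
    params = req["params"]
    text = post(routes[params["name"]], json.dumps(params.get("arguments", {})))
    # Wrap the plain text back into the JSON-RPC envelope the client expects
    return json.dumps({
        "jsonrpc": "2.0",
        "id": req.get("id"),
        "result": {"content": [{"type": "text", "text": text}]},
    })
```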

&lt;h3&gt;
  
  
  The Infrastructure (Terraform)
&lt;/h3&gt;

&lt;p&gt;Each Lambda gets its own Terraform file. Here's the pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# IAM Role&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role"&lt;/span&gt; &lt;span class="s2"&gt;"lambda_get_item_role"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dynamodb-get-item-role"&lt;/span&gt;
  &lt;span class="c1"&gt;# Trust policy allows Lambda service to assume this role&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Scoped Permission&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy"&lt;/span&gt; &lt;span class="s2"&gt;"lambda_get_item_dynamodb"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dynamodb-get-item-policy"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lambda_get_item_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"dynamodb:GetItem"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Only this action!&lt;/span&gt;
      &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Lambda Function&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_lambda_function"&lt;/span&gt; &lt;span class="s2"&gt;"lambda_get_item"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;function_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dynamodb-get-item"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lambda_get_item_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
  &lt;span class="nx"&gt;runtime&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"python3.13"&lt;/span&gt;
  &lt;span class="nx"&gt;handler&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dynamodb_ops.get_item_handler"&lt;/span&gt;
  &lt;span class="c1"&gt;# ... more config&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Rinse and repeat&lt;/strong&gt; for each operation. Total: 11 Lambda functions, covering the 10 DynamoDB operations plus the &lt;code&gt;/tools&lt;/code&gt; discovery endpoint.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 10 DynamoDB Operations
&lt;/h2&gt;

&lt;p&gt;Here's what you can do:&lt;/p&gt;

&lt;h3&gt;
  
  
  Read Operations
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Get Item&lt;/strong&gt; - Fetch a single item by key&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Get user user001 from the Users table"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Query&lt;/strong&gt; - Find items matching a condition&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Show me all orders for user123 from the Orders table"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Scan&lt;/strong&gt; - Read the entire table (with optional filters)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Scan the Products table and show me 10 items"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;4. Batch Get&lt;/strong&gt; - Fetch multiple items at once&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Get users user001, user002, and user003 from Users table"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;5. List Tables&lt;/strong&gt; - See all DynamoDB tables&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"What DynamoDB tables do I have?"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;6. Describe Table&lt;/strong&gt; - Get table metadata&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Describe the Users table structure"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;7. Count Items&lt;/strong&gt; - Get approximate table size&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"How many items are in the Users table?"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Write Operations
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;8. Put Item&lt;/strong&gt; - Add or replace an item&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Add a user with userId user011, name Kate Brown to Users table"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;9. Update Item&lt;/strong&gt; - Modify specific attributes&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Update the role to Senior Engineer for user001"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;10. Delete Item&lt;/strong&gt; - Remove an item&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Delete user user005 from the Users table"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Bonus: The Sample Table
&lt;/h2&gt;

&lt;p&gt;We include an optional &lt;code&gt;sample-table.tf&lt;/code&gt; that creates a "Users" table with 10 realistic user records:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_dynamodb_table"&lt;/span&gt; &lt;span class="s2"&gt;"users_sample"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Users"&lt;/span&gt;
  &lt;span class="nx"&gt;billing_mode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"PAY_PER_REQUEST"&lt;/span&gt;  &lt;span class="c1"&gt;# No fixed costs!&lt;/span&gt;
  &lt;span class="nx"&gt;hash_key&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"userId"&lt;/span&gt;

  &lt;span class="c1"&gt;# ... schema definition&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_dynamodb_table_item"&lt;/span&gt; &lt;span class="s2"&gt;"user_1"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;table_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_dynamodb_table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;users_sample&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;

  &lt;span class="nx"&gt;item&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;userId&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;S&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"user001"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;name&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;S&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Alice Johnson"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;email&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;S&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"alice.johnson@example.com"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;role&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;S&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Software Engineer"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;department&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;S&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Engineering"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;active&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;BOOL&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="c1"&gt;# ... more fields&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Perfect for testing!&lt;/strong&gt; Deploy once, start asking questions immediately.&lt;/p&gt;

&lt;p&gt;Don't need it? Just delete the file or rename it to &lt;code&gt;sample-table.tf.disabled&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Deploy This
&lt;/h2&gt;

&lt;p&gt;Ready to try it? Here's the journey:&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# You need these installed&lt;/span&gt;
aws &lt;span class="nt"&gt;--version&lt;/span&gt;          &lt;span class="c"&gt;# AWS CLI&lt;/span&gt;
terraform &lt;span class="nt"&gt;--version&lt;/span&gt;    &lt;span class="c"&gt;# Terraform&lt;/span&gt;
jq &lt;span class="nt"&gt;--version&lt;/span&gt;          &lt;span class="c"&gt;# JSON processor&lt;/span&gt;
bash &lt;span class="nt"&gt;--version&lt;/span&gt;        &lt;span class="c"&gt;# Bash 4+&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Make sure your AWS credentials are configured:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws sts get-caller-identity
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 1: Clone and Deploy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone the repo (or create from the code)&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;AWSServerlessMCP

&lt;span class="c"&gt;# Run the magic script&lt;/span&gt;
./apply.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This script:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;✅ Validates your environment&lt;/li&gt;
&lt;li&gt;✅ Deploys all 11 Lambda functions via Terraform&lt;/li&gt;
&lt;li&gt;✅ Creates API Gateway routes&lt;/li&gt;
&lt;li&gt;✅ Generates IAM user for the proxy&lt;/li&gt;
&lt;li&gt;✅ Stores credentials in Secrets Manager&lt;/li&gt;
&lt;li&gt;✅ Generates Claude Desktop config&lt;/li&gt;
&lt;li&gt;✅ Runs validation tests&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Total deployment time:&lt;/strong&gt; ~2-3 minutes&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Configure Claude Desktop
&lt;/h3&gt;

&lt;p&gt;The script generates &lt;code&gt;02-proxy/claude_desktop_config_sh.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"dynamodb"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bash"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/path/to/proxy.sh"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"MCP_ACCESS_KEY_ID"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AKIA..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"MCP_SECRET_ACCESS_KEY"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"MCP_API_ENDPOINT"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://....execute-api.us-east-1.amazonaws.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"MCP_REGION"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Copy this&lt;/strong&gt; to your Claude Desktop config:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;macOS&lt;/strong&gt;: &lt;code&gt;~/Library/Application Support/Claude/claude_desktop_config.json&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Linux&lt;/strong&gt;: &lt;code&gt;~/.config/Claude/claude_desktop_config.json&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 3: Restart Claude Desktop
&lt;/h3&gt;

&lt;p&gt;Close and reopen Claude Desktop. You should see DynamoDB tools appear!&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Start Asking Questions!
&lt;/h3&gt;

&lt;p&gt;Try these:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"List all my DynamoDB tables"

"Describe the Users table"

"Show me all users from the Users table"

"Get user user001 from Users table"

"Add a new user with userId user011, name John Doe, 
 email john@example.com to the Users table"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Security Deep Dive
&lt;/h2&gt;

&lt;p&gt;Let's talk about how we keep this secure:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. IAM Authentication
&lt;/h3&gt;

&lt;p&gt;Every request goes through this flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Request → Proxy signs with AWS SigV4 → API Gateway validates signature → Lambda executes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;No signature = No access.&lt;/strong&gt; Period.&lt;/p&gt;
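
&lt;p&gt;SigV4 never sends the secret key over the wire; it derives a scoped signing key from it through four chained HMAC-SHA256 steps. A minimal Python sketch of that derivation (request canonicalization and the final signature are omitted):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import hashlib
import hmac

def _hmac(key, msg):
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()

def signing_key(secret_key, date, region, service):
    """Derive the SigV4 signing key: four chained HMAC-SHA256 steps."""
    k_date = _hmac(("AWS4" + secret_key).encode(), date)  # scoped to the day
    k_region = _hmac(k_date, region)                      # ...then the region
    k_service = _hmac(k_region, service)                  # ...then the service
    return _hmac(k_service, "aws4_request")

key = signing_key("your-secret-access-key", "20260501", "us-east-1", "execute-api")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the key is scoped to a single day, region, and service, a leaked signature is useless anywhere else.&lt;/p&gt;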

&lt;h3&gt;
  
  
  2. Scoped Permissions
&lt;/h3&gt;

&lt;p&gt;The proxy IAM user has exactly ONE permission:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"execute-api:Invoke"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:execute-api:us-east-1:ACCOUNT:API_ID/*/*"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It can call the API. &lt;strong&gt;Nothing else.&lt;/strong&gt; Can't create EC2 instances, can't delete S3 buckets, can't read secrets.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Lambda Isolation
&lt;/h3&gt;

&lt;p&gt;Each Lambda has scoped DynamoDB permissions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;get-item Lambda    → Can only read
put-item Lambda    → Can only write
delete-item Lambda → Can only delete
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even if API Gateway were somehow bypassed, each Lambda would still be limited to its single scoped operation.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Audit Trail
&lt;/h3&gt;

&lt;p&gt;Every action is logged:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_audit_log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;headers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x-mcp-user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AUDIT tool=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; user=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;CloudWatch Logs capture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Who made the request (your username)&lt;/li&gt;
&lt;li&gt;What tool was called&lt;/li&gt;
&lt;li&gt;When it happened&lt;/li&gt;
&lt;li&gt;What the result was&lt;/li&gt;
&lt;/ul&gt;
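
&lt;p&gt;Because the AUDIT lines follow a fixed key=value format, they are easy to summarize once pulled from CloudWatch; for example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import Counter

def parse_audit(line):
    """Parse a line like 'AUDIT tool=dynamodb_get_item user=alice'."""
    return dict(part.split("=", 1) for part in line.split()[1:])

lines = [
    "AUDIT tool=dynamodb_get_item user=alice",
    "AUDIT tool=dynamodb_put_item user=alice",
    "AUDIT tool=dynamodb_get_item user=bob",
]
by_tool = Counter(parse_audit(line)["tool"] for line in lines)
print(by_tool.most_common(1))  # [('dynamodb_get_item', 2)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;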

&lt;h3&gt;
  
  
  5. No Secrets in Code
&lt;/h3&gt;

&lt;p&gt;Credentials live in AWS Secrets Manager:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws secretsmanager get-secret-value &lt;span class="nt"&gt;--secret-id&lt;/span&gt; dynamodb-mcp-proxy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Never in your codebase. Never in environment variables you might accidentally commit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Analysis
&lt;/h2&gt;

&lt;p&gt;"How much does this cost to run?"&lt;/p&gt;

&lt;p&gt;Let's break it down:&lt;/p&gt;

&lt;h3&gt;
  
  
  AWS Free Tier:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lambda&lt;/strong&gt;: 1M requests/month + 400,000 GB-seconds of compute (always free)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Gateway&lt;/strong&gt;: 1M API calls/month (free for the first 12 months)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DynamoDB&lt;/strong&gt;: 25 GB storage + 25 provisioned read and 25 write capacity units (always free)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  After Free Tier:
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Lambda&lt;/strong&gt;: $0.20 per 1M requests + $0.0000166667 per GB-second&lt;/p&gt;

&lt;p&gt;Example calculation for 10,000 queries/month:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Requests: 10,000 × $0.20/1M = &lt;strong&gt;$0.002&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Compute (128MB, 200ms avg): 10,000 × 0.2s × 0.125GB × $0.0000166667 = &lt;strong&gt;$0.004&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Total Lambda: ~$0.01/month&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;API Gateway&lt;/strong&gt;: $1.00 per 1M requests&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10,000 requests = &lt;strong&gt;$0.01/month&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;DynamoDB&lt;/strong&gt;: Pay-per-request pricing&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;$1.25 per 1M write requests&lt;/li&gt;
&lt;li&gt;$0.25 per 1M read requests&lt;/li&gt;
&lt;li&gt;10,000 reads = &lt;strong&gt;$0.003/month&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Secrets Manager&lt;/strong&gt;: $0.40/month per secret&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;$0.40/month&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Total for 10,000 queries/month: &lt;strong&gt;~$0.42&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;For a hobby project? Basically free. For production? Scales linearly with usage.&lt;/p&gt;
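
&lt;p&gt;A quick sanity check of that arithmetic, with the per-unit rates above hard-coded:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def monthly_cost(queries, avg_seconds=0.2, memory_gb=0.125):
    """Rough monthly cost in USD for a given query volume."""
    lambda_requests = queries * 0.20 / 1_000_000
    lambda_compute = queries * avg_seconds * memory_gb * 0.0000166667
    api_gateway = queries * 1.00 / 1_000_000
    dynamodb_reads = queries * 0.25 / 1_000_000
    secrets_manager = 0.40  # one secret, flat monthly charge
    return lambda_requests + lambda_compute + api_gateway + dynamodb_reads + secrets_manager

print(round(monthly_cost(10_000), 2))  # 0.42
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;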

&lt;h2&gt;
  
  
  Common Patterns and Best Practices
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pattern 1: Query with Filters
&lt;/h3&gt;

&lt;p&gt;Instead of scanning, use query when possible:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Efficient - uses partition key
&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Query Orders table where userId equals user123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Less efficient - full table scan
&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Scan Orders table and filter by userId user123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
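
&lt;p&gt;In boto3 terms, the two prompts map to roughly the following parameters (table and attribute names illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Parameters for boto3.client("dynamodb") - names are illustrative
query_params = {
    "TableName": "Orders",
    "KeyConditionExpression": "userId = :uid",  # reads one partition
    "ExpressionAttributeValues": {":uid": {"S": "user123"}},
}

scan_params = {
    "TableName": "Orders",
    "FilterExpression": "userId = :uid",  # reads every item, filters afterwards
    "ExpressionAttributeValues": {":uid": {"S": "user123"}},
}
# client.query(**query_params) touches only the matching partition;
# client.scan(**scan_params) still consumes read capacity for the whole table.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;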



&lt;h3&gt;
  
  
  Pattern 2: Batch Operations
&lt;/h3&gt;

&lt;p&gt;Fetch multiple items in one call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# One request for three items
&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Get users user001, user002, user003 using batch get&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Better than three separate requests
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
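
&lt;p&gt;Under the hood this becomes a single &lt;code&gt;BatchGetItem&lt;/code&gt; request; a sketch of its shape (table and key names illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Request shape for boto3 batch_get_item - names are illustrative
batch_params = {
    "RequestItems": {
        "Users": {
            "Keys": [
                {"userId": {"S": "user001"}},
                {"userId": {"S": "user002"}},
                {"userId": {"S": "user003"}},
            ]
        }
    }
}
# client.batch_get_item(**batch_params) fetches up to 100 items per call.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;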



&lt;h3&gt;
  
  
  Pattern 3: Conditional Updates
&lt;/h3&gt;

&lt;p&gt;Use update expressions for atomic operations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Update the counter by incrementing it by 1 for item user001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This translates to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;UpdateExpression&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SET #counter = #counter + :inc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Atomic, no race conditions.&lt;/p&gt;
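
&lt;p&gt;The actual &lt;code&gt;UpdateItem&lt;/code&gt; call also has to declare the &lt;code&gt;#counter&lt;/code&gt; placeholder; a sketch of the full parameter set (table and key names illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Full parameter set behind the UpdateExpression above (names illustrative)
update_params = {
    "TableName": "Users",
    "Key": {"userId": {"S": "user001"}},
    "UpdateExpression": "SET #counter = #counter + :inc",
    "ExpressionAttributeNames": {"#counter": "counter"},  # resolves the placeholder
    "ExpressionAttributeValues": {":inc": {"N": "1"}},
    "ReturnValues": "UPDATED_NEW",
}
# client.update_item(**update_params) applies the increment server-side,
# so concurrent callers never read-modify-write a stale value.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;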

&lt;h2&gt;
  
  
  Extending the System
&lt;/h2&gt;

&lt;p&gt;Want to add a new operation? Here's how:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Add Handler to Python
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# In dynamodb_ops.py
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;batch_write_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Bulk write multiple items.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_parse_json_body&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# ... implementation
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Successfully wrote N items&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Add to TOOL_REGISTRY
&lt;/span&gt;&lt;span class="n"&gt;TOOL_REGISTRY&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dynamodb_batch_write&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write multiple items in one request&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inputSchema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{...},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;route&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/dynamodb/batch-write&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
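
&lt;p&gt;For reference, a minimal sketch of what the elided handler body might look like, using plain dicts in place of the article's &lt;code&gt;_parse_json_body&lt;/code&gt; and &lt;code&gt;_response&lt;/code&gt; helpers; &lt;code&gt;build_batch_writes&lt;/code&gt; is a hypothetical helper, and the boto3 call is left as a comment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json

def build_batch_writes(table, items, batch_size=25):
    """Split items into BatchWriteItem payloads (the API caps a call at 25 writes)."""
    batches = []
    for start in range(0, len(items), batch_size):
        chunk = items[start:start + batch_size]
        batches.append({table: [{"PutRequest": {"Item": item}} for item in chunk]})
    return batches

def batch_write_handler(event, context):
    params = json.loads(event.get("body") or "{}")
    batches = build_batch_writes(params["table"], params.get("items", []))
    # for batch in batches:
    #     boto3.client("dynamodb").batch_write_item(RequestItems=batch)
    return {"statusCode": 200, "body": json.dumps({"batches": len(batches)})}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;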



&lt;h3&gt;
  
  
  2. Create Terraform File
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# lambda-batch-write.tf&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role"&lt;/span&gt; &lt;span class="s2"&gt;"lambda_batch_write_role"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dynamodb-batch-write-role"&lt;/span&gt;
  &lt;span class="c1"&gt;# ... role definition&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy"&lt;/span&gt; &lt;span class="s2"&gt;"lambda_batch_write_dynamodb"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"dynamodb:BatchWriteItem"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_lambda_function"&lt;/span&gt; &lt;span class="s2"&gt;"lambda_batch_write"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;function_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dynamodb-batch-write"&lt;/span&gt;
  &lt;span class="nx"&gt;handler&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dynamodb_ops.batch_write_handler"&lt;/span&gt;
  &lt;span class="c1"&gt;# ... function config&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Update API Gateway
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# In api.tf&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_apigatewayv2_integration"&lt;/span&gt; &lt;span class="s2"&gt;"batch_write_integration"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;api_id&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_apigatewayv2_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dynamodb_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;integration_uri&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_lambda_function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lambda_batch_write&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;invoke_arn&lt;/span&gt;
  &lt;span class="c1"&gt;# ... integration config&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_apigatewayv2_route"&lt;/span&gt; &lt;span class="s2"&gt;"batch_write_route"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;api_id&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_apigatewayv2_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dynamodb_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;route_key&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"POST /dynamodb/batch-write"&lt;/span&gt;
  &lt;span class="nx"&gt;target&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"integrations/${aws_apigatewayv2_integration.batch_write_integration.id}"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Deploy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./apply.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;That's it!&lt;/strong&gt; The proxy auto-discovers the new tool on next startup.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Use Cases
&lt;/h2&gt;

&lt;p&gt;Where does this shine?&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Data Exploration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Show me all users who joined in 2023"
"How many active subscriptions do we have?"
"What's the average age of users in the Engineering department?"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Natural language beats writing DynamoDB queries.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Quick CRUD Operations
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Add a test user for QA testing"
"Update the status to active for order order123"
"Delete all test data with prefix test-"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No need to open AWS Console.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Database Migrations
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Scan the Users table and show me all items missing the email field"
"Update all users in the Premium tier to add a credits field with value 100"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Kiro can help you identify and fix data inconsistencies.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Monitoring and Alerts
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"How many failed login attempts in the last hour?"
"Show me all orders with status pending older than 24 hours"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Quick operational queries without building dashboards.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Developer Productivity
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Create a sample order for testing the checkout flow"
"Copy user user001 to user001-backup"
"Show me the schema of the Products table"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Faster than clicking through the console.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;p&gt;Building this taught me some valuable lessons:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Start with Security
&lt;/h3&gt;

&lt;p&gt;We didn't bolt on IAM later - it was there from day one. That made all subsequent decisions easier.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Simplicity Scales
&lt;/h3&gt;

&lt;p&gt;One Python file. Simple Terraform. No fancy frameworks. Yet it handles thousands of requests/day without breaking a sweat.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Developer Experience Matters
&lt;/h3&gt;

&lt;p&gt;The fact that you can ask questions in plain English? That's not a gimmick. It genuinely changes how you interact with your data.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Observability is Free (Almost)
&lt;/h3&gt;

&lt;p&gt;CloudWatch Logs, CloudTrail, X-Ray tracing - all built into Lambda. We didn't build a monitoring system; we just used what AWS gives us.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. The Proxy Pattern Works
&lt;/h3&gt;

&lt;p&gt;Keeping the proxy thin and stateless was the right call. All complexity lives in Lambda where we can update it independently.&lt;/p&gt;

&lt;h2&gt;
  
  
  Troubleshooting Tips
&lt;/h2&gt;

&lt;p&gt;Hit a snag? Here's how to debug:&lt;/p&gt;

&lt;h3&gt;
  
  
  Problem: Proxy won't connect
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check AWS credentials&lt;/span&gt;
aws sts get-caller-identity

&lt;span class="c"&gt;# Test API Gateway directly&lt;/span&gt;
aws lambda invoke &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--function-name&lt;/span&gt; dynamodb-list-tables &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--payload&lt;/span&gt; &lt;span class="s1"&gt;'{}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  /tmp/out.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Problem: Permission denied
&lt;/h3&gt;

&lt;p&gt;Check IAM user has execute-api permission:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws iam get-user-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--user-name&lt;/span&gt; dynamodb-mcp-proxy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-name&lt;/span&gt; dynamodb-mcp-proxy-invoke
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Problem: Lambda timeout
&lt;/h3&gt;

&lt;p&gt;Increase timeout in Terraform:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_lambda_function"&lt;/span&gt; &lt;span class="s2"&gt;"lambda_scan"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;timeout&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;  &lt;span class="c1"&gt;# Increase from 15 to 30 seconds&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Problem: Can't find table
&lt;/h3&gt;

&lt;p&gt;Verify table exists:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws dynamodb list-tables
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then confirm that the Lambda's IAM role has permission to access it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Future Enhancements
&lt;/h2&gt;

&lt;p&gt;Where could this go?&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Multi-Region Support
&lt;/h3&gt;

&lt;p&gt;Deploy to multiple regions, let Kiro route to the nearest one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"dynamodb_mcp_us_east"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"./modules/dynamodb-mcp"&lt;/span&gt;
  &lt;span class="nx"&gt;region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"dynamodb_mcp_eu_west"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"./modules/dynamodb-mcp"&lt;/span&gt;
  &lt;span class="nx"&gt;region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"eu-west-1"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Advanced Query Support
&lt;/h3&gt;

&lt;p&gt;Add support for complex queries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Find all users where age &amp;gt; 25 AND department = Engineering 
 AND active = true, sorted by joinDate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Transaction Support
&lt;/h3&gt;

&lt;p&gt;DynamoDB supports transactions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;transaction_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Execute multiple operations atomically.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;dynamodb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transact_write_items&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;TransactItems&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Put&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{...}},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Update&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{...}},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Delete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{...}}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Stream Processing
&lt;/h3&gt;

&lt;p&gt;React to DynamoDB changes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Alert me when a new order is created&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Update the analytics table whenever a user signs up&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use DynamoDB Streams + Lambda triggers.&lt;/p&gt;
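
&lt;p&gt;A stream-triggered Lambda receives batches of change records; a minimal sketch of such a handler (a real one would publish to SNS instead of printing):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def stream_handler(event, context):
    """Invoked by a DynamoDB Streams trigger with a batch of change records."""
    inserts = 0
    for record in event["Records"]:
        if record["eventName"] == "INSERT":
            new_image = record["dynamodb"]["NewImage"]
            print(f"new order: {new_image}")  # e.g. publish to SNS here instead
            inserts += 1
    return inserts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;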

&lt;h3&gt;
  
  
  5. Cost Optimization
&lt;/h3&gt;

&lt;p&gt;Switch tables with predictable traffic to provisioned capacity (and optionally purchase reserved capacity on top):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_dynamodb_table"&lt;/span&gt; &lt;span class="s2"&gt;"users"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;billing_mode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"PROVISIONED"&lt;/span&gt;
  &lt;span class="nx"&gt;read_capacity&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
  &lt;span class="nx"&gt;write_capacity&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
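&lt;p&gt;For a rough sense of when provisioned beats on-demand, here is a back-of-envelope comparison. The prices are illustrative placeholders, not current AWS rates; check the DynamoDB pricing page before deciding:&lt;/p&gt;

```python
# Rough cost comparison: provisioned vs. on-demand DynamoDB billing.
# All prices below are ILLUSTRATIVE placeholders, not current AWS rates.
HOURS_PER_MONTH = 730

def provisioned_monthly(rcu, wcu, rcu_hr_price=0.00013, wcu_hr_price=0.00065):
    """Monthly cost of always-on provisioned capacity."""
    return (rcu * rcu_hr_price + wcu * wcu_hr_price) * HOURS_PER_MONTH

def on_demand_monthly(reads_millions, writes_millions,
                      rru_price=0.125, wru_price=0.625):
    """Monthly cost of pay-per-request read/write units."""
    return reads_millions * rru_price + writes_millions * wru_price

# A steady workload: 5 reads/s and 5 writes/s, all month long.
reads_m = 5 * 3600 * HOURS_PER_MONTH / 1e6   # requests in millions
writes_m = reads_m
print(f"provisioned: ${provisioned_monthly(5, 5):.2f}/mo")
print(f"on-demand:   ${on_demand_monthly(reads_m, writes_m):.2f}/mo")
```

&lt;p&gt;The crossover point depends entirely on how steady the traffic is: spiky workloads waste provisioned capacity, steady ones waste on-demand's premium.&lt;/p&gt;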



&lt;h3&gt;
  
  
  6. Multi-Table Operations
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Join Users table with Orders table on userId 
 and show me total order value per user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Execute multiple queries and aggregate in Lambda.&lt;/p&gt;
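&lt;p&gt;The aggregation half of that is ordinary Python. A sketch of what the Lambda does after scanning both tables (table and attribute names are assumptions):&lt;/p&gt;

```python
from collections import defaultdict

def total_order_value_per_user(users, orders):
    """Aggregate order totals by userId.

    The 'join' happens in Lambda, since DynamoDB has no join operation:
    two scans, then an in-memory merge keyed on userId.
    """
    totals = defaultdict(float)
    for order in orders:
        totals[order["userId"]] += float(order["amount"])
    # Attach display names from the Users table; fall back to the raw id.
    names = {u["userId"]: u.get("name", "?") for u in users}
    return {names.get(uid, uid): amount for uid, amount in totals.items()}

# In the real tool, users/orders would come from two table.scan() calls.
```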

&lt;h2&gt;
  
  
  Comparison with Alternatives
&lt;/h2&gt;

&lt;p&gt;How does this stack up?&lt;/p&gt;

&lt;h3&gt;
  
  
  vs. Local MCP Server
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Local Server:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Lower latency&lt;/li&gt;
&lt;li&gt;✅ No AWS costs&lt;/li&gt;
&lt;li&gt;❌ Runs only on your machine&lt;/li&gt;
&lt;li&gt;❌ Need to manage runtime dependencies&lt;/li&gt;
&lt;li&gt;❌ No centralized updates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Serverless (Ours):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Works for your whole team&lt;/li&gt;
&lt;li&gt;✅ No runtime to manage&lt;/li&gt;
&lt;li&gt;✅ Built-in scaling&lt;/li&gt;
&lt;li&gt;✅ AWS-level security&lt;/li&gt;
&lt;li&gt;❌ Small latency overhead (~100-200ms)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  vs. Direct DynamoDB Access
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Direct Access (boto3):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Maximum control&lt;/li&gt;
&lt;li&gt;✅ Lowest latency&lt;/li&gt;
&lt;li&gt;❌ Requires coding for every query&lt;/li&gt;
&lt;li&gt;❌ No natural language interface&lt;/li&gt;
&lt;li&gt;❌ Harder to audit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;MCP (Ours):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Natural language queries&lt;/li&gt;
&lt;li&gt;✅ Audit trail built-in&lt;/li&gt;
&lt;li&gt;✅ Non-technical users can query&lt;/li&gt;
&lt;li&gt;❌ Limited to predefined operations&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  vs. AWS Data API
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;AWS Data API:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Aurora only (originally Aurora Serverless; now Data API-enabled clusters)&lt;/li&gt;
&lt;li&gt;HTTP-based queries&lt;/li&gt;
&lt;li&gt;SQL interface&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Ours:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Works with DynamoDB&lt;/li&gt;
&lt;li&gt;✅ NoSQL operations&lt;/li&gt;
&lt;li&gt;✅ Natural language interface&lt;/li&gt;
&lt;li&gt;✅ MCP integration&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;If you remember nothing else, remember this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MCP is powerful&lt;/strong&gt; - It's not hype. It genuinely changes how we interact with data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Serverless fits MCP perfectly&lt;/strong&gt; - Centralized, scalable, secure. All the things MCP needs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Security first, always&lt;/strong&gt; - IAM, scoped permissions, audit logs. Build it in from day one.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Plain-text responses win&lt;/strong&gt; - Optimize for conversation, not computation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Keep it simple&lt;/strong&gt; - One Python file, clear Terraform, no magic. Simplicity scales.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The proxy pattern works&lt;/strong&gt; - Thin client, fat backend. Update independently.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Try It Yourself!
&lt;/h2&gt;

&lt;p&gt;Ready to build your own? Here's the complete source:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: [Link to your repo]&lt;/p&gt;

&lt;p&gt;Deploy in 3 commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone &lt;span class="o"&gt;[&lt;/span&gt;your-repo]
&lt;span class="nb"&gt;cd &lt;/span&gt;AWSServerlessMCP
./apply.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Questions? Hit me up in the comments! I'd love to hear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What other AWS services would you want MCP tools for?&lt;/li&gt;
&lt;li&gt;What improvements would you make?&lt;/li&gt;
&lt;li&gt;What challenges did you face deploying it?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;We started with a simple question: "Can I ask Kiro to query my database?"&lt;/p&gt;

&lt;p&gt;We ended with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ A production-ready serverless MCP backend&lt;/li&gt;
&lt;li&gt;✅ 10 DynamoDB operations as natural language tools&lt;/li&gt;
&lt;li&gt;✅ Secure, scalable, and cost-effective&lt;/li&gt;
&lt;li&gt;✅ Deployable in under 5 minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is just the beginning. MCP is going to change how we build AI-powered tools. The future isn't about building smarter AI - it's about giving AI better tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What will you build with MCP?&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Found this helpful? Give it a ❤️ and follow for more serverless + AI content!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Have questions or improvements? Drop them in the comments - I read every one!&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://spec.modelcontextprotocol.io/" rel="noopener noreferrer"&gt;Model Context Protocol Specification&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html" rel="noopener noreferrer"&gt;AWS Lambda Best Practices&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/dynamodb/" rel="noopener noreferrer"&gt;DynamoDB Developer Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html" rel="noopener noreferrer"&gt;AWS IAM Best Practices&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Connect with me:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/yeshwanthlm" rel="noopener noreferrer"&gt;https://github.com/yeshwanthlm&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;LinkedIn: &lt;a href="https://www.linkedin.com/in/yeshwanth-l-m/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/yeshwanth-l-m/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;YouTube: &lt;a href="https://www.youtube.com/@TechWithYeshwanth" rel="noopener noreferrer"&gt;https://www.youtube.com/@TechWithYeshwanth&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Tags&lt;/strong&gt;: #aws #serverless #lambda #dynamodb #ai #claude #mcp #terraform #python #devops&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>ai</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Enable HTTPS with CloudFront for an S3 Static Website</title>
      <dc:creator>Esther Ninyo</dc:creator>
      <pubDate>Wed, 29 Apr 2026 20:53:27 +0000</pubDate>
      <link>https://forem.com/aws-builders/enable-https-with-cloudfront-for-an-s3-static-website-2687</link>
      <guid>https://forem.com/aws-builders/enable-https-with-cloudfront-for-an-s3-static-website-2687</guid>
      <description>&lt;p&gt;Amazon CloudFront accelerates the delivery of static and dynamic web content to end users. To read more on what CloudFront does, check the official page &lt;a href="https://aws.amazon.com/cloudfront/getting-started/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In this article, we will enable HTTPS on a static website hosted on Amazon S3. &lt;br&gt;
Note: the default S3 static website endpoint supports HTTP only, which is why CloudFront is needed for HTTPS.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An AWS account&lt;/li&gt;
&lt;li&gt;Static website already hosted on S3 (see my &lt;a href="https://dev.to/aws-builders/hosting-a-static-website-on-amazon-s3-3fei"&gt;previous article&lt;/a&gt; on how to host a static website on Amazon S3)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Steps to follow&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Open the &lt;a href="https://aws.amazon.com/console/" rel="noopener noreferrer"&gt;AWS console&lt;/a&gt; and log in.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Search for CloudFront in the search bar and click Create distribution.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdrctow5nfru29auywthm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdrctow5nfru29auywthm.png" alt="Amazon console dashboard" width="675" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fby4i08igcchxdjlc3ylk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fby4i08igcchxdjlc3ylk.png" alt="create distribution" width="800" height="147"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The free tier is sufficient for this learning purpose.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkempgbhzatiltuuyg3rd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkempgbhzatiltuuyg3rd.png" alt="Free tier" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Name your distribution and click Next.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzqzy4xkno89qnaajavgx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzqzy4xkno89qnaajavgx.png" alt="name distribution" width="800" height="335"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use the S3 static website endpoint as the origin.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa1q6iv2l8s3v6u0k3lhd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa1q6iv2l8s3v6u0k3lhd.png" alt="static website endpoint" width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Review and create distribution.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fytewvgcfluud39tulkqr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fytewvgcfluud39tulkqr.png" alt="create distribution" width="800" height="328"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Voila! Your CloudFront distribution is ready for use.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcjykbori29gpa7mfpjs4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcjykbori29gpa7mfpjs4.png" alt="distribution created" width="800" height="289"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can access your secured website using the CloudFront distribution domain name.&lt;/p&gt;
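&lt;p&gt;If you prefer scripting the same setup, here is a boto3 sketch of the distribution the console steps create. The bucket endpoint is a placeholder, and the cache policy ID is AWS's managed CachingOptimized policy:&lt;/p&gt;

```python
import time

# Placeholder: substitute your bucket's website endpoint and region.
WEBSITE_ENDPOINT = "my-bucket.s3-website-us-east-1.amazonaws.com"

distribution_config = {
    "CallerReference": str(time.time()),  # must be unique per request
    "Comment": "HTTPS in front of an S3 static website",
    "Enabled": True,
    "Origins": {
        "Quantity": 1,
        "Items": [{
            "Id": "s3-website",
            "DomainName": WEBSITE_ENDPOINT,
            # Website endpoints speak HTTP only, so CloudFront must treat
            # this as a custom origin and fetch over plain HTTP.
            "CustomOriginConfig": {
                "HTTPPort": 80,
                "HTTPSPort": 443,
                "OriginProtocolPolicy": "http-only",
            },
        }],
    },
    "DefaultCacheBehavior": {
        "TargetOriginId": "s3-website",
        "ViewerProtocolPolicy": "redirect-to-https",  # the HTTPS part
        # AWS managed "CachingOptimized" cache policy.
        "CachePolicyId": "658327ea-f89d-4fab-a63d-7e88639e58f6",
    },
}

def create_distribution(client=None):
    """Submit the config to CloudFront (requires AWS credentials)."""
    import boto3  # local import: building the config needs no AWS deps
    client = client or boto3.client("cloudfront")
    return client.create_distribution(DistributionConfig=distribution_config)
```

&lt;p&gt;&lt;code&gt;redirect-to-https&lt;/code&gt; on the cache behavior is what enforces HTTPS for viewers; the origin leg stays HTTP because that is all the S3 website endpoint offers.&lt;/p&gt;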

&lt;p&gt;You can also take this further by adding a custom domain.&lt;/p&gt;

&lt;p&gt;Thank you for reading to the end. Kindly reach out to me in the comment section if you have any questions, or on &lt;a href="https://www.linkedin.com/in/esther-ninyo/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Till next time, cheers.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloudfront</category>
      <category>s3</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Demystifying the AWS Advanced JDBC Driver: Pools, Plugins, and the Traps I Hit</title>
      <dc:creator>Kris Iyer</dc:creator>
      <pubDate>Wed, 29 Apr 2026 19:17:33 +0000</pubDate>
      <link>https://forem.com/aws-builders/demystifying-the-aws-advanced-jdbc-driver-pools-plugins-and-the-traps-i-hit-190</link>
      <guid>https://forem.com/aws-builders/demystifying-the-aws-advanced-jdbc-driver-pools-plugins-and-the-traps-i-hit-190</guid>
      <description>&lt;h1&gt;
  
  
  Demystifying the AWS Advanced JDBC Driver: Pools, Plugins, and the Traps I Hit
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Date:&lt;/strong&gt; 2026-04-29&lt;br&gt;
&lt;strong&gt;Status:&lt;/strong&gt; Published&lt;/p&gt;


&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;The AWS Advanced JDBC Driver wraps your database driver with a plugin chain that handles failover, read/write splitting, and connection monitoring. The critical gotcha: &lt;strong&gt;it can create internal connection pools separate from your application's HikariCP&lt;/strong&gt;. If you're on v2.x with the F0 profile, you're hitting a hardcoded 30-connection ceiling regardless of your external pool config. The fix: upgrade to v3.x and use &lt;code&gt;connectionPoolType=hikari&lt;/code&gt; with &lt;code&gt;cp-MaximumPoolSize&lt;/code&gt; properties, or drop profiles entirely and configure plugins manually.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key invariant:&lt;/strong&gt; &lt;code&gt;cp-MaximumPoolSize &amp;gt;= external maximumPoolSize&lt;/code&gt; to avoid the internal pool becoming your bottleneck.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quick wins:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check your driver version: v3.3.0+ recommended&lt;/li&gt;
&lt;li&gt;If using F0 profile on v2.x, upgrade immediately&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;exception-override-class-name: software.amazon.jdbc.util.HikariCPSQLException&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Keep &lt;code&gt;socketTimeout=0&lt;/code&gt; and let &lt;code&gt;efm2&lt;/code&gt; handle liveness detection&lt;/li&gt;
&lt;li&gt;Mark read-only transactions with &lt;code&gt;@Transactional(readOnly=true)&lt;/code&gt; to benefit from read/write splitting&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Why I'm writing this
&lt;/h2&gt;

&lt;p&gt;I spent a few hours chasing a performance regression that had no business existing. The service had a HikariCP pool configured for 50 connections per pod. I'd checked the Spring Boot YAML. The property names were right. The values were right. The configuration was loading at startup — I'd watched Hikari log it.&lt;/p&gt;

&lt;p&gt;And yet, under load, the pool count plateaued at exactly 30. Not 50. Not 45. Thirty. Every time. Across every pod. Tomcat threads piled up behind a 10-second wait, connection creation time sat at 10,000 ms, and our p99 latency went vertical.&lt;/p&gt;

&lt;p&gt;The answer, when I found it, was about two layers below where I'd been looking — inside a hardcoded lambda in a specific version of the AWS JDBC driver. I'd been tuning the wrong pool.&lt;/p&gt;

&lt;p&gt;This post is what I wish I'd had at the start of that investigation. If you're running Spring Boot against Aurora PostgreSQL or MySQL through &lt;code&gt;software.amazon.jdbc.Driver&lt;/code&gt;, there are a handful of things about how this driver actually works that aren't obvious from the README. Get them wrong and you get slow requests, or failed failovers, or both. Let me save you the trouble.&lt;/p&gt;


&lt;h2&gt;
  
  
  What the AWS Advanced JDBC Driver actually is
&lt;/h2&gt;

&lt;p&gt;The docs call it a "wrapper," and that's literal — it's a thin &lt;code&gt;java.sql.Driver&lt;/code&gt; that sits between your app and the underlying &lt;code&gt;org.postgresql.Driver&lt;/code&gt; (or MySQL equivalent). Your URL ends up looking like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;jdbc:aws-wrapper:postgresql://&amp;lt;endpoint&amp;gt;:5432/&amp;lt;db&amp;gt;?wrapperProfileName=F0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything after &lt;code&gt;jdbc:aws-wrapper:&lt;/code&gt; is a conventional JDBC URL the wrapper passes down. What the wrapper adds is a plugin chain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;your application
  -&amp;gt; HikariCP (external, app-managed)
    -&amp;gt; aws-advanced-jdbc-wrapper
      -&amp;gt; [plugin 1] -&amp;gt; [plugin 2] -&amp;gt; ... -&amp;gt; [terminal plugin]
        -&amp;gt; org.postgresql.Driver
          -&amp;gt; Aurora instance (writer or reader)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each plugin intercepts JDBC calls — &lt;code&gt;getConnection&lt;/code&gt;, &lt;code&gt;prepareStatement&lt;/code&gt;, &lt;code&gt;execute&lt;/code&gt; — and can rewrite, retry, monitor, or split them. The plugins are why you're using this driver in the first place. They're what give you fast failover, read/write splitting, and enhanced failure monitoring. Everything else about driver configuration exists to serve the plugin chain.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fblk37t7ard2tvpjnf3ok.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fblk37t7ard2tvpjnf3ok.png" alt="AWS JDBC Driver Architecture - layered stack showing Application → External HikariCP → Plugin Chain (readWriteSplitting, auroraConnectionTracker, failover, efm2) → Internal Pools (per Aurora instance) → PostgreSQL Driver → Aurora instances" width="800" height="715"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Configuration profiles: convenience with teeth
&lt;/h2&gt;

&lt;p&gt;The driver ships with named &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/ConfigurationPresets.md" rel="noopener noreferrer"&gt;&lt;strong&gt;configuration profiles&lt;/strong&gt;&lt;/a&gt; — presets that bundle a plugin list and a set of timeouts. The best-known is &lt;code&gt;F0&lt;/code&gt;, which you turn on with &lt;code&gt;wrapperProfileName=F0&lt;/code&gt;. &lt;code&gt;F0&lt;/code&gt; bundles "fast failover" — the recommended plugin set for Aurora.&lt;/p&gt;

&lt;p&gt;Profiles are handy because they let an app team ship one URL parameter instead of a dozen properties. They're also the single biggest source of "how is this even possible?" incidents I've seen, because &lt;strong&gt;a profile can silently set properties you can't override from outside.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The F0 gotcha: a few hours I won't get back
&lt;/h3&gt;

&lt;p&gt;Before v3.1.0, the F0 profile eagerly constructed a second, &lt;em&gt;internal&lt;/em&gt; HikariCP pool — separate from your application's — with properties baked into a lambda at profile-load time. I didn't find this in the docs. I found it by decompiling the JAR:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// From DriverConfigurationProfiles.class in aws-advanced-jdbc-wrapper-2.6.8.jar&lt;/span&gt;
&lt;span class="c1"&gt;// (I verified this via bytecode decompilation after running out of other theories)&lt;/span&gt;
&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setMaximumPoolSize&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;                       &lt;span class="c1"&gt;// HARD CEILING&lt;/span&gt;
&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setConnectionTimeout&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;SECONDS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toMillis&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;   &lt;span class="c1"&gt;// 10-second wait on exhaustion&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is no property you can set to override these. The external pool config is ignored by the internal pool. The &lt;code&gt;cp-&lt;/code&gt; property prefix (I'll get to it below) doesn't exist in v2.6.8 at all — the string &lt;code&gt;"cp-"&lt;/code&gt; literally doesn't appear anywhere in the JAR.&lt;/p&gt;

&lt;p&gt;Here's what was actually happening in the service at runtime:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;App borrowed a logical connection from the external HikariCP (configured max = 50).&lt;/li&gt;
&lt;li&gt;External HikariCP asked the wrapper for a physical connection.&lt;/li&gt;
&lt;li&gt;The wrapper routed through its internal HikariCP (hardcoded max = 30).&lt;/li&gt;
&lt;li&gt;Under load, the internal pool saturated at 30. Attempts 31–50 waited up to 10 seconds and then failed.&lt;/li&gt;
&lt;li&gt;From my dashboards: external &lt;code&gt;hikaricp.connections&lt;/code&gt; capped at 30, &lt;code&gt;connections.pending&lt;/code&gt; climbed to about 170, and &lt;code&gt;connections.creation.avg&lt;/code&gt; sat at 10,000 ms.&lt;/li&gt;
&lt;/ol&gt;
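&lt;p&gt;The shape of the failure is easy to reproduce in miniature: nest a smaller pool inside a larger one and the inner limit wins. A toy sketch, not the driver's code:&lt;/p&gt;

```python
import threading

class Pool:
    """Toy connection pool: a counting semaphore with an optional backing pool."""
    def __init__(self, size, inner=None):
        self.sem = threading.Semaphore(size)
        self.inner = inner  # pool this one borrows physical connections from

    def acquire(self):
        ok = self.sem.acquire(blocking=False)
        if ok and self.inner and not self.inner.acquire():
            self.sem.release()  # inner pool exhausted: give our slot back
            return False
        return ok

internal = Pool(30)                  # the wrapper's hardcoded pool
external = Pool(50, inner=internal)  # the HikariCP you actually configured

granted = sum(external.acquire() for _ in range(50))
print(granted)  # prints 30: the inner ceiling, not your 50
```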

&lt;p&gt;From the outside, this looks like a pool-sizing bug. I lost a few hours to it before the pieces clicked. The fix is a driver version bump.&lt;/p&gt;

&lt;h3&gt;
  
  
  v3.x: &lt;code&gt;cp-&lt;/code&gt; properties and &lt;code&gt;connectionPoolType=hikari&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;In v3.1.0 the driver added (&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/pull/1658" rel="noopener noreferrer"&gt;PR #1658&lt;/a&gt;) a new URL parameter (documented under the &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheReadWriteSplittingPlugin.md" rel="noopener noreferrer"&gt;read/write splitting plugin's internal connection pooling section&lt;/a&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;?connectionPoolType=hikari
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When that's set, the internal pool is built via &lt;code&gt;HikariPooledConnectionProvider&lt;/code&gt;'s no-arg constructor, which reads properties prefixed with &lt;code&gt;cp-&lt;/code&gt; and forwards them to the internal Hikari config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;data-source-properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;cp-MaximumPoolSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;50"&lt;/span&gt;
  &lt;span class="na"&gt;cp-MinimumIdle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5"&lt;/span&gt;
  &lt;span class="na"&gt;cp-ConnectionTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;30000"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The catch I hit next:&lt;/strong&gt; &lt;code&gt;cp-&lt;/code&gt; properties are silently ignored when &lt;code&gt;wrapperProfileName=F0&lt;/code&gt; is also active. The F0 preset supplies its own &lt;code&gt;HikariPoolConfigurator&lt;/code&gt; lambda that takes precedence and still hardcodes &lt;code&gt;maxPoolSize=30&lt;/code&gt;. &lt;strong&gt;F0 and &lt;code&gt;cp-MaximumPoolSize&lt;/code&gt; cannot coexist.&lt;/strong&gt; Pick one.&lt;/p&gt;

&lt;p&gt;For Aurora with read/write splitting and proper pool sizing on v3.x, I dropped the profile and assembled the plugin list by hand:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spring&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;datasource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;jdbc:aws-wrapper:postgresql://${database_endpoint}:5432/${db}?connectionPoolType=hikari&amp;amp;readerHostSelectorStrategy=roundRobin&lt;/span&gt;
    &lt;span class="na"&gt;driver-class-name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;software.amazon.jdbc.Driver&lt;/span&gt;
    &lt;span class="na"&gt;hikari&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;connection-timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;60000&lt;/span&gt;
      &lt;span class="na"&gt;maximum-pool-size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;
      &lt;span class="na"&gt;minimum-idle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;data-source-properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;wrapperPlugins&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;readWriteSplitting,auroraConnectionTracker,failover,efm2&lt;/span&gt;
        &lt;span class="na"&gt;cp-MaximumPoolSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;50"&lt;/span&gt;
        &lt;span class="na"&gt;cp-MinimumIdle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5"&lt;/span&gt;
        &lt;span class="na"&gt;cp-ConnectionTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;30000"&lt;/span&gt;
        &lt;span class="na"&gt;connectTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10000"&lt;/span&gt;
        &lt;span class="na"&gt;loginTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10000"&lt;/span&gt;
        &lt;span class="na"&gt;socketTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0"&lt;/span&gt;
        &lt;span class="na"&gt;failureDetectionTime&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;60000"&lt;/span&gt;
        &lt;span class="na"&gt;failureDetectionCount&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5"&lt;/span&gt;
        &lt;span class="na"&gt;failureDetectionInterval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;15000"&lt;/span&gt;
        &lt;span class="na"&gt;monitoring-connectTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10000"&lt;/span&gt;
        &lt;span class="na"&gt;monitoring-socketTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5000"&lt;/span&gt;
        &lt;span class="na"&gt;monitoring-loginTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10000"&lt;/span&gt;
      &lt;span class="na"&gt;exception-override-class-name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;software.amazon.jdbc.util.HikariCPSQLException&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This replaces what F0 was giving me (the plugin set and timeouts) while keeping &lt;code&gt;cp-*&lt;/code&gt; effective.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to use presets vs manual configuration
&lt;/h3&gt;

&lt;p&gt;This is a gap in the official docs — there's no guidance on when presets are the right choice vs when you should go manual. Having dug through the &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/wrapper/src/main/java/software/amazon/jdbc/profile/DriverConfigurationProfiles.java" rel="noopener noreferrer"&gt;source code&lt;/a&gt; and the &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/wrapper/src/main/java/software/amazon/jdbc/profile/ConfigurationProfilePresetCodes.java" rel="noopener noreferrer"&gt;preset codes&lt;/a&gt;, here's how I think about it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The preset families:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Family&lt;/th&gt;
&lt;th&gt;Pool type&lt;/th&gt;
&lt;th&gt;Presets&lt;/th&gt;
&lt;th&gt;What they're for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;A / B / C&lt;/td&gt;
&lt;td&gt;No pool&lt;/td&gt;
&lt;td&gt;A0, A1, A2, B, C0, C1&lt;/td&gt;
&lt;td&gt;Failover + monitoring only. No internal connection pooling. You bring your own (external) pool or don't pool at all.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D / E / F&lt;/td&gt;
&lt;td&gt;Internal pool&lt;/td&gt;
&lt;td&gt;D0, D1, E, F0, F1&lt;/td&gt;
&lt;td&gt;Failover + monitoring + internal HikariCP pool (managed by the wrapper). &lt;code&gt;F0&lt;/code&gt; is the most commonly referenced.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;G / H / I&lt;/td&gt;
&lt;td&gt;External pool&lt;/td&gt;
&lt;td&gt;G0, G1, H, I0, I1&lt;/td&gt;
&lt;td&gt;Designed for apps that manage their own pool externally. The wrapper does not create internal pools.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SF_ prefix&lt;/td&gt;
&lt;td&gt;(matches base)&lt;/td&gt;
&lt;td&gt;SF_D0, SF_D1, SF_E, SF_F0, SF_F1&lt;/td&gt;
&lt;td&gt;Spring Framework variants — same as their base preset but with &lt;code&gt;readWriteSplitting&lt;/code&gt; disabled (Spring handles routing via separate DataSource beans).&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The number suffix indicates failure-detection sensitivity: &lt;code&gt;0&lt;/code&gt; = normal, &lt;code&gt;1&lt;/code&gt; = relaxed (or aggressive, depending on the family), &lt;code&gt;2&lt;/code&gt; = aggressive.&lt;/p&gt;
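&lt;p&gt;For concreteness, activating a preset is just a driver property on the wrapper URL. A minimal sketch (the endpoint, database name, and credentials below are placeholders, not real values):&lt;/p&gt;

```java
import java.util.Properties;

public class PresetExample {
    // Build a wrapper URL that activates a configuration preset by name.
    static String presetUrl(String host, String db, String preset) {
        return "jdbc:aws-wrapper:postgresql://" + host + ":5432/" + db
                + "?wrapperProfileName=" + preset;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("user", "app_user");       // placeholder credentials
        props.setProperty("password", "app_secret");

        String url = presetUrl("my-cluster.cluster-xyz.us-east-1.rds.amazonaws.com",
                "appdb", "F0");
        System.out.println(url);
        // Against a reachable cluster you would then call:
        // Connection conn = DriverManager.getConnection(url, props);
    }
}
```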

&lt;p&gt;&lt;strong&gt;The problem with pool presets (D/E/F families):&lt;/strong&gt; every preset that creates an internal pool hardcodes the same HikariCP values in a lambda with no override mechanism:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Property&lt;/th&gt;
&lt;th&gt;Hardcoded value&lt;/th&gt;
&lt;th&gt;Overridable via &lt;code&gt;cp-*&lt;/code&gt;?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;maxPoolSize&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;30&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No — preset lambda takes precedence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;connectionTimeout&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;10 seconds&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;minimumIdle&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;idleTimeout&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;15 minutes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;keepaliveTime&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;3 minutes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;validationTimeout&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1 second&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;maxLifetime&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1 day&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;initializationFailTimeout&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;-1&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This applies to D0, D1, E, F0, F1 and their SF_ variants — &lt;strong&gt;all&lt;/strong&gt; of them hardcode &lt;code&gt;maxPoolSize=30&lt;/code&gt;. The &lt;code&gt;cp-*&lt;/code&gt; properties (like &lt;code&gt;cp-MaximumPoolSize&lt;/code&gt;) are silently ignored when any of these presets are active, because the preset's &lt;code&gt;HikariPoolConfigurator&lt;/code&gt; lambda overrides the &lt;code&gt;HikariPooledConnectionProvider&lt;/code&gt;'s property-reading path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use a preset:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're prototyping, running a small service, or don't have specific pool-sizing requirements.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;maxPoolSize=30&lt;/code&gt; and &lt;code&gt;connectionTimeout=10s&lt;/code&gt; are acceptable for your workload.&lt;/li&gt;
&lt;li&gt;You want a known-good plugin + timeout combination without thinking about individual settings.&lt;/li&gt;
&lt;li&gt;You're using a no-pool preset (A/B/C family) and bringing your own external pool — these have no hardcoded pool values to collide with.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When to go manual (drop the preset):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need to control &lt;code&gt;maxPoolSize&lt;/code&gt;, &lt;code&gt;connectionTimeout&lt;/code&gt;, or any other pool property — which is most production deployments. This is what I had to do.&lt;/li&gt;
&lt;li&gt;You're running at non-trivial throughput where 30 connections per internal pool is a ceiling (this was my exact situation).&lt;/li&gt;
&lt;li&gt;You want &lt;code&gt;cp-*&lt;/code&gt; properties to actually take effect.&lt;/li&gt;
&lt;li&gt;You're combining &lt;code&gt;readWriteSplitting&lt;/code&gt; with &lt;code&gt;@Transactional(readOnly=true)&lt;/code&gt; in Spring and need internal pools with custom sizing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The manual approach means specifying &lt;code&gt;connectionPoolType=hikari&lt;/code&gt; + &lt;code&gt;wrapperPlugins=...&lt;/code&gt; + &lt;code&gt;cp-*&lt;/code&gt; properties explicitly, instead of &lt;code&gt;wrapperProfileName=F0&lt;/code&gt;. You lose the convenience of a single preset name, but you gain control over every property. For reference, the &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/ConfigurationPresets.md" rel="noopener noreferrer"&gt;Configuration Presets docs&lt;/a&gt; list what each preset bundles, so you can replicate the plugin list and timeouts manually while overriding only the pool properties you need.&lt;/p&gt;
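&lt;p&gt;As a sketch of what the manual equivalent looks like in code (the plugin list mirrors F0's documented bundle; every numeric value is illustrative, not a recommendation):&lt;/p&gt;

```java
import java.util.Properties;

public class ManualF0Config {
    // Replicate F0's plugin chain by hand so that cp-* pool properties
    // actually take effect (no preset lambda overrides them).
    static Properties manualProps(int maxPool, int minIdle, long connTimeoutMs) {
        Properties p = new Properties();
        p.setProperty("wrapperPlugins",
                "auroraInitialConnectionStrategy,auroraConnectionTracker,"
                + "readWriteSplitting,failover,efm2");
        p.setProperty("connectionPoolType", "hikari");
        // Honored here, unlike under wrapperProfileName=F0:
        p.setProperty("cp-MaximumPoolSize", Integer.toString(maxPool));
        p.setProperty("cp-MinimumIdle", Integer.toString(minIdle));
        p.setProperty("cp-ConnectionTimeout", Long.toString(connTimeoutMs));
        return p;
    }

    public static void main(String[] args) {
        Properties p = manualProps(50, 5, 30_000);
        System.out.println(p.getProperty("cp-MaximumPoolSize")); // 50, not F0's hardcoded 30
    }
}
```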




&lt;h2&gt;
  
  
  External pooling vs internal pooling — what each layer is actually doing
&lt;/h2&gt;

&lt;p&gt;This is something most folks will need to pay attention to. These two layers are not redundant. They do different jobs.&lt;/p&gt;

&lt;h3&gt;
  
  
  External pool (my application's HikariCP, managed by Spring Boot)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scope:&lt;/strong&gt; one pool per Spring &lt;code&gt;DataSource&lt;/code&gt; bean, typically one per pod.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Holds:&lt;/strong&gt; &lt;em&gt;logical&lt;/em&gt; connections — the &lt;code&gt;java.sql.Connection&lt;/code&gt; objects my code calls &lt;code&gt;.prepareStatement&lt;/code&gt; on.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gates:&lt;/strong&gt; how many &lt;em&gt;threads&lt;/em&gt; can hold a connection concurrently. If this is 50, request #51 waits or times out.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maps to:&lt;/strong&gt; how many Tomcat threads can simultaneously sit inside a DB-touching request.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Internal pool (managed by the wrapper, one per Aurora instance)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scope:&lt;/strong&gt; with &lt;code&gt;readWriteSplitting&lt;/code&gt; + &lt;code&gt;connectionPoolType=hikari&lt;/code&gt;, &lt;strong&gt;one internal pool per Aurora instance&lt;/strong&gt; — a writer pool, and one pool per reader. The wrapper routes logical connections to the right instance based on read-only hints (&lt;code&gt;setReadOnly(true)&lt;/code&gt; or &lt;code&gt;@Transactional(readOnly=true)&lt;/code&gt; in Spring).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Holds:&lt;/strong&gt; &lt;em&gt;physical&lt;/em&gt; connections — TCP/TLS sessions to a specific Aurora node.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gates:&lt;/strong&gt; how many physical sockets stay open to each instance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maps to:&lt;/strong&gt; Aurora's per-instance &lt;code&gt;max_connections&lt;/code&gt;. The default formula is &lt;code&gt;LEAST({DBInstanceClassMemory/9531392}, 5000)&lt;/code&gt;, so memory-rich instances like &lt;code&gt;db.r7i.4xlarge&lt;/code&gt; (128 GiB) hit the 5,000 hard cap rather than scale further.&lt;/li&gt;
&lt;/ul&gt;
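&lt;p&gt;The &lt;code&gt;max_connections&lt;/code&gt; formula is easy to sanity-check. A sketch using nominal instance memory (the real &lt;code&gt;DBInstanceClassMemory&lt;/code&gt; comes out slightly lower because AWS reserves memory for the OS):&lt;/p&gt;

```java
public class AuroraMaxConnections {
    // Aurora PostgreSQL default: LEAST({DBInstanceClassMemory/9531392}, 5000).
    static long maxConnections(long instanceMemoryBytes) {
        return Math.min(instanceMemoryBytes / 9_531_392L, 5_000L);
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;
        System.out.println(maxConnections(128 * gib)); // db.r7i.4xlarge (128 GiB): hits the 5000 cap
        System.out.println(maxConnections(16 * gib));  // a 16 GiB instance: 1802
    }
}
```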

&lt;h3&gt;
  
  
  Why both are needed — and the official caveat
&lt;/h3&gt;

&lt;p&gt;The external pool's logical connections are cheap — Java objects wrapping references into the internal pool. The internal pool's physical connections are expensive — TLS handshake, auth, wire protocol. The wrapper hands out a single logical connection from the external pool while keeping the physical session pinned to the correct instance (writer for writes, reader-N for reads).&lt;/p&gt;

&lt;p&gt;Without the internal pool layer, every &lt;code&gt;getConnection()&lt;/code&gt; from the external pool would open a fresh physical connection to some instance. That undoes HikariCP's entire point.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important caveat from the AWS docs:&lt;/strong&gt; the &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheReadWriteSplittingPlugin.md" rel="noopener noreferrer"&gt;ReadWriteSplitting plugin documentation&lt;/a&gt; explicitly states:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Using internal and external pools at the same time has not been tested and may result in problematic behaviour."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The docs go further and recommend disabling external connection pools entirely when using internal pooling:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"If you want to use the driver's internal connection pooling, we recommend that you explicitly disable external connection pools (provided by Spring). You need to check the &lt;code&gt;spring.datasource.type&lt;/code&gt; property to ensure that any external connection pooling is disabled."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here's the thing that's easy to miss: &lt;strong&gt;if your Spring Boot app has &lt;code&gt;spring.datasource.hikari.*&lt;/code&gt; properties and &lt;code&gt;connectionPoolType=hikari&lt;/code&gt; in the JDBC URL, you're running double pools whether you intended to or not.&lt;/strong&gt; &lt;code&gt;connectionPoolType=hikari&lt;/code&gt; only controls the wrapper's internal pool — it doesn't replace or disable the external one. Spring Boot independently auto-detects HikariCP on the classpath and creates the external &lt;code&gt;HikariDataSource&lt;/code&gt; bean. Unless you explicitly set &lt;code&gt;spring.datasource.type=org.springframework.jdbc.datasource.SimpleDriverDataSource&lt;/code&gt;, both pools are active. This is almost certainly the configuration most Spring Boot teams end up with.&lt;/p&gt;

&lt;p&gt;In practice, I've run both pools together under sustained load without issues — but that's my workload, not a guarantee. The double-pool architecture works when you treat the external pool as a concurrency gate and the internal pools as physical-session caches, and keep &lt;code&gt;cp-MaximumPoolSize &amp;gt;= maximumPoolSize&lt;/code&gt; so the internal layer never becomes the bottleneck. But if you're hitting edge cases — connections leaking, intermittent stale-connection errors after failover, or pool metrics that don't add up — this officially-untested interaction is the first thing to suspect.&lt;/p&gt;
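&lt;p&gt;That sizing invariant is cheap to encode as a startup sanity check. A sketch, with hypothetical method names:&lt;/p&gt;

```java
public class PoolInvariantCheck {
    // Double-pool sanity check: the internal pool (cp-MaximumPoolSize) must be
    // at least as large as the external pool (maximumPoolSize), or the internal
    // layer gates the external one and threads queue on it.
    static void checkInvariant(int externalMaxPoolSize, int internalCpMaximumPoolSize) {
        if (internalCpMaximumPoolSize < externalMaxPoolSize) {
            throw new IllegalStateException(
                "cp-MaximumPoolSize (" + internalCpMaximumPoolSize + ") < "
                + "maximumPoolSize (" + externalMaxPoolSize
                + "): internal pool would become the bottleneck");
        }
    }

    public static void main(String[] args) {
        checkInvariant(50, 50); // OK: equal sizes satisfy the invariant
        try {
            checkInvariant(50, 30); // the F0-style trap: internal pool too small
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```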

&lt;h3&gt;
  
  
  So how do you actually disable the external pool?
&lt;/h3&gt;

&lt;p&gt;This is the part I want to make crystal clear, because it's easy to think you've solved double-pooling when you haven't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why you're probably running double pools right now:&lt;/strong&gt; Spring Boot auto-detects HikariCP on your classpath (it's pulled in by &lt;code&gt;spring-boot-starter-data-jpa&lt;/code&gt; or &lt;code&gt;spring-boot-starter-jdbc&lt;/code&gt;) and creates a &lt;code&gt;HikariDataSource&lt;/code&gt; bean automatically. Setting &lt;code&gt;connectionPoolType=hikari&lt;/code&gt; in the wrapper URL does &lt;strong&gt;not&lt;/strong&gt; turn this off — that only tells the wrapper to create its own internal pools. These are two independent systems that don't know about each other.&lt;/p&gt;

&lt;p&gt;If your &lt;code&gt;application.yaml&lt;/code&gt; looks like this, you have two pools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# THIS IS DOUBLE-POOLING — both pools are active&lt;/span&gt;
&lt;span class="na"&gt;spring&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;datasource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;jdbc:aws-wrapper:postgresql://...?connectionPoolType=hikari&amp;amp;readerHostSelectorStrategy=roundRobin&lt;/span&gt;
    &lt;span class="na"&gt;driver-class-name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;software.amazon.jdbc.Driver&lt;/span&gt;
    &lt;span class="na"&gt;hikari&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;                          &lt;span class="c1"&gt;# ← Spring Boot sees this and creates external HikariCP&lt;/span&gt;
      &lt;span class="na"&gt;maximum-pool-size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;
      &lt;span class="na"&gt;minimum-idle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;data-source-properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;cp-MaximumPoolSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;50"&lt;/span&gt;     &lt;span class="c1"&gt;# ← wrapper sees this and creates internal HikariCP&lt;/span&gt;
        &lt;span class="na"&gt;cp-MinimumIdle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;To run single-pool (internal only),&lt;/strong&gt; set &lt;code&gt;spring.datasource.type&lt;/code&gt; to a non-pooling DataSource implementation. This tells Spring Boot to skip HikariCP auto-detection. The catch: without the &lt;code&gt;hikari:&lt;/code&gt; section, there's no &lt;code&gt;data-source-properties:&lt;/code&gt; block to put your &lt;code&gt;cp-*&lt;/code&gt; and wrapper properties in. You have two options.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option A — pass everything as URL parameters.&lt;/strong&gt; Reliable but the URL gets long:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# SINGLE-POOL (internal only) — cp-* and plugin config in the URL&lt;/span&gt;
&lt;span class="na"&gt;spring&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;datasource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;org.springframework.jdbc.datasource.SimpleDriverDataSource&lt;/span&gt;   &lt;span class="c1"&gt;# ← disables external HikariCP&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;-&lt;/span&gt;
      &lt;span class="s"&gt;jdbc:aws-wrapper:postgresql://${database_endpoint}:5432/${database_name}&lt;/span&gt;
      &lt;span class="s"&gt;?connectionPoolType=hikari&lt;/span&gt;
      &lt;span class="s"&gt;&amp;amp;readerHostSelectorStrategy=roundRobin&lt;/span&gt;
      &lt;span class="s"&gt;&amp;amp;wrapperPlugins=readWriteSplitting,auroraConnectionTracker,failover,efm2&lt;/span&gt;
      &lt;span class="s"&gt;&amp;amp;cp-MaximumPoolSize=50&lt;/span&gt;
      &lt;span class="s"&gt;&amp;amp;cp-MinimumIdle=5&lt;/span&gt;
      &lt;span class="s"&gt;&amp;amp;cp-ConnectionTimeout=30000&lt;/span&gt;
      &lt;span class="s"&gt;&amp;amp;connectTimeout=10000&lt;/span&gt;
      &lt;span class="s"&gt;&amp;amp;loginTimeout=10000&lt;/span&gt;
      &lt;span class="s"&gt;&amp;amp;socketTimeout=0&lt;/span&gt;
      &lt;span class="s"&gt;&amp;amp;failureDetectionTime=60000&lt;/span&gt;
      &lt;span class="s"&gt;&amp;amp;failureDetectionCount=5&lt;/span&gt;
      &lt;span class="s"&gt;&amp;amp;failureDetectionInterval=15000&lt;/span&gt;
      &lt;span class="s"&gt;&amp;amp;monitoring-connectTimeout=10000&lt;/span&gt;
      &lt;span class="s"&gt;&amp;amp;monitoring-socketTimeout=5000&lt;/span&gt;
      &lt;span class="s"&gt;&amp;amp;monitoring-loginTimeout=10000&lt;/span&gt;
    &lt;span class="na"&gt;driver-class-name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;software.amazon.jdbc.Driver&lt;/span&gt;
    &lt;span class="c1"&gt;# No hikari: section — Spring won't create an external pool&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Option B — use the wrapper's own DataSource class.&lt;/strong&gt; The wrapper provides &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/DataSource.md" rel="noopener noreferrer"&gt;&lt;code&gt;AwsWrapperDataSource&lt;/code&gt;&lt;/a&gt; which accepts properties directly, keeping the YAML clean:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# SINGLE-POOL (internal only) — using AwsWrapperDataSource&lt;/span&gt;
&lt;span class="na"&gt;spring&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;datasource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;software.amazon.jdbc.ds.AwsWrapperDataSource&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;jdbc:postgresql://${database_endpoint}:5432/${database_name}&lt;/span&gt;   &lt;span class="c1"&gt;# ← note: no aws-wrapper: prefix&lt;/span&gt;
    &lt;span class="na"&gt;driver-class-name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;org.postgresql.Driver&lt;/span&gt;                            &lt;span class="c1"&gt;# ← the underlying driver, not the wrapper&lt;/span&gt;
    &lt;span class="na"&gt;connection-properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;wrapperPlugins&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;readWriteSplitting,auroraConnectionTracker,failover,efm2&lt;/span&gt;
      &lt;span class="na"&gt;connectionPoolType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hikari&lt;/span&gt;
      &lt;span class="na"&gt;readerHostSelectorStrategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;roundRobin&lt;/span&gt;
      &lt;span class="na"&gt;cp-MaximumPoolSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;50"&lt;/span&gt;
      &lt;span class="na"&gt;cp-MinimumIdle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5"&lt;/span&gt;
      &lt;span class="na"&gt;cp-ConnectionTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;30000"&lt;/span&gt;
      &lt;span class="na"&gt;connectTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10000"&lt;/span&gt;
      &lt;span class="na"&gt;loginTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10000"&lt;/span&gt;
      &lt;span class="na"&gt;socketTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0"&lt;/span&gt;
      &lt;span class="na"&gt;failureDetectionTime&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;60000"&lt;/span&gt;
      &lt;span class="na"&gt;failureDetectionCount&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5"&lt;/span&gt;
      &lt;span class="na"&gt;failureDetectionInterval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;15000"&lt;/span&gt;
      &lt;span class="na"&gt;monitoring-connectTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10000"&lt;/span&gt;
      &lt;span class="na"&gt;monitoring-socketTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5000"&lt;/span&gt;
      &lt;span class="na"&gt;monitoring-loginTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10000"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note the differences with &lt;code&gt;AwsWrapperDataSource&lt;/code&gt;: the URL drops the &lt;code&gt;jdbc:aws-wrapper:&lt;/code&gt; prefix (it's a plain &lt;code&gt;jdbc:postgresql:&lt;/code&gt; URL since the wrapper IS the DataSource), and &lt;code&gt;driver-class-name&lt;/code&gt; points to the underlying driver, not the wrapper. See the &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/DataSource.md" rel="noopener noreferrer"&gt;DataSource configuration docs&lt;/a&gt; for details.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To run single-pool (external only),&lt;/strong&gt; remove &lt;code&gt;connectionPoolType=hikari&lt;/code&gt; from the URL. The wrapper won't create internal pools, and every &lt;code&gt;getConnection()&lt;/code&gt; from the external HikariCP opens a physical connection through the wrapper on-demand:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# SINGLE-POOL — only the external HikariCP is active&lt;/span&gt;
&lt;span class="na"&gt;spring&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;datasource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;jdbc:aws-wrapper:postgresql://...?readerHostSelectorStrategy=roundRobin&lt;/span&gt;
    &lt;span class="na"&gt;driver-class-name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;software.amazon.jdbc.Driver&lt;/span&gt;
    &lt;span class="na"&gt;hikari&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;maximum-pool-size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;
      &lt;span class="na"&gt;minimum-idle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="c1"&gt;# No cp-* properties needed — no internal pool exists&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Trade-offs at a glance
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Configuration&lt;/th&gt;
&lt;th&gt;External pool&lt;/th&gt;
&lt;th&gt;Internal pool&lt;/th&gt;
&lt;th&gt;What you get&lt;/th&gt;
&lt;th&gt;What you lose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Double pool&lt;/strong&gt; (most Spring Boot apps)&lt;/td&gt;
&lt;td&gt;Spring HikariCP (&lt;code&gt;hikari:&lt;/code&gt; section)&lt;/td&gt;
&lt;td&gt;Wrapper HikariCP (&lt;code&gt;connectionPoolType=hikari&lt;/code&gt; + &lt;code&gt;cp-*&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Full Spring metrics, health checks, familiar config surface. Physical connections cached per Aurora instance.&lt;/td&gt;
&lt;td&gt;Running an officially-untested combination. Two pools to reason about. Higher DB connection count than expected.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Internal only via &lt;code&gt;SimpleDriverDataSource&lt;/code&gt;&lt;/strong&gt; (&lt;code&gt;spring.datasource.type=SimpleDriverDataSource&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Disabled&lt;/td&gt;
&lt;td&gt;Wrapper HikariCP&lt;/td&gt;
&lt;td&gt;The configuration AWS actually tests against. Clean single-pool model.&lt;/td&gt;
&lt;td&gt;No &lt;code&gt;hikaricp.*&lt;/code&gt; Micrometer metrics from Spring. No HikariCP health indicator in &lt;code&gt;/actuator/health&lt;/code&gt;. &lt;code&gt;cp-*&lt;/code&gt; properties must go in the URL — gets unwieldy with many parameters.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Internal only via &lt;code&gt;AwsWrapperDataSource&lt;/code&gt;&lt;/strong&gt; (&lt;code&gt;spring.datasource.type=software.amazon.jdbc.ds.AwsWrapperDataSource&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Disabled&lt;/td&gt;
&lt;td&gt;Wrapper HikariCP&lt;/td&gt;
&lt;td&gt;AWS-tested single-pool model. Clean YAML via &lt;code&gt;connection-properties&lt;/code&gt; block — no URL stuffing.&lt;/td&gt;
&lt;td&gt;Same observability trade-offs as &lt;code&gt;SimpleDriverDataSource&lt;/code&gt; (no Spring Hikari metrics/health). Different URL format (&lt;code&gt;jdbc:postgresql:&lt;/code&gt; not &lt;code&gt;jdbc:aws-wrapper:postgresql:&lt;/code&gt;) and &lt;code&gt;driver-class-name&lt;/code&gt; points to the underlying driver. See &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/DataSource.md" rel="noopener noreferrer"&gt;DataSource docs&lt;/a&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;External only&lt;/strong&gt; (no &lt;code&gt;connectionPoolType&lt;/code&gt; in URL)&lt;/td&gt;
&lt;td&gt;Spring HikariCP&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Familiar Spring config. Full metrics.&lt;/td&gt;
&lt;td&gt;No per-instance physical connection caching. &lt;code&gt;@Transactional(readOnly=true)&lt;/code&gt; with &lt;code&gt;readWriteSplitting&lt;/code&gt; triggers a full connection switch per call (see Spring Boot limitation below).&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Where I am with this
&lt;/h3&gt;

&lt;p&gt;I've been experimenting with the double-pool setup and so far it's been working without problems under sustained load across multiple pods. The external pool gives you the Micrometer metrics that make diagnosing issues possible — the &lt;code&gt;hikaricp.connections.pending&lt;/code&gt; signal is how I caught the F0 ceiling issue — and the internal pool gives you efficient physical-connection reuse across reader/writer instances. The key invariant is &lt;code&gt;cp-MaximumPoolSize &amp;gt;= maximumPoolSize&lt;/code&gt; so the internal layer never becomes the bottleneck.&lt;/p&gt;

&lt;p&gt;The one tangible downside I've observed: &lt;strong&gt;you use more database connections than you'd expect.&lt;/strong&gt; The external pool holds logical connections while the internal pools independently hold physical connections per Aurora instance. In practice the connection count on Aurora ends up higher than the external pool size alone would suggest, because each internal pool maintains its own minimum-idle and maximum-size. For a fleet of pods this adds up: each instance needs &lt;code&gt;max_connections&lt;/code&gt; headroom for &lt;code&gt;pods × cp-MaximumPoolSize&lt;/code&gt; (every pod can fill its pool to that instance), and the cluster-wide total can reach &lt;code&gt;pods × cp-MaximumPoolSize × (1 + number_of_readers)&lt;/code&gt;, not just &lt;code&gt;pods × maximumPoolSize&lt;/code&gt;.&lt;/p&gt;
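&lt;p&gt;The headroom arithmetic can be sketched directly (illustrative numbers, not recommendations):&lt;/p&gt;

```java
public class ConnectionHeadroom {
    // Per-instance worst case: every pod can open up to cp-MaximumPoolSize
    // physical connections to any single Aurora instance.
    static long perInstance(int pods, int cpMaximumPoolSize) {
        return (long) pods * cpMaximumPoolSize;
    }

    // Cluster-wide worst case: one internal pool per instance (writer + readers).
    static long clusterWide(int pods, int cpMaximumPoolSize, int readers) {
        return perInstance(pods, cpMaximumPoolSize) * (1 + readers);
    }

    public static void main(String[] args) {
        // 10 pods, cp-MaximumPoolSize=50, 2 readers:
        System.out.println(perInstance(10, 50));    // 500 per instance — check against max_connections
        System.out.println(clusterWide(10, 50, 2)); // 1500 across the cluster
    }
}
```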

&lt;p&gt;If you do hit those edge cases (connection leaks, intermittent stale-connection errors after failover, pool metrics that don't add up), switching to internal-only pooling (&lt;code&gt;spring.datasource.type=SimpleDriverDataSource&lt;/code&gt; or &lt;code&gt;AwsWrapperDataSource&lt;/code&gt;) is the cleanest way to take the officially-untested double-pool interaction out of the picture.&lt;/p&gt;

&lt;p&gt;It's also worth noting that &lt;strong&gt;you don't need an application-managed HikariCP at all&lt;/strong&gt;: the internal pool created by &lt;code&gt;connectionPoolType=hikari&lt;/code&gt; is a self-contained HikariCP instance managed by the wrapper. If you're building a non-Spring app or a lightweight service, running only the internal pool is the cleaner architecture and avoids the double-pool question altogether.&lt;/p&gt;

&lt;h3&gt;
  
  
  F0 vs SF_F0: should Spring Boot apps use &lt;code&gt;readWriteSplitting&lt;/code&gt;?
&lt;/h3&gt;

&lt;p&gt;This is one of the more confusing areas in the docs, and it matters because it determines your entire read/write routing architecture.&lt;/p&gt;

&lt;p&gt;From the &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/wrapper/src/main/java/software/amazon/jdbc/profile/DriverConfigurationProfiles.java" rel="noopener noreferrer"&gt;source code&lt;/a&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Preset&lt;/th&gt;
&lt;th&gt;Plugins&lt;/th&gt;
&lt;th&gt;Internal pool&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;F0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;auroraInitialConnectionStrategy&lt;/code&gt;, &lt;code&gt;auroraConnectionTracker&lt;/code&gt;, &lt;strong&gt;&lt;code&gt;readWriteSplitting&lt;/code&gt;&lt;/strong&gt;, &lt;code&gt;failover&lt;/code&gt;, &lt;code&gt;efm2&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Yes (maxPoolSize=30)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SF_F0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;auroraInitialConnectionStrategy&lt;/code&gt;, &lt;code&gt;auroraConnectionTracker&lt;/code&gt;, &lt;code&gt;failover&lt;/code&gt;, &lt;code&gt;efm2&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Yes (maxPoolSize=30)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The only difference: SF_F0 drops &lt;code&gt;readWriteSplitting&lt;/code&gt;. Both have the same internal pool. The &lt;code&gt;SF_&lt;/code&gt; prefix stands for "Spring Framework" — these variants are meant for Spring apps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why does the Spring variant disable read/write splitting?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheReadWriteSplittingPlugin.md#limitations-when-using-spring-bootframework" rel="noopener noreferrer"&gt;Spring Boot limitations section&lt;/a&gt; of the ReadWriteSplitting plugin docs explains:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"The use of read/write splitting with the annotation &lt;code&gt;@Transactional(readOnly=true)&lt;/code&gt; is &lt;strong&gt;only&lt;/strong&gt; recommended for configurations using an internal connection pool."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When Spring encounters &lt;code&gt;@Transactional(readOnly=true)&lt;/code&gt;, it calls &lt;code&gt;conn.setReadOnly(true)&lt;/code&gt; before the method and &lt;code&gt;conn.setReadOnly(false)&lt;/code&gt; after. The &lt;code&gt;readWriteSplitting&lt;/code&gt; plugin responds by switching from writer→reader→writer on every annotated method call. Without an internal pool, each switch is a full TCP/TLS reconnect — the docs call this "substantial performance degradation." The SF_ presets sidestep this by disabling the plugin entirely and recommending &lt;strong&gt;two separate Spring DataSource beans&lt;/strong&gt; instead (one for the writer cluster endpoint, one for the reader endpoint), letting Spring handle routing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The contradiction:&lt;/strong&gt; SF_F0 &lt;em&gt;has&lt;/em&gt; internal pools — exactly the prerequisite the docs say makes &lt;code&gt;readWriteSplitting&lt;/code&gt; safe. With internal pools, the &lt;code&gt;setReadOnly&lt;/code&gt; toggle reuses cached physical connections from the per-instance pools (writer pool, reader pool), making the switch a cheap object swap rather than a TCP reconnect. So SF_F0 disables a plugin that should work fine with the internal pools it already provides.&lt;/p&gt;

&lt;p&gt;My read: the SF_ presets were likely created before &lt;code&gt;connectionPoolType=hikari&lt;/code&gt; made the internal-pool + readWriteSplitting combination clean and testable. The docs haven't fully reconciled this — they warn about the overhead, correctly note that internal pools mitigate it, but then the SF_ presets still disable it out of caution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Three paths for Spring Boot read/write splitting:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;readWriteSplitting plugin&lt;/th&gt;
&lt;th&gt;How reads route to readers&lt;/th&gt;
&lt;th&gt;Trade-off&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Plugin with internal pools&lt;/strong&gt; (what we use)&lt;/td&gt;
&lt;td&gt;Enabled&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;@Transactional(readOnly=true)&lt;/code&gt; triggers &lt;code&gt;setReadOnly(true)&lt;/code&gt; → plugin routes to reader via cached internal pool&lt;/td&gt;
&lt;td&gt;Single DataSource bean. Clean. Requires internal pools for acceptable switching overhead.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Two DataSource beans&lt;/strong&gt; (what SF_ presets assume)&lt;/td&gt;
&lt;td&gt;Disabled&lt;/td&gt;
&lt;td&gt;Spring's &lt;code&gt;AbstractRoutingDataSource&lt;/code&gt; or &lt;code&gt;@Qualifier&lt;/code&gt; annotations route to a writer or reader DataSource at the service layer&lt;/td&gt;
&lt;td&gt;No plugin overhead. More application-level wiring. Each DataSource can independently use the wrapper for failover/monitoring.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Plugin without internal pools&lt;/strong&gt; (don't do this)&lt;/td&gt;
&lt;td&gt;Enabled&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;setReadOnly&lt;/code&gt; triggers a full physical connection switch per call&lt;/td&gt;
&lt;td&gt;Substantial overhead. The docs explicitly warn against this.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you're already on manual config with &lt;code&gt;connectionPoolType=hikari&lt;/code&gt; and &lt;code&gt;cp-*&lt;/code&gt; properties (which you need anyway for pool sizing), enabling &lt;code&gt;readWriteSplitting&lt;/code&gt; works — the internal pools handle the switching cost. If you prefer the two-DataSource approach, use a no-readWriteSplitting configuration (like SF_F0's plugin list, but with manual pool sizing since the preset hardcodes maxPoolSize=30).&lt;/p&gt;

&lt;p&gt;Either way, don't mix the two: having &lt;code&gt;readWriteSplitting&lt;/code&gt; enabled while &lt;em&gt;also&lt;/em&gt; routing via separate DataSources would result in double routing logic that's hard to reason about.&lt;/p&gt;
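&lt;p&gt;For the two-DataSource route, the wiring might look like the sketch below. The &lt;code&gt;app.datasource.*&lt;/code&gt; namespace is hypothetical (you would bind it yourself with &lt;code&gt;@ConfigurationProperties&lt;/code&gt;); the JDBC URL shape and the &lt;code&gt;wrapperPlugins&lt;/code&gt; key are the real wrapper surface, and the cluster hostnames are placeholders:&lt;/p&gt;

```yaml
# Sketch only: two independent pools, readWriteSplitting disabled.
# "app.datasource.*" is a made-up namespace for your own binding code.
app:
  datasource:
    writer:
      jdbc-url: jdbc:aws-wrapper:postgresql://my-cluster.cluster-abc.eu-west-1.rds.amazonaws.com:5432/mydb
      data-source-properties:
        wrapperPlugins: auroraConnectionTracker,failover,efm2
    reader:
      jdbc-url: jdbc:aws-wrapper:postgresql://my-cluster.cluster-ro-abc.eu-west-1.rds.amazonaws.com:5432/mydb
      data-source-properties:
        wrapperPlugins: auroraConnectionTracker,failover,efm2
```

&lt;p&gt;Read-path code then targets the reader bean via &lt;code&gt;@Qualifier&lt;/code&gt; or an &lt;code&gt;AbstractRoutingDataSource&lt;/code&gt;, and everything else defaults to the writer bean.&lt;/p&gt;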

&lt;h3&gt;
  
  
  HikariCP and virtual threads: a known compatibility issue
&lt;/h3&gt;

&lt;p&gt;If you're running on JDK 21+ and considering Spring Boot's &lt;code&gt;spring.threads.virtual.enabled=true&lt;/code&gt;, there is an &lt;a href="https://github.com/brettwooldridge/HikariCP/issues/2398" rel="noopener noreferrer"&gt;open HikariCP bug (#2398)&lt;/a&gt; to be aware of. The issue is filed against HikariCP 7.0.2: the &lt;code&gt;ConcurrentBag.requite()&lt;/code&gt; method uses a yield-spin loop (&lt;code&gt;Thread.yield()&lt;/code&gt; 255 times for every &lt;code&gt;parkNanos&lt;/code&gt;) that saturates all carrier threads under virtual-thread load. The result is CPU throttling at the pod level and potential liveness-probe failures — the exact kind of silent performance regression that's hard to diagnose without knowing about this issue.&lt;/p&gt;

&lt;p&gt;As of this writing, the &lt;a href="https://github.com/brettwooldridge/HikariCP/pull/2399" rel="noopener noreferrer"&gt;proposed fix in PR #2399&lt;/a&gt; has not been merged. Spring Boot 3.5.7's BOM pins HikariCP 6.3.3 by default rather than 7.x, and the bug report doesn't reproduce against the 6.x line — so check your effective HikariCP version before assuming you're affected. The workaround if you are is to disable virtual threads (&lt;code&gt;-Dspring.threads.virtual.enabled=false&lt;/code&gt;). If you're running the AWS JDBC wrapper with HikariCP as your external pool &lt;em&gt;and&lt;/em&gt; enabling virtual threads on a 7.x version, this is the interaction to watch — it's not a wrapper bug, but it surfaces at the same layer (connection pool) and looks similar in dashboards to the internal-pool ceiling problem I described earlier.&lt;/p&gt;
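&lt;p&gt;If you do land on an affected 7.x version with virtual threads enabled, the workaround is a single property (the JVM-flag form above works too):&lt;/p&gt;

```properties
# Fall back to platform threads until the HikariCP fix is merged and released.
spring.threads.virtual.enabled=false
```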

&lt;h3&gt;
  
  
  Sizing rule
&lt;/h3&gt;

&lt;p&gt;For &lt;code&gt;P&lt;/code&gt; pods, external pool size &lt;code&gt;E&lt;/code&gt;, and &lt;code&gt;R&lt;/code&gt; readers in the Aurora cluster, the physical connection footprint is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Writer instance:   up to P * cp-MaximumPoolSize physical connections
Per reader:        up to P * cp-MaximumPoolSize physical connections
Total:             P * cp-MaximumPoolSize * (1 + R)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;cp-MaximumPoolSize&lt;/code&gt; is the bottleneck, logical &lt;code&gt;getConnection()&lt;/code&gt; calls sit in the internal pool's wait queue — which is exactly the v2.6.8 failure mode, just on a newer version where you technically &lt;em&gt;can&lt;/em&gt; fix it. The invariant to hold: &lt;strong&gt;&lt;code&gt;cp-MaximumPoolSize &amp;gt;= external pool size&lt;/code&gt;&lt;/strong&gt; so the internal layer never becomes the bottleneck. Going higher is fine as long as the total stays under Aurora's &lt;code&gt;max_connections&lt;/code&gt; per instance with ~20% headroom.&lt;/p&gt;

&lt;h3&gt;
  
  
  Life of a single SELECT
&lt;/h3&gt;

&lt;p&gt;When I was first onboarding someone to this, the thing that actually landed was walking through one request end-to-end:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Tomcat thread calls &lt;code&gt;userRepository.findById(42)&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Spring Data borrows a logical connection from external HikariCP (external pool count goes up by 1).&lt;/li&gt;
&lt;li&gt;Transaction manager begins a tx. Say it's &lt;code&gt;@Transactional(readOnly=true)&lt;/code&gt; — the read-only hint is set on the logical connection.&lt;/li&gt;
&lt;li&gt;First real statement flows through the plugin chain. &lt;code&gt;readWriteSplitting&lt;/code&gt; sees the read-only flag, picks reader-1 (round-robin), and routes to reader-1's internal pool.&lt;/li&gt;
&lt;li&gt;Reader-1's internal pool hands over a physical session; the wrapper binds it to the logical connection for the rest of the tx.&lt;/li&gt;
&lt;li&gt;Query executes on reader-1.&lt;/li&gt;
&lt;li&gt;Tx commits. Physical session returns to reader-1's internal pool; logical connection returns to external Hikari.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftsmg0lhfv0ziyk1gxx35.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftsmg0lhfv0ziyk1gxx35.png" alt="Life of a SELECT - sequence diagram showing request flow through external pool, plugin chain, and internal pools" width="800" height="597"&gt;&lt;/a&gt;&lt;/p&gt;
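&lt;p&gt;The pool accounting in that walkthrough can be sketched as a toy model. This is illustrative bookkeeping, not wrapper API: it only shows that the logical borrow (external pool) and the physical bind (reader's internal pool) are independent, and that both return at commit:&lt;/p&gt;

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of steps 2-7 above. All names are made up for illustration.
public class SelectLifecycle {
    static final Deque<String> externalPool = new ArrayDeque<>(); // logical connections
    static final Deque<String> reader1Pool = new ArrayDeque<>();  // physical sessions

    static String runReadOnlyTx() {
        String logical = externalPool.pop();   // step 2: borrow from external HikariCP
        String physical = reader1Pool.pop();   // step 5: bind reader-1 physical session
        String result = "ran on " + physical;  // step 6: query executes on reader-1
        reader1Pool.push(physical);            // step 7: physical session returned
        externalPool.push(logical);            // step 7: logical connection returned
        return result;
    }

    public static void main(String[] args) {
        externalPool.push("logical-1");
        reader1Pool.push("physical-r1-1");
        System.out.println(runReadOnlyTx());
        System.out.println(externalPool.size() + " " + reader1Pool.size()); // both back to 1
    }
}
```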




&lt;h2&gt;
  
  
  The plugin catalog, and when I use which
&lt;/h2&gt;

&lt;p&gt;Plugins are a comma-separated list on &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/UsingTheJdbcDriver.md" rel="noopener noreferrer"&gt;&lt;code&gt;wrapperPlugins&lt;/code&gt;&lt;/a&gt;. &lt;strong&gt;Order matters.&lt;/strong&gt; The driver applies them outside-in.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I always run for Aurora
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheFailoverPlugin.md" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;code&gt;failover&lt;/code&gt;&lt;/strong&gt;&lt;/a&gt; — detects Aurora writer/reader failover events via topology awareness, invalidates broken connections, reroutes to the current writer. Without this, a writer failover leaves the driver holding a dead TCP session until OS-level timeouts fire (minutes). (There's also a newer &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheFailover2Plugin.md" rel="noopener noreferrer"&gt;&lt;code&gt;failover2&lt;/code&gt;&lt;/a&gt; plugin worth evaluating for new deployments.)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheAuroraConnectionTrackerPlugin.md" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;code&gt;auroraConnectionTracker&lt;/code&gt;&lt;/strong&gt;&lt;/a&gt; — maintains the map of live connections per instance. &lt;code&gt;failover&lt;/code&gt; needs it to know which connections to invalidate.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheHostMonitoringPlugin.md" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;code&gt;efm2&lt;/code&gt;&lt;/strong&gt;&lt;/a&gt; — Enhanced Failure Monitor v2. A background thread per connection probes the socket at &lt;code&gt;failureDetectionInterval&lt;/code&gt;; if &lt;code&gt;failureDetectionCount&lt;/code&gt; consecutive probes fail within &lt;code&gt;failureDetectionTime&lt;/code&gt;, the connection is marked bad and &lt;code&gt;failover&lt;/code&gt; kicks in. v2 is current; v1 / &lt;code&gt;efm&lt;/code&gt; is deprecated and should not be used in new configs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What I enable conditionally
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheReadWriteSplittingPlugin.md" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;code&gt;readWriteSplitting&lt;/code&gt;&lt;/strong&gt;&lt;/a&gt; — routes read-only transactions to readers, writes to the writer. Enable when you have one or more readers &lt;em&gt;and&lt;/em&gt; your code marks read transactions properly (&lt;code&gt;@Transactional(readOnly=true)&lt;/code&gt;). Without the hint, the plugin sends everything to the writer and you get no benefit. I've seen more than one team enable it and then wonder why their readers sit idle.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;iamAuth&lt;/code&gt;&lt;/strong&gt; — IAM-based auth instead of password. Enable if you're doing IAM to Aurora; otherwise skip.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;awsSecretsManager&lt;/code&gt;&lt;/strong&gt; — pulls creds from Secrets Manager at connection time. Overlaps with external secret-rotation workflows; I enable only if I'm not rotating through Kubernetes secrets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;federatedAuth&lt;/code&gt;&lt;/strong&gt; / &lt;strong&gt;&lt;code&gt;okta&lt;/code&gt;&lt;/strong&gt; — SSO-style auth; niche in my experience.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;dev&lt;/code&gt;&lt;/strong&gt; / &lt;strong&gt;&lt;code&gt;logQueryPlansWhenNeeded&lt;/code&gt;&lt;/strong&gt; — debugging only, never prod.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  My default stack for Aurora PG + HikariCP
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;wrapperPlugins&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;readWriteSplitting,auroraConnectionTracker,failover,efm2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I put &lt;code&gt;readWriteSplitting&lt;/code&gt; first so routing happens before failover/topology logic — that way failover can reroute a connection to the "current" writer regardless of who it was bound to. &lt;code&gt;efm2&lt;/code&gt; is last because it's terminal: it wraps the underlying connection with monitoring.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Felweenwws9jnpz1q4gvp.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Felweenwws9jnpz1q4gvp.jpg" alt="Plugin chain pipeline showing getConnection() flowing through readWriteSplitting → auroraConnectionTracker → failover → efm2 → PostgreSQL driver, with each plugin's responsibility and failover event handling" width="800" height="1986"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Aurora with multiple readers: the configuration I'm shipping
&lt;/h2&gt;

&lt;p&gt;This is what I'm running now against a 1 writer + 2 reader Aurora cluster. It's not the only sensible config, but I've run it in anger through a few load tests and it's the one I trust.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;jdbc:aws-wrapper:postgresql://${endpoint}:5432/${db}?connectionPoolType=hikari&amp;amp;readerHostSelectorStrategy=roundRobin&lt;/span&gt;
&lt;span class="na"&gt;hikari&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;connection-timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;60000&lt;/span&gt;
  &lt;span class="na"&gt;maximum-pool-size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;
  &lt;span class="na"&gt;minimum-idle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
  &lt;span class="na"&gt;data-source-properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;wrapperPlugins&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;readWriteSplitting,auroraConnectionTracker,failover,efm2&lt;/span&gt;
    &lt;span class="na"&gt;cp-MaximumPoolSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;50"&lt;/span&gt;
    &lt;span class="na"&gt;cp-MinimumIdle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5"&lt;/span&gt;
    &lt;span class="na"&gt;cp-ConnectionTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;30000"&lt;/span&gt;
    &lt;span class="c1"&gt;# I let efm2 handle liveness. TCP timeout is intentionally 0.&lt;/span&gt;
    &lt;span class="na"&gt;connectTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10000"&lt;/span&gt;
    &lt;span class="na"&gt;loginTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10000"&lt;/span&gt;
    &lt;span class="na"&gt;socketTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0"&lt;/span&gt;
    &lt;span class="c1"&gt;# efm2 tuning — see "failover budget" below&lt;/span&gt;
    &lt;span class="na"&gt;failureDetectionTime&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;60000"&lt;/span&gt;        &lt;span class="c1"&gt;# grace period before monitoring starts&lt;/span&gt;
    &lt;span class="na"&gt;failureDetectionInterval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;15000"&lt;/span&gt;    &lt;span class="c1"&gt;# 15s between probes&lt;/span&gt;
    &lt;span class="na"&gt;failureDetectionCount&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5"&lt;/span&gt;           &lt;span class="c1"&gt;# 5 failed probes = dead&lt;/span&gt;
    &lt;span class="na"&gt;monitoring-connectTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10000"&lt;/span&gt;
    &lt;span class="na"&gt;monitoring-socketTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5000"&lt;/span&gt;
    &lt;span class="na"&gt;monitoring-loginTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10000"&lt;/span&gt;
  &lt;span class="na"&gt;exception-override-class-name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;software.amazon.jdbc.util.HikariCPSQLException&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Reader host selection
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/HostSelectionStrategies.md" rel="noopener noreferrer"&gt;&lt;code&gt;readerHostSelectorStrategy&lt;/code&gt;&lt;/a&gt; controls how &lt;code&gt;readWriteSplitting&lt;/code&gt; picks a reader:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;roundRobin&lt;/code&gt;&lt;/strong&gt; — distributes reads evenly. My default.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;random&lt;/code&gt;&lt;/strong&gt; — statistically even but variable in any given second.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;leastConnections&lt;/code&gt;&lt;/strong&gt; — picks the reader with the fewest active physical connections. Worth it when readers have meaningfully different workloads, but adds a small lookup cost per acquisition.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;fastestResponse&lt;/code&gt;&lt;/strong&gt; — picks the reader with the lowest observed response latency. Useful when readers have asymmetric hardware or load.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a homogeneous reader fleet, &lt;code&gt;roundRobin&lt;/code&gt; is the cleanest and cheapest. I've only ever needed &lt;code&gt;leastConnections&lt;/code&gt; once, for an asymmetric deployment.&lt;/p&gt;
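&lt;p&gt;Switching strategies is a connection-parameter change, nothing structural. For example, to try &lt;code&gt;leastConnections&lt;/code&gt; with the YAML from earlier otherwise unchanged (endpoint and db are placeholders):&lt;/p&gt;

```properties
# readerHostSelectorStrategy is a wrapper connection parameter; it can go on
# the JDBC URL or in data-source-properties.
url=jdbc:aws-wrapper:postgresql://${endpoint}:5432/${db}?connectionPoolType=hikari&readerHostSelectorStrategy=leastConnections
```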

&lt;h3&gt;
  
  
  The exception-translation line I almost missed
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;exception-override-class-name: software.amazon.jdbc.util.HikariCPSQLException&lt;/code&gt; is easy to skip over (see the &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/examples/SpringBootHikariExample/README.md" rel="noopener noreferrer"&gt;Spring Boot + HikariCP example&lt;/a&gt; where it's buried at the bottom of the YAML). Without it, HikariCP sees failover-triggered &lt;code&gt;SQLException&lt;/code&gt;s as "normal" and tries to hand out connections the wrapper has already invalidated. Pool stays confused, latency stays bad, and the ordinary failover recovery path never fully completes. &lt;strong&gt;Not optional&lt;/strong&gt; if you're on HikariCP + &lt;code&gt;failover&lt;/code&gt;. Set it once and never think about it again.&lt;/p&gt;




&lt;h2&gt;
  
  
  Performance aspects
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Where time actually goes
&lt;/h3&gt;

&lt;p&gt;Under steady load, the wrapper's overhead breaks down into three categories:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Plugin chain traversal&lt;/strong&gt; — every JDBC call walks through the chain. For N plugins and M statements per transaction, you pay N×M method-dispatch overhead. On v3.x it's low single-digit microseconds — not zero, but invisible unless you're chasing the last 1% of p99. The rule I follow: don't enable plugins you aren't using.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Physical connection creation&lt;/strong&gt; — TLS handshake + auth + wire setup. One-time per internal pool slot; amortized, it's invisible &lt;em&gt;unless&lt;/em&gt; the pool is cold or under-sized and the driver is creating sessions continuously.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring traffic&lt;/strong&gt; — &lt;code&gt;efm2&lt;/code&gt; sends lightweight probes per connection. At &lt;code&gt;failureDetectionInterval=15000&lt;/code&gt; the volume is tiny.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Metrics I always watch
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;What it tells me&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;hikaricp.connections&lt;/code&gt; (total)&lt;/td&gt;
&lt;td&gt;External pool size. Should grow to &lt;code&gt;maximumPoolSize&lt;/code&gt; under load. If it plateaus below the configured max, I'm hitting the internal pool ceiling — that's exactly how I finally caught the v2.6.8 F0 issue.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;hikaricp.connections.active&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Currently in-use logical connections. Near the max = contention.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;hikaricp.connections.pending&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Threads waiting to borrow. &lt;strong&gt;Steady-state non-zero = bottleneck.&lt;/strong&gt; I alert on this.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;hikaricp.connections.creation&lt;/code&gt; (ms)&lt;/td&gt;
&lt;td&gt;Time to acquire a physical connection through the wrapper. Single-digit ms is normal; 10,000 ms means an internal-pool wait timed out. This is the specific signal that said "the problem isn't the external pool."&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;hikaricp.connections.timeout&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Borrow timeouts. Always zero when healthy.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aurora &lt;code&gt;DatabaseConnections&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Physical conns per instance. Should roughly equal &lt;code&gt;sum over pods of (active internal-pool conns to this role)&lt;/code&gt;. Cross-reference with &lt;code&gt;cp-MaximumPoolSize&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aurora &lt;code&gt;Deadlocks&lt;/code&gt;, &lt;code&gt;CommitLatency&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Independent of the driver but often regress together if pool sizing forces serialization at the app layer.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  My sizing calculator
&lt;/h3&gt;

&lt;p&gt;For &lt;code&gt;P&lt;/code&gt; pods, &lt;code&gt;E&lt;/code&gt; external pool size, &lt;code&gt;R_n&lt;/code&gt; reader count, target Aurora &lt;code&gt;M&lt;/code&gt; max_connections per instance with 20% headroom:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;cp-MaximumPoolSize&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;E                                     # invariant; no internal-pool wait&lt;/span&gt;
&lt;span class="err"&gt;Writer&lt;/span&gt; &lt;span class="err"&gt;physical&lt;/span&gt; &lt;span class="err"&gt;at&lt;/span&gt; &lt;span class="py"&gt;peak&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;P * cp-MaximumPoolSize&lt;/span&gt;
&lt;span class="err"&gt;Per-reader&lt;/span&gt; &lt;span class="err"&gt;physical&lt;/span&gt; &lt;span class="err"&gt;at&lt;/span&gt; &lt;span class="py"&gt;peak&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;P * cp-MaximumPoolSize       (round-robin balances across readers)&lt;/span&gt;
&lt;span class="py"&gt;Sanity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;P * cp-MaximumPoolSize &amp;lt;= 0.8 * M&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Plug in your own numbers: &lt;code&gt;P * cp-MaximumPoolSize&lt;/code&gt; per role. Check this against the &lt;code&gt;max_connections&lt;/code&gt; for your Aurora instance class and leave ~20% headroom for maintenance connections and other clients.&lt;/p&gt;
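&lt;p&gt;As a worked check with made-up numbers (6 pods, &lt;code&gt;cp-MaximumPoolSize=50&lt;/code&gt;, 2 readers, &lt;code&gt;max_connections=1000&lt;/code&gt;), the arithmetic above looks like this:&lt;/p&gt;

```java
// Worked example of the sizing rules above; all inputs are hypothetical.
public class SizingCheck {
    // Peak physical connections a single role (writer, or one reader) can see.
    static long perRolePeak(long pods, long cpMaximumPoolSize) {
        return pods * cpMaximumPoolSize;
    }

    // Total footprint across the writer plus R readers.
    static long totalPeak(long pods, long cpMaximumPoolSize, long readers) {
        return perRolePeak(pods, cpMaximumPoolSize) * (1 + readers);
    }

    // Sanity rule: per-role peak must fit in 80% of max_connections.
    static boolean fitsWithHeadroom(long perRolePeak, long maxConnections) {
        return maxConnections * 8 / 10 >= perRolePeak;
    }

    public static void main(String[] args) {
        System.out.println(perRolePeak(6, 50));          // 300 per role
        System.out.println(totalPeak(6, 50, 2));         // 900 across the cluster
        System.out.println(fitsWithHeadroom(300, 1000)); // true: 300 fits in 800
    }
}
```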




&lt;h2&gt;
  
  
  Failover — what happens under the hood
&lt;/h2&gt;

&lt;p&gt;Aurora failover — writer restart, reader promotion, or AZ failover — is the specific scenario the wrapper's plugins were built to survive. The first time I watched a failover in production with this stack, I actually wanted to know what was happening step by step. Here's what I worked out.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sequence during a writer failover
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Writer instance goes unresponsive. TCP sockets from my pods to that writer stop returning packets.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;efm2&lt;/code&gt;'s monitor thread hits &lt;code&gt;failureDetectionCount&lt;/code&gt; consecutive probe failures within &lt;code&gt;failureDetectionTime&lt;/code&gt;. The underlying connection is marked bad.&lt;/li&gt;
&lt;li&gt;My app's next statement on that connection throws a &lt;code&gt;SQLException&lt;/code&gt; tagged with a failover-relevant SQLState.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;failover&lt;/code&gt; catches it, queries Aurora topology (via the RDS DNS or the cluster's topology endpoint), identifies the new writer, and reconnects transparently.&lt;/li&gt;
&lt;li&gt;If configured (&lt;code&gt;failoverMode=reader-or-writer&lt;/code&gt;), the reconnect can fall back to a reader for the brief window where no writer is available. Default is writer.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;auroraConnectionTracker&lt;/code&gt; walks its table of open connections to the dead instance and invalidates them.&lt;/li&gt;
&lt;li&gt;External HikariCP sees the invalidation through &lt;code&gt;HikariCPSQLException&lt;/code&gt; (this is the moment &lt;code&gt;exception-override-class-name&lt;/code&gt; matters) and evicts the bad logical connections.&lt;/li&gt;
&lt;li&gt;New logical connections open against fresh internal-pool slots bound to the new writer.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;End-to-end with default timers: &lt;strong&gt;detection ~75 seconds&lt;/strong&gt; (&lt;code&gt;failureDetectionCount=5 × failureDetectionInterval=15000&lt;/code&gt; of failed probes; &lt;code&gt;failureDetectionTime=60000&lt;/code&gt; adds a grace period only before monitoring has started on a connection), &lt;strong&gt;reconnect ~5-15 seconds&lt;/strong&gt; (Aurora DNS propagation + fresh handshake). My app's p99 takes a visible bump during that window; business recovers within ~90 seconds.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tuning the detection budget
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Aggressive&lt;/strong&gt; (~15-30 s to detect): &lt;code&gt;failureDetectionTime=15000&lt;/code&gt;, &lt;code&gt;failureDetectionInterval=5000&lt;/code&gt;, &lt;code&gt;failureDetectionCount=3&lt;/code&gt;. More probe traffic; more false positives on transient network blips.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Default&lt;/strong&gt; (~75 s, what's in the YAML above): what I run by default. Good for most apps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lax&lt;/strong&gt; (~3+ min): raise &lt;code&gt;failureDetectionTime&lt;/code&gt; past 120000. Only use this if you have independent health-signal paths and don't want efm2 to chatter.&lt;/li&gt;
&lt;/ul&gt;
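&lt;p&gt;The probe arithmetic behind those presets, once monitoring is already active on a connection (the grace period only delays the first probe), is simple enough to check:&lt;/p&gt;

```java
// Steady-state worst-case efm2 detection: failureDetectionCount consecutive
// failed probes, spaced failureDetectionInterval apart.
public class DetectionBudget {
    static long steadyStateMs(long failureDetectionIntervalMs, long failureDetectionCount) {
        return failureDetectionIntervalMs * failureDetectionCount;
    }

    public static void main(String[] args) {
        System.out.println(steadyStateMs(5000, 3));   // aggressive: 15000 ms
        System.out.println(steadyStateMs(15000, 5));  // default: 75000 ms
    }
}
```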

&lt;p&gt;One thing I stopped doing: &lt;strong&gt;don't set &lt;code&gt;socketTimeout&lt;/code&gt; small on the main connection&lt;/strong&gt; (&lt;code&gt;socketTimeout=5000&lt;/code&gt; and friends) hoping to catch failures faster. That fires on every slow query — including legitimate long-running reports — and turns every transient spike into connection churn. Let &lt;code&gt;efm2&lt;/code&gt; own liveness detection. Keep &lt;code&gt;socketTimeout=0&lt;/code&gt;. I learned this the hard way after a 12-minute query triggered a pool-wide connection churn event.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resilience patterns worth knowing
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/FailoverConfigurationGuide.md" rel="noopener noreferrer"&gt;&lt;code&gt;failoverMode&lt;/code&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Controls which host types the &lt;code&gt;failover&lt;/code&gt; plugin may reconnect to when the current host dies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;strict-writer&lt;/code&gt; — only reconnect to a writer. Default when connecting via the cluster writer endpoint. During a prolonged failover, connections stall until a new writer is up.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;reader-or-writer&lt;/code&gt; — fall back to a reader for reads if no writer is available. Default when connecting via the read-only cluster endpoint. Useful for read-heavy apps that can tolerate writes being rejected; writes still fail until the writer is back.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;strict-reader&lt;/code&gt; — never connect to the writer. Dedicated read-replica deployments only.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My default is &lt;code&gt;strict-writer&lt;/code&gt; (which matches the implicit default for cluster-writer-endpoint connections). I've only ever overridden it for a reporting workload where read availability mattered more than write availability.&lt;/p&gt;
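&lt;p&gt;Overriding the default is a single data-source property; for that reporting workload it was:&lt;/p&gt;

```properties
# failoverMode is read by the failover plugin; reader-or-writer lets reads
# land on a reader during the window when no writer is reachable.
failoverMode=reader-or-writer
```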

&lt;h3&gt;
  
  
  Connection churn during failover (don't panic)
&lt;/h3&gt;

&lt;p&gt;The immediate aftermath of a failover event looks rough on dashboards: &lt;code&gt;connections.creation&lt;/code&gt; spikes to seconds (new TLS handshakes), &lt;code&gt;connections.timeout&lt;/code&gt; briefly non-zero, p99 climbs. All expected. The key is the spike &lt;em&gt;ends&lt;/em&gt;, typically within ~30 seconds of the new writer being healthy. If you see a &lt;em&gt;sustained&lt;/em&gt; elevated &lt;code&gt;connections.creation&lt;/code&gt; after the event, check whether &lt;code&gt;exception-override-class-name&lt;/code&gt; is configured — without it, HikariCP keeps handing out invalidated connections and the churn doesn't stop on its own.&lt;/p&gt;

&lt;h3&gt;
  
  
  Read-only traffic during failover
&lt;/h3&gt;

&lt;p&gt;Readers are unaffected by writer failover. &lt;code&gt;readWriteSplitting&lt;/code&gt; + correctly-marked read-only transactions means read traffic keeps flowing while writes pause for ~30-60 seconds. For read-heavy apps, marking transactions &lt;code&gt;readOnly=true&lt;/code&gt; turns out to be both a performance win and an availability one. Do it for both reasons.&lt;/p&gt;

&lt;h3&gt;
  
  
  Blue/green deployments
&lt;/h3&gt;

&lt;p&gt;If you're doing Aurora blue/green (RDS Blue/Green), the switchover is a writer-failover-like event from the driver's perspective. The plugins cover it with no extra config, but the same detection-budget trade-offs apply: faster detection = faster cutover = more false-positive risk during normal ops.&lt;/p&gt;




&lt;h2&gt;
  
  
  RDS Proxy: when, and how it interacts with this driver
&lt;/h2&gt;

&lt;p&gt;If you've read this far, you're either using or considering RDS Proxy. The two layers — RDS Proxy in front of Aurora, the AWS JDBC driver inside your app — solve overlapping but not identical problems, and the AWS guidance you'd want to read together is scattered across the proxy planning page, the wrapper README, and a plugin doc most people miss.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchz2fqx56avg9rwr070g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchz2fqx56avg9rwr070g.png" alt="RDS Proxy architecture showing how the proxy sits between the application and Aurora cluster, with read/write and read-only endpoints" width="800" height="265"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  When AWS recommends RDS Proxy
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/rds-proxy-planning.html" rel="noopener noreferrer"&gt;planning page&lt;/a&gt; lists the canonical cases: "too many connections" pressure, T2/T3 instances where connection-setup CPU is significant, Lambda / serverless workloads, apps without a built-in pool, centralized IAM auth or Secrets Manager rotation, failover speedup (advertised at "up to 66%", typically &amp;lt;35 s for Multi-AZ Aurora), and Blue/Green deployments. For a long-lived Spring Boot pod with a well-tuned HikariCP, only the last three are particularly compelling — the multiplexing benefit is mostly theoretical when your external pool is sized correctly.&lt;/p&gt;

&lt;h3&gt;
  
  
  How RDS Proxy actually routes
&lt;/h3&gt;

&lt;p&gt;The thing that catches teams out is the assumption that the proxy "splits reads and writes intelligently." It doesn't. From the &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/rds-proxy-endpoints.html" rel="noopener noreferrer"&gt;endpoints docs&lt;/a&gt;, the proxy exposes two endpoints — a read/write endpoint that sends every request to the current writer, and a read-only endpoint that sends every request to &lt;em&gt;some&lt;/em&gt; reader (with proxy-level rebalancing if a reader fails). There's &lt;strong&gt;no SQL inspection&lt;/strong&gt;. The proxy routes where you point it, not what you send through it. SQL-aware splitting still requires application-side logic — either two &lt;code&gt;DataSource&lt;/code&gt; beans in your app or the &lt;code&gt;srw&lt;/code&gt; plugin described below.&lt;/p&gt;
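&lt;p&gt;Concretely, "routes where you point it" means two DataSources against the two proxy endpoints (the hostnames below are placeholders for your proxy's endpoints):&lt;/p&gt;

```yaml
# Two pools against RDS Proxy; the proxy routes by endpoint, never by SQL.
writer:
  jdbc-url: jdbc:aws-wrapper:postgresql://my-proxy.proxy-abc.eu-west-1.rds.amazonaws.com:5432/mydb
reader:
  jdbc-url: jdbc:aws-wrapper:postgresql://my-proxy-ro.endpoint.proxy-abc.eu-west-1.rds.amazonaws.com:5432/mydb
```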

&lt;h3&gt;
  
  
  Plugin compatibility behind RDS Proxy
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper#rds-proxy" rel="noopener noreferrer"&gt;wrapper README's RDS Proxy section&lt;/a&gt; is unambiguous:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Functionality like Failover, Enhanced Host Monitoring, and Read/Write Splitting is not compatible since the driver relies on cluster topology and RDS Proxy handles this automatically. The driver remains useful with RDS Proxy for authentication workflows, such as IAM authentication and AWS Secrets Manager integration."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Translated:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Plugin&lt;/th&gt;
&lt;th&gt;Behind RDS Proxy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;failover&lt;/code&gt;, &lt;code&gt;failover2&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Drop.&lt;/strong&gt; Proxy handles writer failover; topology lookups conflict with the hidden pool.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;efm2&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Drop.&lt;/strong&gt; Per-connection probes don't see the underlying Aurora node.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;readWriteSplitting&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Drop.&lt;/strong&gt; Relies on topology that's invisible behind the proxy.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;iamAuth&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Keep&lt;/strong&gt; if you want JDBC-layer IAM (alternative to configuring it on the proxy).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;awsSecretsManager&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Optional — overlaps with proxy auth. Usually skip.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;srw&lt;/code&gt; (Simple R/W Splitting)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Keep&lt;/strong&gt; — purpose-built for this combination.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  The &lt;code&gt;srw&lt;/code&gt; plugin — SQL-aware splitting through RDS Proxy
&lt;/h3&gt;

&lt;p&gt;Available since v3.0.0 and &lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheSimpleReadWriteSplittingPlugin.md" rel="noopener noreferrer"&gt;documented here&lt;/a&gt;. Unlike &lt;code&gt;readWriteSplitting&lt;/code&gt;, &lt;code&gt;srw&lt;/code&gt; doesn't query the cluster for topology. You give it two explicit endpoints — &lt;code&gt;srwWriteEndpoint&lt;/code&gt; (your read/write proxy endpoint) and &lt;code&gt;srwReadEndpoint&lt;/code&gt; (your read-only proxy endpoint) — and it switches between them on &lt;code&gt;Connection#setReadOnly(true/false)&lt;/code&gt;. With Spring's &lt;code&gt;@Transactional(readOnly=true)&lt;/code&gt;, you keep the same single-DataSource ergonomics you'd have with &lt;code&gt;readWriteSplitting&lt;/code&gt; against direct Aurora.&lt;/p&gt;

&lt;p&gt;Two gotchas. &lt;strong&gt;Role verification&lt;/strong&gt; (&lt;code&gt;verifyNewSrwConnections=true&lt;/code&gt; by default) runs &lt;code&gt;SELECT pg_catalog.pg_is_in_recovery()&lt;/code&gt; after switching, with up to a 60-second retry budget, to defend against DNS-cache staleness right after failover. Useful on paper; it conflicts with &lt;code&gt;autocommit=false&lt;/code&gt; because the verification query opens a transaction. Either set &lt;code&gt;setReadOnly&lt;/code&gt; &lt;em&gt;before&lt;/em&gt; disabling autocommit, or set &lt;code&gt;verifyNewSrwConnections=false&lt;/code&gt;. &lt;strong&gt;Mutual exclusion:&lt;/strong&gt; don't combine &lt;code&gt;srw&lt;/code&gt; with &lt;code&gt;readWriteSplitting&lt;/code&gt; or &lt;code&gt;gdbReadWriteSplitting&lt;/code&gt; on the same connection. They're alternatives, not layers.&lt;/p&gt;
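&lt;p&gt;A minimal Spring Boot sketch of this wiring (the proxy hostnames are placeholders; the property names follow the &lt;code&gt;srw&lt;/code&gt; plugin docs linked above):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spring:
  datasource:
    driver-class-name: software.amazon.jdbc.Driver
    url: jdbc:aws-wrapper:postgresql://my-proxy.proxy-xxxx.us-east-1.rds.amazonaws.com:5432/appdb
    hikari:
      exception-override-class-name: software.amazon.jdbc.util.HikariCPSQLException
      data-source-properties:
        wrapperPlugins: srw
        srwWriteEndpoint: my-proxy.proxy-xxxx.us-east-1.rds.amazonaws.com
        srwReadEndpoint: my-proxy-ro.endpoint.proxy-xxxx.us-east-1.rds.amazonaws.com
        # Disabled here to sidestep the autocommit conflict described
        # above; keep the default if you call setReadOnly first.
        verifyNewSrwConnections: false
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With this in place, methods annotated &lt;code&gt;@Transactional(readOnly = true)&lt;/code&gt; land on the read-only proxy endpoint and everything else goes to the read/write endpoint.&lt;/p&gt;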

&lt;h3&gt;
  
  
  Decision tree
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setup&lt;/th&gt;
&lt;th&gt;Plugins&lt;/th&gt;
&lt;th&gt;Read/write split mechanism&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Direct to Aurora, no proxy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;readWriteSplitting&lt;/code&gt;, &lt;code&gt;auroraConnectionTracker&lt;/code&gt;, &lt;code&gt;failover&lt;/code&gt;, &lt;code&gt;efm2&lt;/code&gt; (+ &lt;code&gt;cp-*&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Wrapper plugin, one DataSource, &lt;code&gt;@Transactional(readOnly=true)&lt;/code&gt; routes via topology.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RDS Proxy + wrapper, SQL-aware split&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;srw&lt;/code&gt; (+ &lt;code&gt;iamAuth&lt;/code&gt; if needed)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;srw&lt;/code&gt; switches between two proxy endpoints on &lt;code&gt;setReadOnly&lt;/code&gt;. One DataSource.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RDS Proxy + plain &lt;code&gt;org.postgresql.Driver&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;Two DataSource beans (one per proxy endpoint). App routes manually.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lambda / serverless&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;RDS Proxy + plain driver. The wrapper's value is amortized warm-pool benefits — irrelevant for cold invocations.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
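&lt;p&gt;For the plain-driver row, the "two DataSource beans" setup looks roughly like this (a sketch; bean names, endpoints, and credentials are placeholders, and routing to the read bean is entirely up to your application code):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import javax.sql.DataSource;
import com.zaxxer.hikari.HikariDataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Primary;

@Configuration
public class ProxyDataSourceConfig {

    // Writes: the proxy's read/write endpoint (always the current writer).
    @Bean
    @Primary
    public DataSource writeDataSource() {
        HikariDataSource ds = new HikariDataSource();
        ds.setJdbcUrl("jdbc:postgresql://my-proxy.proxy-xxxx.us-east-1.rds.amazonaws.com:5432/appdb");
        ds.setUsername("app");
        return ds;
    }

    // Reads: the proxy's read-only endpoint (some reader).
    @Bean("readDataSource")
    public DataSource readDataSource() {
        HikariDataSource ds = new HikariDataSource();
        ds.setJdbcUrl("jdbc:postgresql://my-proxy-ro.endpoint.proxy-xxxx.us-east-1.rds.amazonaws.com:5432/appdb");
        ds.setUsername("app");
        return ds;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Read-heavy repositories then inject &lt;code&gt;@Qualifier("readDataSource")&lt;/code&gt; explicitly; nothing reroutes automatically.&lt;/p&gt;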

&lt;h3&gt;
  
  
  Pinning — the multiplexing trap
&lt;/h3&gt;

&lt;p&gt;RDS Proxy multiplexes by handing one backend session to multiple client connections, but only when the session is &lt;em&gt;resettable&lt;/em&gt;. The &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/rds-proxy-pinning.html" rel="noopener noreferrer"&gt;pinning rules for Aurora PostgreSQL&lt;/a&gt; disable multiplexing on &lt;code&gt;SET&lt;/code&gt;, &lt;code&gt;PREPARE&lt;/code&gt;/&lt;code&gt;DEALLOCATE&lt;/code&gt;/&lt;code&gt;EXECUTE&lt;/code&gt;, temporary tables, declared cursors, &lt;code&gt;LISTEN&lt;/code&gt;, advisory locks, and any statement &amp;gt;16 KB. Hibernate with server-side prepared statements pins on every session. There are real teams (&lt;a href="https://zerolatency.medium.com/experience-with-aws-rds-proxy-in-production-and-why-we-had-to-revert-it-in-12-hours-392bc3372544" rel="noopener noreferrer"&gt;Aggarwal's 12-hour revert&lt;/a&gt; is the most-cited public postmortem) that hit ~100% pinning under load and pulled the proxy out the same day. The diagnostic is the &lt;code&gt;DatabaseConnectionsCurrentlySessionPinned&lt;/code&gt; CloudWatch metric — if pinned connections approach total, you're paying for a proxy that isn't actually multiplexing.&lt;/p&gt;
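&lt;p&gt;To see whether this is happening to you, compare the pinned-session metric against total connections for the proxy. A sketch (the proxy name is a placeholder; verify the exact metric name in the &lt;code&gt;AWS/RDS&lt;/code&gt; namespace against the RDS Proxy monitoring docs for your engine):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name DatabaseConnectionsCurrentlySessionPinned \
  --dimensions Name=ProxyName,Value=my-proxy \
  --statistics Maximum \
  --start-time 2026-04-30T00:00:00Z \
  --end-time 2026-05-01T00:00:00Z \
  --period 3600
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;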

&lt;h3&gt;
  
  
  My take
&lt;/h3&gt;

&lt;p&gt;RDS Proxy and the AWS JDBC driver aren't usually a "pick one" decision — they solve different concerns and can layer cleanly &lt;em&gt;if you pick the right plugins&lt;/em&gt;. Three rules I'd hold:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Failover ownership belongs to one layer.&lt;/strong&gt; Don't run &lt;code&gt;failover&lt;/code&gt; + &lt;code&gt;efm2&lt;/code&gt; behind a proxy. The proxy already does it; you're paying twice and risking conflicting reactions to transient errors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read/write splitting needs an explicit choice.&lt;/strong&gt; Two DataSource beans, or &lt;code&gt;srw&lt;/code&gt;, or &lt;code&gt;readWriteSplitting&lt;/code&gt; (no proxy). Pick one — never two.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The wrapper still earns its keep behind a proxy &lt;em&gt;if&lt;/em&gt; you're using IAM auth or &lt;code&gt;srw&lt;/code&gt;.&lt;/strong&gt; Otherwise plain &lt;code&gt;org.postgresql.Driver&lt;/code&gt; is simpler and the wrapper's plugin chain is mostly cosmetic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your motivation for either layer is "make the app faster," neither is the answer — that's a query / index / cache problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  The checklist I run through before shipping
&lt;/h2&gt;

&lt;p&gt;Before I put the wrapper in front of production traffic, I go through this list. Nothing on it is optional.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Driver version ≥ 3.3.0.&lt;/strong&gt; &lt;code&gt;cp-*&lt;/code&gt; properties landed in v3.1.0 and &lt;code&gt;efm2&lt;/code&gt; has been available since v2.4.0, so 3.3.0 isn't the minimum for those features individually. I draw the line at 3.3.0 because it includes the readWriteSplitting + failover plugin-ordering fix and removes a 5-second sleep from the failover recovery path. Below 3.1.0, &lt;code&gt;cp-*&lt;/code&gt; won't work at all.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;F0 profile not in use unless version-aware&lt;/strong&gt; — on v2.x, F0 hardcodes &lt;code&gt;maxPoolSize=30&lt;/code&gt;. I've been burned.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;cp-MaximumPoolSize ≥ maximumPoolSize&lt;/code&gt;&lt;/strong&gt; on the external pool.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;exception-override-class-name&lt;/code&gt;&lt;/strong&gt; set to &lt;code&gt;software.amazon.jdbc.util.HikariCPSQLException&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;socketTimeout=0&lt;/code&gt;&lt;/strong&gt; — liveness belongs to efm2.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read-only transactions annotated&lt;/strong&gt; — otherwise &lt;code&gt;readWriteSplitting&lt;/code&gt; is decorative.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aurora &lt;code&gt;max_connections&lt;/code&gt;&lt;/strong&gt; supports &lt;code&gt;pods × cp-max × (1 + readers)&lt;/code&gt; with 20% headroom.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Topology endpoint reachable&lt;/strong&gt; from every pod (cluster and per-instance DNS resolve via VPC DNS).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plugin list ordered:&lt;/strong&gt; &lt;code&gt;readWriteSplitting,auroraConnectionTracker,failover,efm2&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability wired&lt;/strong&gt; — &lt;code&gt;hikaricp.connections.pending&lt;/code&gt; alert on non-zero steady state.&lt;/li&gt;
&lt;/ol&gt;
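&lt;p&gt;Item 7 deserves a number. A back-of-the-envelope sketch (the figures are made up; substitute your own deployment's):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import math

pods = 20         # application replicas
cp_max = 10       # cp-MaximumPoolSize, per pod and per instance endpoint
readers = 2       # Aurora reader instances
headroom = 1.2    # item 7's 20% safety margin

# Worst case: every pod fills its internal pool to the writer
# and to each reader simultaneously.
peak = pods * cp_max * (1 + readers)
required = math.ceil(peak * headroom)
print(peak, required)  # 600 720
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If &lt;code&gt;required&lt;/code&gt; lands above your instance class's &lt;code&gt;max_connections&lt;/code&gt; default, resize or shrink the pools before shipping.&lt;/p&gt;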




&lt;h2&gt;
  
  
  Where this leaves me
&lt;/h2&gt;

&lt;p&gt;The AWS JDBC Driver is one of those libraries where the defaults are opinionated but not obvious, the configuration surface is large, and the version-to-version behavior has shifted in ways that invalidate older docs you'll find on the internet. The cases where I've seen teams get into trouble all look the same: they adopted a profile without reading what was inside it, or they moved from v2.x to v3.x without re-checking whether the properties they'd set still did anything.&lt;/p&gt;

&lt;p&gt;If I could boil this post down to one practical habit: &lt;strong&gt;don't trust the external pool metrics alone.&lt;/strong&gt; The wrapper adds a whole second layer of pooling between your &lt;code&gt;hikaricp.connections&lt;/code&gt; count and the actual network. When the external pool metrics look fine but your requests are slow, look inside. And if you're still on v2.x with F0, upgrade — there is no property you can set to make it behave.&lt;/p&gt;

&lt;p&gt;I lost a few hours to this. You shouldn't have to lose any.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;h3&gt;
  
  
  AWS Advanced JDBC Wrapper — driver docs
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/UsingTheJdbcDriver.md" rel="noopener noreferrer"&gt;Using the JDBC Driver&lt;/a&gt; — full parameter reference including &lt;code&gt;wrapperPlugins&lt;/code&gt;, &lt;code&gt;wrapperProfileName&lt;/code&gt;, &lt;code&gt;wrapperDialect&lt;/code&gt;, and all connection properties&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/ConfigurationPresets.md" rel="noopener noreferrer"&gt;Configuration Presets&lt;/a&gt; — what F0, F1, SF0, etc. actually configure (plugins, pool settings, timeouts)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/HostSelectionStrategies.md" rel="noopener noreferrer"&gt;Host Selection Strategies&lt;/a&gt; — &lt;code&gt;roundRobin&lt;/code&gt;, &lt;code&gt;random&lt;/code&gt;, &lt;code&gt;leastConnections&lt;/code&gt;, &lt;code&gt;highestWeight&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/FailoverConfigurationGuide.md" rel="noopener noreferrer"&gt;Failover Configuration Guide&lt;/a&gt; — &lt;code&gt;failoverMode&lt;/code&gt;, detection tuning, transactional behavior during failover&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/Frameworks.md" rel="noopener noreferrer"&gt;Framework Integration&lt;/a&gt; — notes on Spring Boot, Hibernate, and other framework specifics&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/DataSource.md" rel="noopener noreferrer"&gt;DataSource Configuration&lt;/a&gt; — alternative to driver-mode configuration via &lt;code&gt;AwsWrapperDataSource&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/Compatibility.md" rel="noopener noreferrer"&gt;Compatibility&lt;/a&gt; — supported databases, JDBC versions, known limitations&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  AWS Advanced JDBC Wrapper — plugin docs
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheReadWriteSplittingPlugin.md" rel="noopener noreferrer"&gt;&lt;code&gt;readWriteSplitting&lt;/code&gt;&lt;/a&gt; — reader routing, internal connection pooling with &lt;code&gt;connectionPoolType=hikari&lt;/code&gt;, &lt;code&gt;cp-*&lt;/code&gt; properties, &lt;code&gt;readerHostSelectorStrategy&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheFailoverPlugin.md" rel="noopener noreferrer"&gt;&lt;code&gt;failover&lt;/code&gt;&lt;/a&gt; — classic failover plugin; topology detection, connection invalidation&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheFailover2Plugin.md" rel="noopener noreferrer"&gt;&lt;code&gt;failover2&lt;/code&gt;&lt;/a&gt; — newer failover implementation (v2); recommended for new deployments&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheHostMonitoringPlugin.md" rel="noopener noreferrer"&gt;&lt;code&gt;efm2&lt;/code&gt; (Host Monitoring)&lt;/a&gt; — &lt;code&gt;failureDetectionTime&lt;/code&gt;, &lt;code&gt;failureDetectionInterval&lt;/code&gt;, &lt;code&gt;failureDetectionCount&lt;/code&gt;, monitoring timeouts&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheAuroraConnectionTrackerPlugin.md" rel="noopener noreferrer"&gt;&lt;code&gt;auroraConnectionTracker&lt;/code&gt;&lt;/a&gt; — connection-to-instance mapping for failover invalidation&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  AWS Advanced JDBC Wrapper — examples and changelog
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/examples/SpringBootHikariExample/README.md" rel="noopener noreferrer"&gt;Spring Boot + HikariCP example&lt;/a&gt; — working YAML with &lt;code&gt;exception-override-class-name&lt;/code&gt; and HikariCP data-source properties&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/examples/SpringHibernateExample/README.md" rel="noopener noreferrer"&gt;Spring + Hibernate example&lt;/a&gt; — Hibernate-specific session factory integration&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/examples/SpringTxFailoverExample/README.md" rel="noopener noreferrer"&gt;Spring Transaction Failover example&lt;/a&gt; — handling transactional rollback during failover&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/pull/1658" rel="noopener noreferrer"&gt;PR #1658 — configurable internal pool&lt;/a&gt; — the change (v3.1.0) that made &lt;code&gt;cp-*&lt;/code&gt; properties work outside of profiles&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/CHANGELOG.md" rel="noopener noreferrer"&gt;Changelog&lt;/a&gt; — version-to-version migration notes&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  RDS Proxy
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/rds-proxy-planning.html" rel="noopener noreferrer"&gt;Planning where to use Amazon RDS Proxy&lt;/a&gt; — the canonical use-case list (Lambda, T2/T3, IAM auth, Blue/Green, failover speedup)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/rds-proxy-endpoints.html" rel="noopener noreferrer"&gt;RDS Proxy endpoints&lt;/a&gt; — read/write vs read-only endpoints; "the proxy routes where you point it" semantics&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/rds-proxy-pinning.html" rel="noopener noreferrer"&gt;Avoiding pinning&lt;/a&gt; — full list of session-state operations that disable multiplexing per engine&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper#rds-proxy" rel="noopener noreferrer"&gt;Wrapper README — RDS Proxy section&lt;/a&gt; — the official statement that &lt;code&gt;failover&lt;/code&gt;, &lt;code&gt;efm2&lt;/code&gt;, and &lt;code&gt;readWriteSplitting&lt;/code&gt; are incompatible with RDS Proxy&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheSimpleReadWriteSplittingPlugin.md" rel="noopener noreferrer"&gt;Simple Read/Write Splitting Plugin (&lt;code&gt;srw&lt;/code&gt;)&lt;/a&gt; — the topology-agnostic plugin purpose-built for use behind RDS Proxy (since v3.0.0)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/rds/proxy/pricing/" rel="noopener noreferrer"&gt;RDS Proxy pricing&lt;/a&gt; — per vCPU-hour for provisioned, per ACU-hour for Serverless&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  External
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;HikariCP — &lt;a href="https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-Sizing" rel="noopener noreferrer"&gt;About Pool Sizing&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Aurora PostgreSQL — &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Managing.html" rel="noopener noreferrer"&gt;Performance and scaling for Amazon Aurora PostgreSQL&lt;/a&gt; (source for the &lt;code&gt;max_connections&lt;/code&gt; default formula and the 5,000-connection cap)&lt;/li&gt;
&lt;li&gt;Aggarwal — &lt;a href="https://zerolatency.medium.com/experience-with-aws-rds-proxy-in-production-and-why-we-had-to-revert-it-in-12-hours-392bc3372544" rel="noopener noreferrer"&gt;"Experience with AWS RDS Proxy in production, and why we had to revert it in 12 hours"&lt;/a&gt; (cited in the pinning section)&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>postgres</category>
      <category>rds</category>
      <category>jdbc</category>
    </item>
    <item>
      <title>Hosting a Static Website on Amazon S3</title>
      <dc:creator>Esther Ninyo</dc:creator>
      <pubDate>Wed, 29 Apr 2026 18:51:42 +0000</pubDate>
      <link>https://forem.com/aws-builders/hosting-a-static-website-on-amazon-s3-3fei</link>
      <guid>https://forem.com/aws-builders/hosting-a-static-website-on-amazon-s3-3fei</guid>
      <description>&lt;p&gt;Deploying a static website just got easier with S3 on AWS. You don't have to manage servers, and Amazon S3 is one of the easiest and most cost-effective ways to host a static website. &lt;/p&gt;

&lt;p&gt;In this post, I will walk you through deploying a site to S3 step by step. This is a beginner-friendly project.&lt;/p&gt;

&lt;p&gt;I will be using an HTML website I cloned in 2020, when I was just starting to learn how to code. Feel free to use any HTML project of your choice; the steps are the same.&lt;br&gt;
If you don't have an HTML project, you can create a simple &lt;code&gt;index.html&lt;/code&gt; file and paste the code below into the file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;!DOCTYPE html&amp;gt;
&amp;lt;html&amp;gt;
&amp;lt;head&amp;gt;
    &amp;lt;title&amp;gt;My AWS Website&amp;lt;/title&amp;gt;
&amp;lt;/head&amp;gt;
&amp;lt;body&amp;gt;
    &amp;lt;h1&amp;gt;Hello from AWS S3!&amp;lt;/h1&amp;gt;
&amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What we will do:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Create an S3 bucket&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Upload project files&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enable static website hosting&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Make the website publicly available&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Resources to be created:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS S3 bucket&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Prerequisite&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;AWS account&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;HTML project/Basic HTML Knowledge&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;STEP 1&lt;/strong&gt;: Create S3 bucket on AWS&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open the &lt;a href="https://aws.amazon.com/console/" rel="noopener noreferrer"&gt;AWS console&lt;/a&gt; and log in.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fik4n8t7ln0ryafb18mh1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fik4n8t7ln0ryafb18mh1.png" alt="AWS console dashboard" width="800" height="489"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Search for S3 in the search bar and click &lt;strong&gt;Create bucket&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F30apxlla1ibo6125bn0x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F30apxlla1ibo6125bn0x.png" alt="Create bucket" width="800" height="187"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flbnocun79yw9bz3tra66.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flbnocun79yw9bz3tra66.png" alt="Create bucket options" width="559" height="836"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Enter a &lt;strong&gt;unique bucket name&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Uncheck &lt;strong&gt;Block all public access&lt;/strong&gt; and acknowledge the warning (the site must be publicly readable)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Leave everything else as it is and &lt;strong&gt;create bucket&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;STEP 2&lt;/strong&gt;: Upload your files&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqesyqg6s6cu18asltvfj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqesyqg6s6cu18asltvfj.png" alt="Upload files" width="800" height="167"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp47n7b1wwc2pih3d0ajx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp47n7b1wwc2pih3d0ajx.png" alt="upload files" width="800" height="520"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After uploading the files&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhx5kqxnwuzf5ukcdmv7i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhx5kqxnwuzf5ukcdmv7i.png" alt="Files uploaded" width="800" height="168"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;STEP 3&lt;/strong&gt;: Enable static website hosting&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open bucket properties&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcdt6ddjk4enrfy561ezs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcdt6ddjk4enrfy561ezs.png" alt="bucket properties" width="800" height="103"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scroll all the way to the end and enable &lt;strong&gt;Static Website Hosting&lt;/strong&gt;. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Farjt4n44x9dl4hloqw2t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Farjt4n44x9dl4hloqw2t.png" alt="Edit static website hosting" width="800" height="123"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F04e296ylr21e4m5mjjw7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F04e296ylr21e4m5mjjw7.png" alt="enable static website hosting" width="800" height="505"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Save after all changes have been made.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;STEP 4&lt;/strong&gt;: Update Bucket policy&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Navigate to the permissions tab and update the bucket policy with the policy below, replacing &lt;code&gt;YOUR-BUCKET-NAME&lt;/code&gt; with your bucket's name. Don't forget to save changes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl3jkpvqv198ge76pwefb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl3jkpvqv198ge76pwefb.png" alt="Permission tab" width="800" height="102"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fevkqw2mtjoig4y2ktk7u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fevkqw2mtjoig4y2ktk7u.png" alt="permission" width="800" height="535"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::YOUR-BUCKET-NAME/*"
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
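&lt;p&gt;For repeatability, steps 1 through 4 can also be done from the AWS CLI. A sketch (replace &lt;code&gt;YOUR-BUCKET-NAME&lt;/code&gt;, the region, and the local folder with your own; the policy JSON above is saved as &lt;code&gt;policy.json&lt;/code&gt;):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# STEP 1: create the bucket and allow public bucket policies
aws s3 mb s3://YOUR-BUCKET-NAME --region us-east-1
aws s3api put-public-access-block --bucket YOUR-BUCKET-NAME \
  --public-access-block-configuration BlockPublicAcls=false,IgnorePublicAcls=false,BlockPublicPolicy=false,RestrictPublicBuckets=false

# STEP 2: upload your site files
aws s3 sync ./my-site s3://YOUR-BUCKET-NAME

# STEP 3: enable static website hosting
aws s3 website s3://YOUR-BUCKET-NAME --index-document index.html

# STEP 4: attach the public-read bucket policy
aws s3api put-bucket-policy --bucket YOUR-BUCKET-NAME --policy file://policy.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;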



&lt;p&gt;&lt;strong&gt;STEP 5&lt;/strong&gt;: Access your website&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Navigate to the properties tab&lt;/li&gt;
&lt;li&gt;Scroll all the way to the end&lt;/li&gt;
&lt;li&gt;Access your website using the &lt;strong&gt;Bucket Website Endpoint&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi3vz3xnut4yb0zh4hyfa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi3vz3xnut4yb0zh4hyfa.png" alt="Access website" width="800" height="197"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Closing Remark!&lt;/em&gt;&lt;br&gt;
You have successfully hosted your website on AWS using an S3 bucket. If you encounter any problems, kindly review what you've done and ensure you haven't missed any steps.&lt;/p&gt;

&lt;p&gt;Thank you for reading to the end. Kindly reach out to me in the comment section if you have any questions, or on &lt;a href="https://www.linkedin.com/in/esther-ninyo/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;Till next time, cheers.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>beginners</category>
      <category>webdev</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Hardening Kubernetes: A Practical Guide to EKS Security with Terraform and Kyverno</title>
      <dc:creator>V-ris Jaijongrak</dc:creator>
      <pubDate>Wed, 29 Apr 2026 15:04:05 +0000</pubDate>
      <link>https://forem.com/aws-builders/hardening-kubernetes-a-practical-guide-to-eks-security-with-terraform-and-kyverno-2mj3</link>
      <guid>https://forem.com/aws-builders/hardening-kubernetes-a-practical-guide-to-eks-security-with-terraform-and-kyverno-2mj3</guid>
      <description>&lt;p&gt;In this post, we will explore how to secure an Amazon EKS cluster by applying infrastructure-as-code best practices and policy-driven guardrails. We will use Terraform to provision our infrastructure and Kyverno to enforce security policies at the cluster level.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The Foundation: Infrastructure as Code
&lt;/h2&gt;

&lt;p&gt;To minimize our attack surface, we will deploy a private EKS cluster. The control plane will be inaccessible from the public internet, forcing all management traffic through a secure VPN tunnel.&lt;/p&gt;

&lt;p&gt;Our &lt;a href="https://github.com/guxkung/eks-terraform" rel="noopener noreferrer"&gt;Terraform&lt;/a&gt; setup includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;VPC Networking: A /16 VPC with three /24 private subnets and one public subnet for ingress.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bastion-OpenVPN: A Terraform module to provide a secure gateway into our private environment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;EKS NodeGroups: Managed worker nodes with defined instance types.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F089i3v5pmmqomza5rbm0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F089i3v5pmmqomza5rbm0.png" alt=" " width="782" height="491"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note: This setup is for demonstration. For production-grade architectures, always refer to &lt;a href="https://github.com/aws-ia" rel="noopener noreferrer"&gt;aws-ia&lt;/a&gt; to align with AWS best practices.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Establishing Secure Access
&lt;/h2&gt;

&lt;p&gt;Because the EKS API server resides in a private subnet, we cannot reach it directly from our local machine. We use the Bastion host as an intermediary.&lt;/p&gt;

&lt;h3&gt;
  
  
  Connecting via OpenVPN:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Generate Credentials:&lt;/strong&gt; Access your bastion host and run: &lt;code&gt;sudo /usr/local/bin/generate-client-cert.sh &amp;lt;client-name&amp;gt;&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieve Config:&lt;/strong&gt; Pull the generated .ovpn file from S3: &lt;code&gt;aws s3 cp s3://&amp;lt;bucket-name&amp;gt;/clients/&amp;lt;client-name&amp;gt;.ovpn .&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configure Routing:&lt;/strong&gt; Update your .ovpn file to include the route to your VPC CIDR:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;route&lt;/span&gt; &amp;lt;&lt;span class="n"&gt;VPC&lt;/span&gt;-&lt;span class="n"&gt;CIDR&lt;/span&gt;&amp;gt; &amp;lt;&lt;span class="n"&gt;SUBNET&lt;/span&gt;-&lt;span class="n"&gt;MASK&lt;/span&gt;&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;4. &lt;strong&gt;Connect:&lt;/strong&gt; Run &lt;code&gt;sudo openvpn --config &amp;lt;client-name&amp;gt;.ovpn&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Once the tunnel is active, you can interact with the cluster via &lt;code&gt;kubectl&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws eks update-kubeconfig &lt;span class="nt"&gt;--name&lt;/span&gt; &amp;lt;CLUSTER_NAME&amp;gt;
kubectl get nodes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result should look similar to this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;NAME                                              STATUS   ROLES    AGE   VERSION
&lt;/span&gt;&lt;span class="gp"&gt;ip-172-xx-yy-zzz.aws-region.compute.internal      Ready    &amp;lt;none&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;21h   v1.34.4-eks-f69f56f
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3. Policy-as-Code with Kyverno
&lt;/h2&gt;

&lt;p&gt;Infrastructure security is only half the battle. We also need guardrails for the workloads running &lt;em&gt;inside&lt;/em&gt; the cluster. &lt;a href="https://www.cncf.io/projects/kyverno/" rel="noopener noreferrer"&gt;Kyverno&lt;/a&gt; allows us to manage these policies as Kubernetes objects.&lt;/p&gt;

&lt;h3&gt;
  
  
  Installing the Policy Suite
&lt;/h3&gt;

&lt;p&gt;We will deploy &lt;code&gt;Kyverno&lt;/code&gt; and the &lt;code&gt;policy-reporter&lt;/code&gt; for a centralized security dashboard:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Kyverno&lt;/span&gt;
helm repo add kyverno https://kyverno.github.io/kyverno/
helm &lt;span class="nb"&gt;install &lt;/span&gt;kyverno &lt;span class="nt"&gt;--namespace&lt;/span&gt; kyverno &lt;span class="nt"&gt;--create-namespace&lt;/span&gt; kyverno/kyverno

&lt;span class="c"&gt;# Install Policy Reporter&lt;/span&gt;
helm &lt;span class="nb"&gt;install &lt;/span&gt;policy-reporter policy-reporter/policy-reporter &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--create-namespace&lt;/span&gt; &lt;span class="nt"&gt;--namespace&lt;/span&gt; policy-reporter &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; ui.enabled&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="nt"&gt;--set&lt;/span&gt; kyvernoPlugin.enabled&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Testing Guardrails
&lt;/h3&gt;

&lt;p&gt;Kyverno operates in two primary modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enforce:&lt;/strong&gt; Blocks non-compliant resources at admission time, so a request that violates the policy is rejected outright.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Audit:&lt;/strong&gt; Admits the resource but records the violation in a policy report, so you can monitor without blocking workloads.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
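&lt;p&gt;The mode is selected per policy via the &lt;code&gt;spec.validationFailureAction&lt;/code&gt; field. A minimal sketch (the policy and rule names here are illustrative, not from the repo):&lt;/p&gt;

```yaml
# Minimal sketch: the enforcement mode is set per ClusterPolicy.
# Policy and rule names are illustrative.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: example-guardrail
spec:
  validationFailureAction: Enforce   # switch to Audit to report without blocking
  rules:
    - name: require-run-as-non-root
      match:
        resources:
          kinds:
            - Pod
      validate:
        message: "Pods must set runAsNonRoot"
        pattern:
          spec:
            securityContext:
              runAsNonRoot: true
```

&lt;p&gt;Switching a single field between &lt;code&gt;Enforce&lt;/code&gt; and &lt;code&gt;Audit&lt;/code&gt; is what makes gradual rollouts practical.&lt;/p&gt;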

&lt;h3&gt;
  
  
  Example: Enforcing PSS (Pod Security Standards)
&lt;/h3&gt;

&lt;p&gt;If we apply a &lt;code&gt;mutate&lt;/code&gt; policy that enforces a "Restricted" security context, an Nginx pod can fail because the stock image expects to start as root, which the mutated security context forbids.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Mutation:&lt;/strong&gt; When we apply the PSS Restricted policy, our &lt;code&gt;Nginx&lt;/code&gt; pod may enter a &lt;code&gt;CrashLoopBackOff&lt;/code&gt; because it violates the enforced security constraints. A more compatible container, like &lt;code&gt;busybox&lt;/code&gt;, will run successfully.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Audit:&lt;/strong&gt; By using &lt;code&gt;validationFailureAction: Audit&lt;/code&gt;, we can track non-compliant pods without breaking existing applications. This is the recommended strategy when rolling out security policies to existing production clusters.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  4. Next Steps: Observability
&lt;/h2&gt;

&lt;p&gt;Security is an ongoing process. To keep your cluster healthy and secure, implement observability using AWS-native tools like &lt;strong&gt;Amazon Managed Service for Prometheus (AMP)&lt;/strong&gt; and &lt;strong&gt;AWS Distro for OpenTelemetry (ADOT)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Check out the &lt;a href="https://github.com/aws-observability/terraform-aws-observability-accelerator" rel="noopener noreferrer"&gt;terraform-aws-observability-accelerator&lt;/a&gt; to get started.&lt;/p&gt;

&lt;h4&gt;
  
  
  Final Reminder: You can find the full source code for this demonstration in my &lt;a href="https://github.com/guxkung/eks-terraform" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;. Don't forget to run &lt;code&gt;terraform destroy&lt;/code&gt; when you are finished to avoid unnecessary AWS costs!
&lt;/h4&gt;




&lt;h2&gt;
  
  
  Appendix
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Accessing the policy-reporter-ui dashboard
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Run &lt;code&gt;kubectl port-forward service/policy-reporter-ui 8082:8080 -n policy-reporter&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Open &lt;code&gt;http://localhost:8082&lt;/code&gt; in your browser.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Mutate policy example taken from Kyverno
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kyverno.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apply-pss-restricted-profile&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;policies.kyverno.io/title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Apply PSS Restricted Profile&lt;/span&gt;
    &lt;span class="na"&gt;policies.kyverno.io/category&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Other, PSP Migration&lt;/span&gt;
    &lt;span class="na"&gt;kyverno.io/kyverno-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1.6.2&lt;/span&gt;
    &lt;span class="na"&gt;kyverno.io/kubernetes-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.23"&lt;/span&gt;
    &lt;span class="na"&gt;policies.kyverno.io/subject&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
    &lt;span class="na"&gt;policies.kyverno.io/description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod Security Standards define the fields and their options which are allowable for Pods to achieve certain security best practices. While these are typically validation policies, workloads will either be accepted or rejected based upon what has already been defined. It is also possible to mutate incoming Pods to achieve the desired PSS level rather than reject. This policy sets all the fields necessary to pass the PSS Restricted profile. Note that it does not attempt to remove non-compliant volumes and volumeMounts. Additional policies may be employed for this purpose.&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;add-pss-fields&lt;/span&gt;
      &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;any&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;kinds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
      &lt;span class="na"&gt;mutate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;patchStrategicMerge&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;securityContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;seccompProfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;RuntimeDefault&lt;/span&gt;
              &lt;span class="na"&gt;runAsNonRoot&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
              &lt;span class="na"&gt;runAsUser&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1000&lt;/span&gt;
              &lt;span class="na"&gt;runAsGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3000&lt;/span&gt;
              &lt;span class="na"&gt;fsGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2000&lt;/span&gt;
            &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;(name)&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;?*"&lt;/span&gt;
                &lt;span class="na"&gt;securityContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="na"&gt;privileged&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
                  &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                    &lt;span class="na"&gt;drop&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ALL&lt;/span&gt;
                  &lt;span class="na"&gt;allowPrivilegeEscalation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Nginx pod YAML
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;creationTimestamp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
    &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
  &lt;span class="na"&gt;dnsPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterFirst&lt;/span&gt;
  &lt;span class="na"&gt;restartPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Always&lt;/span&gt;
&lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1tabayz0zc9x3szfakn7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1tabayz0zc9x3szfakn7.png" alt=" " width="800" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Busybox pod YAML
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;creationTimestamp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;busybox-0&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;busybox-0&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sleep&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3600"&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;busybox&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;busybox-0&lt;/span&gt;
    &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
  &lt;span class="na"&gt;dnsPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterFirst&lt;/span&gt;
  &lt;span class="na"&gt;restartPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Always&lt;/span&gt;
&lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikm3c4x76kl9xqjam9oq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikm3c4x76kl9xqjam9oq.png" alt=" " width="800" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Validate policy example taken from Kyverno
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kyverno.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pss-audit&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;validationFailureAction&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Audit&lt;/span&gt;
  &lt;span class="na"&gt;background&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;check-run-as-non-root&lt;/span&gt;
      &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;kinds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
      &lt;span class="na"&gt;validate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Running&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;as&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;root&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;is&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;not&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;allowed"&lt;/span&gt;
        &lt;span class="na"&gt;pattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;securityContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;runAsNonRoot&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Busybox pod complying with the validate policy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;creationTimestamp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;busybox-1&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;busybox-1&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sleep&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3600"&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;busybox&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;busybox-1&lt;/span&gt;
    &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
  &lt;span class="na"&gt;dnsPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterFirst&lt;/span&gt;
  &lt;span class="na"&gt;restartPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Always&lt;/span&gt;
&lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2dr5kvdwwda3jo4frj6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2dr5kvdwwda3jo4frj6.png" alt=" " width="800" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>eks</category>
      <category>security</category>
    </item>
    <item>
      <title>AWS Amplify Cache Is Useless — And Here Is the Data to Prove It</title>
      <dc:creator>Tanseer</dc:creator>
      <pubDate>Wed, 29 Apr 2026 06:47:04 +0000</pubDate>
      <link>https://forem.com/aws-builders/aws-amplify-cache-is-useless-and-here-is-the-data-to-prove-it-2jn3</link>
      <guid>https://forem.com/aws-builders/aws-amplify-cache-is-useless-and-here-is-the-data-to-prove-it-2jn3</guid>
      <description>&lt;h2&gt;
  
  
  Who This Is For
&lt;/h2&gt;

&lt;p&gt;If you are deploying a frontend or full-stack app on AWS Amplify and your builds feel slower than they should be, this blog is worth reading. We are going to talk about Amplify's caching system — what it is supposed to do, what it actually does, and why in my experience it makes things worse, not better.&lt;/p&gt;

&lt;p&gt;No deep AWS knowledge is required. If you know what a build pipeline is and have used Amplify at least once, you will follow this completely.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem I Ran Into
&lt;/h2&gt;

&lt;p&gt;I was deploying an app on AWS Amplify. The build had two phases: install packages and build the app. Pretty standard setup.&lt;/p&gt;

&lt;p&gt;The total build time was sitting at around 9 minutes. That felt too long. So I opened the build logs and started looking at where the time was actually going.&lt;/p&gt;

&lt;p&gt;Here is what I found:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3 minutes to restore cache (fetch previously stored files)&lt;/li&gt;
&lt;li&gt;3 minutes to install packages and build the app&lt;/li&gt;
&lt;li&gt;3 minutes to save cache (store files for the next build)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So out of 9 minutes, the actual work — installing and building — was only 3 minutes. The other 6 minutes were spent entirely on cache operations.&lt;/p&gt;

&lt;p&gt;That immediately felt wrong. Cache is supposed to speed things up. If it is consuming twice the time of the actual build, something is broken.&lt;/p&gt;




&lt;h2&gt;
  
  
  The First Experiment: Disable Cache Entirely
&lt;/h2&gt;

&lt;p&gt;My first instinct was simple. What if I just removed the cache configuration completely and let Amplify install everything fresh every time?&lt;/p&gt;

&lt;p&gt;I removed the cache settings from my &lt;code&gt;amplify.yml&lt;/code&gt; build config and triggered a new build.&lt;/p&gt;

&lt;p&gt;The result: 3 minutes and 30 seconds.&lt;/p&gt;

&lt;p&gt;The build went from 9 minutes to 3 minutes 30 seconds just by removing cache. Yes, it took an extra 30 seconds to download packages compared to the ideal cached scenario. But it saved 6 full minutes of cache overhead.&lt;/p&gt;

&lt;p&gt;This alone should raise a flag. The cache was not saving time. It was adding time.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is Amplify Cache, Exactly?
&lt;/h2&gt;

&lt;p&gt;Before going further, let me explain how Amplify's caching works, because understanding the mechanism is key to understanding why it fails.&lt;/p&gt;

&lt;p&gt;When Amplify runs a build, it can be configured to save certain folders — most commonly &lt;code&gt;node_modules&lt;/code&gt; — by zipping them up and storing them in S3 (AWS's file storage service). On the next build, it fetches that zip, unzips it into the build environment, and in theory your packages are already there so the install step is faster.&lt;/p&gt;

&lt;p&gt;The key operation here is: zip and upload after a build, download and unzip before the next build.&lt;/p&gt;

&lt;p&gt;This is how Amplify's cache model works. It is essentially just copying folders in and out of storage between builds.&lt;/p&gt;
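&lt;p&gt;For context, this behavior is driven by a &lt;code&gt;cache&lt;/code&gt; block in &lt;code&gt;amplify.yml&lt;/code&gt;. A sketch of what a typical cached config looks like (the &lt;code&gt;node_modules&lt;/code&gt; path is the common pattern, not copied from my project):&lt;/p&gt;

```yaml
version: 1
frontend:
  phases:
    preBuild:
      commands:
        - npm ci
    build:
      commands:
        - npm run build
  artifacts:
    baseDirectory: build
    files:
      - '**/*'
  cache:
    paths:
      - node_modules/**/*   # zipped to S3 after the build, restored before the next one
```

&lt;p&gt;Everything under &lt;code&gt;cache.paths&lt;/code&gt; is what gets zipped, uploaded, downloaded, and unzipped on every build.&lt;/p&gt;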




&lt;h2&gt;
  
  
  The Second Experiment: Maybe It Is My Project
&lt;/h2&gt;

&lt;p&gt;After the first result, I thought maybe the problem was specific to my project. I had a reasonably large dependency tree. Maybe the &lt;code&gt;node_modules&lt;/code&gt; folder was so big that zipping and unzipping it was always going to take longer than just reinstalling.&lt;/p&gt;

&lt;p&gt;So I created a minimal test project — a simple website with almost no packages. Just enough to have a &lt;code&gt;package.json&lt;/code&gt; and a basic build step. The kind of project where &lt;code&gt;node_modules&lt;/code&gt; is tiny and cache should be trivially fast.&lt;/p&gt;

&lt;p&gt;I deployed it on Amplify with cache enabled.&lt;/p&gt;

&lt;p&gt;Same result. Amplify spent time fetching the cache, and then installed all dependencies from scratch anyway. The cache folder it had stored from the previous build was essentially ignored from a practical standpoint.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Root Cause: Cache and npm Are Fundamentally Incompatible
&lt;/h2&gt;

&lt;p&gt;After these experiments, I did some digging and found the real reason this does not work. It comes down to how npm (the package manager) behaves versus how Amplify's cache model works.&lt;/p&gt;

&lt;p&gt;Amplify caches folders. That is it. It saves a folder, restores a folder.&lt;/p&gt;

&lt;p&gt;But here is the problem:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you use &lt;code&gt;npm ci&lt;/code&gt;&lt;/strong&gt; (which is the recommended command for CI/CD pipelines because it gives you clean, reproducible installs), it deletes &lt;code&gt;node_modules&lt;/code&gt; entirely before installing. Every single time. It does not matter that Amplify just spent 3 minutes restoring that folder. &lt;code&gt;npm ci&lt;/code&gt; will delete it and start over.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you use &lt;code&gt;npm install&lt;/code&gt;&lt;/strong&gt; (the more common development command), it does not always delete &lt;code&gt;node_modules&lt;/code&gt;, but it re-evaluates the dependency tree and may reinstall or update packages depending on what it finds. So even here, the cache is not reliably used.&lt;/p&gt;

&lt;p&gt;In both cases, the cached &lt;code&gt;node_modules&lt;/code&gt; folder is either deleted outright or partially ignored.&lt;/p&gt;

&lt;p&gt;Amplify's own documentation recommends using &lt;code&gt;npm ci&lt;/code&gt; for builds. But &lt;code&gt;npm ci&lt;/code&gt; by design destroys exactly what Amplify's cache tries to preserve. These two things directly contradict each other.&lt;/p&gt;

&lt;p&gt;The cache model and the install command are working against each other.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Simple Way to Think About It
&lt;/h2&gt;

&lt;p&gt;Imagine you spend 10 minutes carefully organizing your desk every night before bed so it is ready for tomorrow. But every morning, the first thing you do is clear everything off the desk and start fresh. The organizing you did the night before is completely wasted.&lt;/p&gt;

&lt;p&gt;That is exactly what is happening here. Amplify organizes the &lt;code&gt;node_modules&lt;/code&gt; folder into cache. npm wipes the desk clean every build.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Numbers Look Like Side by Side
&lt;/h2&gt;

&lt;p&gt;To make this concrete, here is a comparison of what I observed:&lt;/p&gt;

&lt;p&gt;With cache enabled:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Restore cache: ~3 minutes&lt;/li&gt;
&lt;li&gt;Install and build: ~3 minutes&lt;/li&gt;
&lt;li&gt;Save cache: ~3 minutes&lt;/li&gt;
&lt;li&gt;Total: ~9 minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With cache disabled:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Install and build: ~3 minutes 30 seconds&lt;/li&gt;
&lt;li&gt;Total: ~3 minutes 30 seconds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The "optimized" build with cache took more than twice as long as the build with no cache at all.&lt;/p&gt;




&lt;h2&gt;
  
  
  What You Should Do Instead
&lt;/h2&gt;

&lt;p&gt;Based on everything above, my recommendation is straightforward: disable Amplify cache unless you have a very specific reason to use it and have verified it is actually helping.&lt;/p&gt;

&lt;p&gt;To disable it, remove or empty the &lt;code&gt;cache&lt;/code&gt; section from your &lt;code&gt;amplify.yml&lt;/code&gt;. Here is what a build config without cache looks like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="na"&gt;frontend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;phases&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;preBuild&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;commands&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;npm ci&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;commands&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;npm run build&lt;/span&gt;
  &lt;span class="na"&gt;artifacts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;baseDirectory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;build&lt;/span&gt;
    &lt;span class="na"&gt;files&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;**/*'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No cache block. Clean and simple.&lt;/p&gt;

&lt;p&gt;If your builds are still slow after removing cache, the bottleneck is likely somewhere else — large dependencies, slow build tools, or the build machine itself. Those are worth investigating separately, but at least you will not be wasting time on a cache that is not working.&lt;/p&gt;




&lt;h2&gt;
  
  
  My Conclusion
&lt;/h2&gt;

&lt;p&gt;AWS Amplify's cache feature is built on a model that zips and unzips folders between builds. That model does not account for how npm actually works. &lt;code&gt;npm ci&lt;/code&gt; deletes &lt;code&gt;node_modules&lt;/code&gt; before every install. &lt;code&gt;npm install&lt;/code&gt; may partially reinstall anyway. The result is that the cache restore step costs real time — in my case, 3 minutes per build — and delivers no actual benefit.&lt;/p&gt;

&lt;p&gt;I tested this on a large app and a minimal app. I tried &lt;code&gt;npm ci&lt;/code&gt; and &lt;code&gt;npm install&lt;/code&gt;. I made sure cache folders were correctly configured and permissions were in place. In every scenario, disabling cache made builds faster.&lt;/p&gt;

&lt;p&gt;This feels like a fundamental design mismatch between Amplify's caching mechanism and how modern package managers work.&lt;/p&gt;




&lt;h2&gt;
  
  
  Has This Happened to You?
&lt;/h2&gt;

&lt;p&gt;I am genuinely curious whether other developers have experienced this. Have you found a way to make Amplify cache actually work? Did you measure a real improvement? Or did you hit the same wall?&lt;/p&gt;

&lt;p&gt;Drop a comment or reach out — I would love to hear if someone has cracked this or if this is a widely shared frustration in the community.&lt;/p&gt;




&lt;h2&gt;
  
  
  Need Help With Your Amplify Setup?
&lt;/h2&gt;

&lt;p&gt;If you are running into build time issues or anything else with your Amplify deployment, feel free to reach out. Happy to help.&lt;/p&gt;

&lt;p&gt;Email me at &lt;strong&gt;&lt;a href="mailto:khantanseer43@gmail.com"&gt;khantanseer43@gmail.com&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




</description>
      <category>aws</category>
      <category>amplify</category>
      <category>serverless</category>
      <category>cicd</category>
    </item>
  </channel>
</rss>
