<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Sebastian</title>
    <description>The latest articles on Forem by Sebastian (@boringcontributor).</description>
    <link>https://forem.com/boringcontributor</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1258367%2F432433ba-54ba-4411-8ed5-788fe5bb1533.jpeg</url>
      <title>Forem: Sebastian</title>
      <link>https://forem.com/boringcontributor</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/boringcontributor"/>
    <language>en</language>
    <item>
      <title>Open Source Serverless Product Analytics on AWS</title>
      <dc:creator>Sebastian</dc:creator>
      <pubDate>Fri, 02 Jan 2026 10:40:00 +0000</pubDate>
      <link>https://forem.com/boringcontributor/open-source-serverless-product-analytics-on-aws-3pg2</link>
      <guid>https://forem.com/boringcontributor/open-source-serverless-product-analytics-on-aws-3pg2</guid>
      <description>&lt;p&gt;Product analytics shouldn't require managing servers, containers, or complex infrastructure. Yet most self-hosted alternatives to tools like Plausible or Umami assume you'll spin up Docker containers, manage databases, and deal with scaling headaches.&lt;/p&gt;

&lt;p&gt;I built this open source solution to change that. It's a &lt;strong&gt;fully serverless, self-hostable analytics platform&lt;/strong&gt; that deploys into &lt;strong&gt;your own AWS account&lt;/strong&gt; with a single CDK command.&lt;/p&gt;

&lt;p&gt;No servers. No Docker builds. &lt;strong&gt;Minimal, predictable baseline cost.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You get privacy-focused analytics infrastructure that scales from zero to millions of events without operational overhead.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Important note&lt;/strong&gt;&lt;br&gt;
This repository provides a &lt;strong&gt;production-grade analytics ingestion pipeline&lt;/strong&gt;, not a polished analytics SaaS.&lt;/p&gt;

&lt;p&gt;Event collection, buffering, replay, and storage are solid and designed for real workloads.&lt;br&gt;
What is still evolving:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;authorization and multi-tenant access control&lt;/li&gt;
&lt;li&gt;the query / insights layer (dashboards, funnels, cohorts)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re comfortable building on top of a strong foundation—or want to contribute—this project is for you.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In this post, I’ll walk through how analytics platforms work under the hood, explore two serverless architectures on AWS, and explain the trade-offs behind the approach I chose.&lt;/p&gt;




&lt;h2&gt;How Analytics Platforms Work&lt;/h2&gt;

&lt;p&gt;Every analytics system—whether it's Google Analytics, Plausible, or a custom solution—follows the same fundamental pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Browser → Ingestion API → Buffer → Processor → Storage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This separation of concerns is what allows analytics systems to scale reliably without impacting application performance.&lt;/p&gt;

&lt;h3&gt;Collection (Browser)&lt;/h3&gt;

&lt;p&gt;A lightweight JavaScript snippet runs on your site and captures events: page views, clicks, web vitals. It batches these events and sends them to your backend using &lt;code&gt;sendBeacon&lt;/code&gt; for reliability or &lt;code&gt;fetch&lt;/code&gt; for flexibility.&lt;/p&gt;

&lt;p&gt;The script should be &lt;strong&gt;tiny (ideally under ~1KB gzipped)&lt;/strong&gt; so it doesn’t affect page performance or Core Web Vitals.&lt;/p&gt;
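&lt;p&gt;A minimal sketch of the batching idea, with the sender injected so the logic is testable outside a browser. The name &lt;code&gt;EventBuffer&lt;/code&gt; and the flush threshold are illustrative, not the project's actual API:&lt;/p&gt;

```typescript
// Minimal event buffer: collect events and flush them in batches through
// a pluggable sender. In the browser the sender would wrap
// navigator.sendBeacon or fetch; here it is injected for testability.
type AnalyticsEvent = { name: string; ts: number };
type Sender = (batch: AnalyticsEvent[]) => void;

class EventBuffer {
  private queue: AnalyticsEvent[] = [];

  constructor(private send: Sender, private maxBatch = 10) {}

  track(name: string): void {
    this.queue.push({ name, ts: Date.now() });
    // Flush once the batch is full so no single request grows unbounded.
    if (this.queue.length >= this.maxBatch) this.flush();
  }

  flush(): void {
    if (this.queue.length === 0) return;
    this.send(this.queue);
    this.queue = [];
  }
}
```

&lt;p&gt;A real snippet would also flush on &lt;code&gt;visibilitychange&lt;/code&gt; so queued events aren't lost when the tab closes.&lt;/p&gt;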

&lt;h3&gt;Ingestion API&lt;/h3&gt;

&lt;p&gt;An HTTP endpoint receives events from the browser. Its responsibilities should be minimal:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;validate the payload&lt;/li&gt;
&lt;li&gt;enrich it with metadata (e.g. geolocation from request headers)&lt;/li&gt;
&lt;li&gt;push the event downstream&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The API should &lt;strong&gt;return immediately&lt;/strong&gt; and never block on heavy processing or database writes.&lt;/p&gt;
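&lt;p&gt;The validate-and-enrich step can be sketched as a pure function. The field names and the geolocation header are illustrative assumptions, not the project's actual schema:&lt;/p&gt;

```typescript
// Sketch of the validate-and-enrich step an ingestion handler performs
// before pushing an event downstream. Field names and the country header
// are illustrative; CDNs like CloudFront pass a header of this shape.
type RawEvent = { name?: unknown; url?: unknown };

function validateAndEnrich(body: RawEvent, headers: { [k: string]: string }) {
  // Reject malformed payloads early; the API should do nothing heavier.
  if (typeof body.name !== "string" || typeof body.url !== "string") {
    return null;
  }
  return {
    name: body.name,
    url: body.url,
    // Geolocation from request headers, as described above.
    country: headers["cloudfront-viewer-country"] || "unknown",
    receivedAt: Date.now(),
  };
}
```

&lt;p&gt;After this step the handler pushes the enriched record to the buffer and returns immediately; nothing heavier belongs in the request path.&lt;/p&gt;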

&lt;h3&gt;Buffer&lt;/h3&gt;

&lt;p&gt;The buffer decouples ingestion from processing.&lt;/p&gt;

&lt;p&gt;Events are written to a queue or stream so your ingestion API remains fast even during traffic spikes. This layer absorbs bursts, smooths load, and allows downstream consumers to process events at their own pace.&lt;/p&gt;

&lt;h3&gt;Processor&lt;/h3&gt;

&lt;p&gt;A worker reads events from the buffer, transforms them into the shape your storage expects, and writes them out.&lt;/p&gt;

&lt;p&gt;This is also where batching happens to reduce write amplification and keep storage costs under control.&lt;/p&gt;
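&lt;p&gt;The grouping step can be sketched as a small pure function. The name &lt;code&gt;chunkEvents&lt;/code&gt; and the batch size are illustrative; real limits come from the storage backend:&lt;/p&gt;

```typescript
// Sketch of the batching step: group decoded events into fixed-size
// chunks so each storage write covers many events instead of one,
// reducing write amplification.
function chunkEvents(events: object[], size: number): object[][] {
  const batches: object[][] = [];
  let rest = events.slice();
  while (rest.length > 0) {
    batches.push(rest.slice(0, size));
    rest = rest.slice(size);
  }
  return batches;
}
```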

&lt;h3&gt;Storage&lt;/h3&gt;

&lt;p&gt;This is the query layer. It must handle analytical workloads efficiently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;aggregations over time ranges&lt;/li&gt;
&lt;li&gt;grouping by dimensions (referrer, country, device, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Row-based databases work at small scale, but &lt;strong&gt;columnar stores like ClickHouse&lt;/strong&gt; are dramatically more efficient as data volume grows.&lt;/p&gt;




&lt;h2&gt;Two Serverless Approaches on AWS&lt;/h2&gt;

&lt;p&gt;When designing this for AWS, I evaluated two architectures. Both are fully serverless, but they differ in cost characteristics, operational complexity, and replay capabilities.&lt;/p&gt;

&lt;h3&gt;Approach 1: EventBridge + SQS (Near-Zero Cost at Rest)&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Browser → Lambda Function URL → EventBridge → SQS → Processor Lambda → Storage
                                           ↘ S3 (raw archive)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the purest pay-per-request model.&lt;/p&gt;

&lt;p&gt;EventBridge acts as the routing layer: one rule forwards events to SQS for processing, while another rule triggers a Lambda that archives raw events to S3.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Near-zero cost when there’s no traffic&lt;/li&gt;
&lt;li&gt;Simple mental model with declarative routing rules&lt;/li&gt;
&lt;li&gt;Easy extensibility—new consumers are just new EventBridge rules&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Trade-offs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No built-in replay mechanism&lt;/li&gt;
&lt;li&gt;Reprocessing requires manual replay from S3&lt;/li&gt;
&lt;li&gt;Limited control over batching semantics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This approach works well for side projects, low-traffic sites, or scenarios where minimizing idle cost is the top priority.&lt;/p&gt;




&lt;h3&gt;Approach 2: Kinesis Data Streams + Firehose (What I Built)&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Browser → Lambda Function URL → Kinesis Data Stream → Firehose → S3
                                                    ↘ Lambda → Storage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the architecture I chose.&lt;/p&gt;

&lt;p&gt;Kinesis Data Streams acts as the central event log. Firehose handles archival to S3 automatically, while a Lambda consumer processes events and writes them to the analytics database.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Built-in replay via configurable retention (24 hours by default, extendable up to 365 days)&lt;/li&gt;
&lt;li&gt;Strict ordering guarantees within partitions (important for session reconstruction)&lt;/li&gt;
&lt;li&gt;Seamless Firehose integration for batching, compression, and delivery to S3&lt;/li&gt;
&lt;li&gt;Predictable throughput and backpressure via shards&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Trade-offs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Not zero-cost at rest (one shard runs roughly $11/month)&lt;/li&gt;
&lt;li&gt;Requires basic capacity planning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I chose this approach because &lt;strong&gt;replayability and operational simplicity matter more than absolute zero idle cost&lt;/strong&gt; for a production analytics system. The baseline cost is predictable, and the architecture scales cleanly as traffic grows.&lt;/p&gt;




&lt;h2&gt;Storage Layer&lt;/h2&gt;

&lt;p&gt;By default, the project uses &lt;strong&gt;Amazon Aurora DSQL&lt;/strong&gt;. I chose it to experiment with a fully serverless SQL database.&lt;/p&gt;

&lt;p&gt;It works—but for analytical workloads, &lt;strong&gt;ClickHouse is the better choice&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Columnar storage, compression, and built-in aggregation functions make a significant difference for time-series analytics.&lt;/p&gt;

&lt;p&gt;The storage layer is abstracted behind an interface, so swapping backends is a configuration change. For real-world traffic, I recommend pointing the system at ClickHouse (ClickHouse Cloud or self-hosted) instead of DSQL.&lt;/p&gt;
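&lt;p&gt;The abstraction can be sketched like this. The interface and method names are illustrative, not the project's actual code:&lt;/p&gt;

```typescript
// Sketch of the storage abstraction described above: the pipeline writes
// through one interface, so swapping DSQL for ClickHouse is a config
// change rather than an architectural one.
interface AnalyticsStore {
  writeBatch(events: object[]): void;
  count(): number;
}

// In-memory implementation, useful for local development and tests.
class MemoryStore implements AnalyticsStore {
  private events: object[] = [];
  writeBatch(batch: object[]): void {
    this.events = this.events.concat(batch);
  }
  count(): number {
    return this.events.length;
  }
}

// A ClickHouse- or DSQL-backed class would implement the same interface,
// so the processor never knows which backend it is writing to.
function makeStore(backend: string): AnalyticsStore {
  if (backend === "memory") return new MemoryStore();
  throw new Error("unknown backend: " + backend);
}
```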




&lt;h2&gt;Getting Started&lt;/h2&gt;

&lt;p&gt;The entire stack deploys with a single command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;make deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This provisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the ingestion API (built in Rust)&lt;/li&gt;
&lt;li&gt;buffering infrastructure (Kinesis + Firehose)&lt;/li&gt;
&lt;li&gt;raw event archival to S3&lt;/li&gt;
&lt;li&gt;a query API defined with OpenAPI and configured with sane defaults&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything is defined in CDK and can be customized via configuration without touching the core architecture.&lt;/p&gt;

&lt;p&gt;The repository is available on GitHub:&lt;br&gt;
👉 &lt;a href="https://github.com/boringContributor/aws-serverless-product-analytics" rel="noopener noreferrer"&gt;https://github.com/boringContributor/aws-serverless-product-analytics&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;Contributing &amp;amp; Roadmap&lt;/h2&gt;

&lt;p&gt;This project is intentionally modular and open for contributions.&lt;/p&gt;

&lt;p&gt;Areas where help is especially valuable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;authorization &amp;amp; multi-tenant access control&lt;/li&gt;
&lt;li&gt;query API design (funnels, breakdowns, cohorts)&lt;/li&gt;
&lt;li&gt;ClickHouse schemas and query optimizations&lt;/li&gt;
&lt;li&gt;dashboard and visualization experiments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re interested, issues are labeled and the architecture is documented to make onboarding easier.&lt;/p&gt;




&lt;h2&gt;Wrapping Up&lt;/h2&gt;

&lt;p&gt;Serverless analytics on AWS is not only possible—it’s practical.&lt;/p&gt;

&lt;p&gt;You get the privacy and control benefits of self-hosting &lt;strong&gt;without&lt;/strong&gt; managing servers, containers, or always-on infrastructure. Whether you choose a near-zero-cost EventBridge pipeline or a replay-friendly Kinesis-based architecture depends on your traffic patterns and tolerance for baseline cost.&lt;/p&gt;

&lt;p&gt;The code is open source. Deploy it, fork it, or use it as a reference for building your own event ingestion pipelines.&lt;/p&gt;

&lt;p&gt;If you have questions or want to adapt this setup to your needs, feel free to set up a quick call for a one-time collaboration:&lt;br&gt;
👉 &lt;a href="https://cal.com/someone" rel="noopener noreferrer"&gt;https://cal.com/someone&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>analytics</category>
      <category>cdk</category>
    </item>
    <item>
      <title>Storing Sensitive Information in DynamoDB with KMS</title>
      <dc:creator>Sebastian</dc:creator>
      <pubDate>Wed, 24 Dec 2025 13:10:10 +0000</pubDate>
      <link>https://forem.com/boringcontributor/storing-sensitive-information-in-dynamodb-with-kms-510k</link>
      <guid>https://forem.com/boringcontributor/storing-sensitive-information-in-dynamodb-with-kms-510k</guid>
      <description>&lt;p&gt;Recently I faced an issue with AWS EventBridge Connections. It's a managed AWS service that handles secrets for you—you configure authentication for an API (either for yourself or your customers, like webhooks), and EventBridge Connections handles the rest when attached to EventBridge API Destination or Step Functions HTTP invoke tasks.&lt;/p&gt;

&lt;p&gt;Both services seem great at first glance, but reveal limitations once you move beyond simple use cases. In my case, the lack of customization and control became a blocker. This led me to research alternatives: Where can I store customer-provided secrets or sensitive data securely?&lt;/p&gt;

&lt;h2&gt;The Obvious Choice: AWS Secrets Manager&lt;/h2&gt;

&lt;p&gt;For most people, the first solution that comes to mind is AWS Secrets Manager. Secrets Manager is a fully managed service designed specifically for storing and rotating secrets like database credentials, API keys, and OAuth tokens.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is Secrets Manager?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AWS Secrets Manager helps you protect access to your applications, services, and IT resources without upfront investment and ongoing maintenance costs. It enables you to rotate, manage, and retrieve database credentials, API keys, and other secrets throughout their lifecycle.&lt;/p&gt;

&lt;p&gt;Key features include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automatic secret rotation&lt;/li&gt;
&lt;li&gt;Fine-grained access control via IAM&lt;/li&gt;
&lt;li&gt;Audit and compliance through CloudTrail logging&lt;/li&gt;
&lt;li&gt;Integration with RDS, DocumentDB, and other AWS services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Downsides&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While Secrets Manager is powerful, it's not always necessary:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt;: Secrets Manager charges $0.40 per secret per month, plus $0.05 per 10,000 API calls. For applications managing many customer secrets, this adds up quickly: 1,000 customer secrets already cost $400 per month before a single API call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Overkill for simple use cases&lt;/strong&gt;: If you don't need automatic rotation or the advanced features, you're paying for functionality you won't use.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complexity&lt;/strong&gt;: For straightforward encryption needs, the service adds unnecessary overhead.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is where AWS Key Management Service (KMS) becomes an attractive alternative.&lt;/p&gt;

&lt;h2&gt;A Better Fit: AWS KMS&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What is KMS?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AWS Key Management Service (KMS) is a managed service that makes it easy to create and control cryptographic keys used to encrypt your data. Unlike Secrets Manager, KMS doesn't store your secrets—it stores encryption keys that you use to encrypt and decrypt data yourself.&lt;/p&gt;

&lt;p&gt;Think of it this way:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Secrets Manager&lt;/strong&gt;: A secure vault that stores your secrets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;KMS&lt;/strong&gt;: A key custodian that holds the keys you use to lock/unlock your own vault&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why KMS for DynamoDB?&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;DynamoDB Encryption vs Application-Level Encryption&lt;/h3&gt;

&lt;p&gt;It's important to clarify that DynamoDB already encrypts all data at rest by default, using an AWS owned KMS key (you can opt into an AWS managed or customer managed key instead). This protects your data against physical disk access and AWS infrastructure-level threats.&lt;/p&gt;

&lt;p&gt;However, server-side encryption (SSE) alone is often not sufficient when dealing with customer-provided secrets.&lt;/p&gt;

&lt;p&gt;Application-level encryption (encrypting data before storing it in DynamoDB) provides additional guarantees:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Protects against overly permissive IAM policies&lt;/li&gt;
&lt;li&gt;Limits exposure in case of accidental data access&lt;/li&gt;
&lt;li&gt;Keeps data encrypted in exports, backups, and logs&lt;/li&gt;
&lt;li&gt;Enables fine-grained access control at the application boundary&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this guide, we’re focusing on application-level encryption, where sensitive values are encrypted using KMS before being written to DynamoDB, rather than relying solely on DynamoDB’s built-in encryption at rest.&lt;/p&gt;

&lt;p&gt;When storing sensitive data in DynamoDB, you have two main approaches:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Store references in DynamoDB&lt;/strong&gt;: Store secrets in AWS Secrets Manager or SSM Parameter Store, then store only the reference in DynamoDB&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Store encrypted data directly in DynamoDB&lt;/strong&gt;: Encrypt the sensitive data with KMS and store the encrypted value directly in your DynamoDB table&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The second approach is simpler and more cost-effective for many use cases. Let's explore how to implement it.&lt;/p&gt;

&lt;h2&gt;Setting Up KMS with AWS CDK&lt;/h2&gt;

&lt;p&gt;Here's how to create a KMS key and configure it for use with DynamoDB:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;kms&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-kms&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;dynamodb&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-dynamodb&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;iam&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-iam&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Stack&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;StackProps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;RemovalPolicy&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Construct&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;constructs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SecureStorageStack&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Stack&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Construct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;StackProps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Create a KMS key for encrypting sensitive data&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;encryptionKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;kms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;SensitiveDataKey&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Key for encrypting customer secrets in DynamoDB&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;enableKeyRotation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// Automatically rotate key every year&lt;/span&gt;
      &lt;span class="na"&gt;removalPolicy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;RemovalPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;RETAIN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// Keep key even if stack is deleted&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Create DynamoDB table&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;secretsTable&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;dynamodb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;SecretsTable&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;partitionKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;customerId&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;dynamodb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;AttributeType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;STRING&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;sortKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;secretId&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;dynamodb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;AttributeType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;STRING&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;billingMode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;dynamodb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;BillingMode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PAY_PER_REQUEST&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Grant your Lambda function access to the key&lt;/span&gt;
    &lt;span class="c1"&gt;// (Assuming you have a Lambda function defined)&lt;/span&gt;
    &lt;span class="nx"&gt;encryptionKey&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;grantEncryptDecrypt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;yourLambdaFunction&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;secretsTable&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;grantReadWriteData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;yourLambdaFunction&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Add key ARN to Lambda environment variables&lt;/span&gt;
    &lt;span class="nx"&gt;yourLambdaFunction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addEnvironment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;KMS_KEY_ID&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;encryptionKey&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;keyId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;yourLambdaFunction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addEnvironment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;SECRETS_TABLE_NAME&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;secretsTable&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tableName&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Encrypting and Decrypting in TypeScript&lt;/h2&gt;

&lt;h3&gt;Important Limitation: KMS Encrypt Size Limit&lt;/h3&gt;

&lt;p&gt;AWS KMS Encrypt has a maximum plaintext size of 4 KB. This works well for small secrets such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API keys&lt;/li&gt;
&lt;li&gt;Webhook secrets&lt;/li&gt;
&lt;li&gt;Short OAuth tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, it will &lt;strong&gt;not work&lt;/strong&gt; for larger payloads like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PEM certificates&lt;/li&gt;
&lt;li&gt;Large JSON credentials&lt;/li&gt;
&lt;li&gt;Multi-field configuration blobs&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Envelope Encryption for Larger Secrets&lt;/h3&gt;

&lt;p&gt;For secrets larger than 4 KB, you should use envelope encryption:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use KMS to generate a data encryption key (DEK)&lt;/li&gt;
&lt;li&gt;Encrypt the secret locally using a symmetric algorithm (e.g. AES-256-GCM)&lt;/li&gt;
&lt;li&gt;Store the encrypted secret and the encrypted data key together in DynamoDB&lt;/li&gt;
&lt;li&gt;Decrypt the data key with KMS only when needed&lt;/li&gt;
&lt;/ol&gt;
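&lt;p&gt;The local half of these steps can be sketched with Node's built-in &lt;code&gt;crypto&lt;/code&gt; module. This is a sketch of step 2 only: the KMS calls (&lt;code&gt;GenerateDataKey&lt;/code&gt; to obtain the data key, &lt;code&gt;Decrypt&lt;/code&gt; to recover it later) are omitted, and the 32-byte data key is simply passed in.&lt;/p&gt;

```typescript
import * as crypto from "crypto";

// Sketch of step 2: encrypt a large secret locally with AES-256-GCM,
// using a 32-byte data key that would come from a KMS GenerateDataKey
// call. Only the KMS-encrypted copy of the key is stored alongside it.
function encryptWithDataKey(plaintext: string, dataKey: Buffer) {
  const iv = crypto.randomBytes(12); // standard GCM nonce length
  const cipher = crypto.createCipheriv("aes-256-gcm", dataKey, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, ciphertext, authTag: cipher.getAuthTag() };
}

function decryptWithDataKey(
  payload: { iv: Buffer; ciphertext: Buffer; authTag: Buffer },
  dataKey: Buffer
): string {
  const decipher = crypto.createDecipheriv("aes-256-gcm", dataKey, payload.iv);
  decipher.setAuthTag(payload.authTag); // GCM verifies integrity on final()
  return Buffer.concat([decipher.update(payload.ciphertext), decipher.final()]).toString("utf8");
}
```

&lt;p&gt;In DynamoDB you would store &lt;code&gt;iv&lt;/code&gt;, &lt;code&gt;ciphertext&lt;/code&gt;, &lt;code&gt;authTag&lt;/code&gt;, and the KMS-encrypted data key together in one item.&lt;/p&gt;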

&lt;p&gt;This approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scales to arbitrarily large secrets&lt;/li&gt;
&lt;li&gt;Minimizes KMS API calls&lt;/li&gt;
&lt;li&gt;Is the approach AWS recommends as a best practice&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this article, we focus on the direct Encrypt / Decrypt approach for simplicity and small secrets. For production systems handling larger payloads, envelope encryption should be used instead.&lt;/p&gt;

&lt;p&gt;Here's how to encrypt and decrypt sensitive data using the AWS SDK for JavaScript v3:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;KMSClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;EncryptCommand&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;DecryptCommand&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-sdk/client-kms&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;DynamoDBClient&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-sdk/client-dynamodb&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;DynamoDBDocumentClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;PutCommand&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;GetCommand&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-sdk/lib-dynamodb&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;kmsClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;KMSClient&lt;/span&gt;&lt;span class="p"&gt;({});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;dynamoClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;DynamoDBDocumentClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DynamoDBClient&lt;/span&gt;&lt;span class="p"&gt;({}));&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;KMS_KEY_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;KMS_KEY_ID&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;TABLE_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;SECRETS_TABLE_NAME&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;CustomerSecret&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;customerId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;secretId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;encryptedValue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;createdAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Encrypt a sensitive value using KMS
 */&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;encryptSecret&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;plaintext&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;command&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;EncryptCommand&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;KeyId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;KMS_KEY_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;Plaintext&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;plaintext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;utf-8&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;kmsClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;command&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CiphertextBlob&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Encryption failed: no ciphertext returned&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Convert to base64 for storage&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CiphertextBlob&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;base64&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Decrypt a KMS-encrypted value
 */&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;decryptSecret&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;encryptedValue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;command&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DecryptCommand&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;CiphertextBlob&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;encryptedValue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;base64&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="c1"&gt;// Note: KeyId is optional for decrypt - KMS knows which key was used&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;kmsClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;command&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Plaintext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Decryption failed: no plaintext returned&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Plaintext&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;utf-8&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Store an encrypted secret in DynamoDB
 */&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;storeSecret&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;customerId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;secretId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;plainSecret&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;encryptedValue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;encryptSecret&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;plainSecret&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;item&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;CustomerSecret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;customerId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;secretId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;encryptedValue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;createdAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;dynamoClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;PutCommand&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;TableName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;TABLE_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;Item&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;}));&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Retrieve and decrypt a secret from DynamoDB
 */&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getSecret&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;customerId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;secretId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;dynamoClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;GetCommand&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;TableName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;TABLE_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;Key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;customerId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;secretId&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;}));&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;secret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Item&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;CustomerSecret&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;decryptSecret&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;secret&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;encryptedValue&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Example usage&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;example&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Store a customer's API key&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;storeSecret&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;customer-123&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;api-key&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;super-secret-api-key-xyz&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Retrieve and decrypt it later&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;apiKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getSecret&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;customer-123&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;api-key&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Decrypted API key:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  SSM Parameter Store vs KMS: The Trade-offs
&lt;/h2&gt;

&lt;p&gt;You might wonder: should I use SSM Parameter Store with KMS encryption, or encrypt data directly with KMS and store it in DynamoDB?&lt;/p&gt;

&lt;h3&gt;
  
  
  SSM Parameter Store Approach
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Centralized secret management&lt;/li&gt;
&lt;li&gt;Built-in versioning&lt;/li&gt;
&lt;li&gt;Free tier: up to 10,000 standard parameters&lt;/li&gt;
&lt;li&gt;Parameter Store integrates with many AWS services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Extra API calls (SSM + DynamoDB)&lt;/li&gt;
&lt;li&gt;Additional latency&lt;/li&gt;
&lt;li&gt;Two services to manage&lt;/li&gt;
&lt;li&gt;10,000 standard-parameter limit may be restrictive for large-scale applications&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Store in SSM, reference in DynamoDB&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;paramName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`/customers/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;customerId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/secrets/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;secretId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ssm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;putParameter&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;paramName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;plainSecret&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;SecureString&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// Uses KMS encryption&lt;/span&gt;
  &lt;span class="na"&gt;KeyId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;KMS_KEY_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Store reference in DynamoDB&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;dynamodb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;putItem&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;TableName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Customers&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;Item&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;customerId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;S&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;customerId&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;secretRef&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;S&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;paramName&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="c1"&gt;// Just the reference&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Direct KMS Encryption in DynamoDB
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single service (DynamoDB)&lt;/li&gt;
&lt;li&gt;Lower latency (one API call instead of two)&lt;/li&gt;
&lt;li&gt;No parameter count limits&lt;/li&gt;
&lt;li&gt;Simpler architecture&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No built-in versioning (you'd implement it yourself)&lt;/li&gt;
&lt;li&gt;Less visibility in AWS Console&lt;/li&gt;
&lt;li&gt;Manual rotation handling&lt;/li&gt;
&lt;/ul&gt;
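&lt;p&gt;If you do need versioning with the direct-KMS approach, one common DynamoDB pattern is to write each version as its own item under a composite sort key and keep a &lt;code&gt;latest&lt;/code&gt; pointer. The sketch below is my own illustration (the key names are hypothetical), not code from the project above:&lt;/p&gt;

```typescript
// Hypothetical versioning layout for the secrets table:
// every version becomes its own immutable item, addressed by a composite sort key.
//   pk: customerId
//   sk: `${secretId}#v${n}`      (immutable version items)
//   sk: `${secretId}#latest`     (pointer item holding the current version number)

function versionKey(secretId: string, version: number): string {
  return `${secretId}#v${version}`;
}

function latestKey(secretId: string): string {
  return `${secretId}#latest`;
}

// Given the current version number (read from the `#latest` item), compute
// the keys touched by a new write: the new version item plus the updated pointer.
function planNewVersion(secretId: string, currentVersion: number) {
  const next = currentVersion + 1;
  return {
    newItemKey: versionKey(secretId, next),
    pointerKey: latestKey(secretId),
    nextVersion: next,
  };
}

console.log(planNewVersion('api-key', 2));
```

&lt;p&gt;In a real table you would write the version item and update the pointer in a single &lt;code&gt;TransactWriteItems&lt;/code&gt; call, so readers never observe a dangling pointer.&lt;/p&gt;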

&lt;p&gt;&lt;strong&gt;When to use which:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use SSM Parameter Store&lt;/strong&gt; if you need versioning, have &amp;lt; 10,000 secrets, or want integration with other AWS services&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use direct KMS encryption&lt;/strong&gt; for high-scale applications, lower latency requirements, or when secrets are tightly coupled with your DynamoDB data model&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Pricing Comparison
&lt;/h2&gt;

&lt;p&gt;Let's compare costs for storing 50,000 customer secrets:&lt;/p&gt;

&lt;h3&gt;
  
  
  Secrets Manager
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Storage: 50,000 secrets × $0.40/month = &lt;strong&gt;$20,000/month&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;API calls: $0.05 per 10,000 calls; assuming 1M calls/month: 100 × $0.05 = &lt;strong&gt;$5/month&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Total: ~$20,005/month&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  KMS + DynamoDB
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;KMS key: 1 key × $1/month = &lt;strong&gt;$1/month&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;KMS API calls: Encrypt and Decrypt requests are billed at $0.03 per 10,000 requests (pricing varies slightly by region). Assuming ~1M total requests per month: &lt;strong&gt;~$3/month&lt;/strong&gt; (KMS also includes a small free tier for requests)&lt;/li&gt;
&lt;li&gt;DynamoDB storage: 50,000 items × ~1KB = ~50MB × $0.25/GB = &lt;strong&gt;$0.01/month&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;DynamoDB reads/writes: Varies by usage, let's say &lt;strong&gt;$10/month&lt;/strong&gt; for moderate traffic&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Total: ~$14/month&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The cost difference is dramatic: &lt;strong&gt;$20,005 vs $14 per month&lt;/strong&gt; for the same number of secrets.&lt;/p&gt;

&lt;h3&gt;
  
  
  SSM Parameter Store + DynamoDB
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;SSM: the first 10,000 standard parameters are free; beyond that, advanced parameters cost $0.05 per parameter/month&lt;/li&gt;
&lt;li&gt;For 50,000 params: 40,000 × $0.05 = &lt;strong&gt;$2,000/month&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Plus KMS and DynamoDB costs: &lt;strong&gt;~$14/month&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Total: ~$2,014/month&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
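&lt;p&gt;The arithmetic above condenses into a small cost model. The rates are hard-coded from the figures quoted in this post; verify them against the current AWS pricing pages before relying on them:&lt;/p&gt;

```typescript
// Monthly cost model for 50,000 secrets and ~1M API calls per month,
// using the per-unit rates quoted in this article.
const SECRETS = 50_000;
const API_CALLS = 1_000_000;

// Secrets Manager: $0.40 per secret/month plus $0.05 per 10,000 API calls
const secretsManager = SECRETS * 0.40 + (API_CALLS / 10_000) * 0.05;

// KMS + DynamoDB: $1 for the key, $0.03 per 10,000 KMS requests,
// ~$0.01 storage, ~$10 for moderate read/write traffic
const kmsPlusDynamo = 1 + (API_CALLS / 10_000) * 0.03 + 0.01 + 10;

// SSM: advanced parameters beyond the free 10,000, plus the KMS/DynamoDB baseline
const ssmPlusDynamo = (SECRETS - 10_000) * 0.05 + kmsPlusDynamo;

console.log({ secretsManager, kmsPlusDynamo, ssmPlusDynamo });
```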

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;While AWS Secrets Manager is excellent for managing application secrets with automatic rotation, it's often overkill—and expensive—for storing customer-provided secrets or sensitive data at scale.&lt;/p&gt;

&lt;p&gt;For most use cases involving customer secrets in DynamoDB, encrypting data directly with KMS offers the best balance of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt;: Strong encryption with managed keys&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance&lt;/strong&gt;: Single API call to retrieve data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt;: Dramatically lower than Secrets Manager&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simplicity&lt;/strong&gt;: Fewer moving parts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The trade-off is that you'll need to implement your own versioning logic if required; for many applications, that is a worthwhile exchange for the cost savings and performance benefits. Rotation is less of a concern than it might seem: most customer-provided secrets cannot and should not be rotated automatically by your system. API keys, webhook secrets, and OAuth credentials are typically owned and rotated by the customer or an external provider, making Secrets Manager's rotation features largely irrelevant for these use cases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key takeaway&lt;/strong&gt;: Choose the right tool for the job. Secrets Manager shines for application secrets with rotation needs. KMS excels for high-volume, customer-specific data encryption.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have you dealt with similar challenges managing secrets at scale? I'd love to hear about your approach in the comments below.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>kms</category>
      <category>dynamodb</category>
      <category>ssm</category>
    </item>
    <item>
      <title>Practical usage of asynchronous context tracking in NodeJS and AWS Lambda</title>
      <dc:creator>Sebastian</dc:creator>
      <pubDate>Fri, 07 Nov 2025 09:55:20 +0000</pubDate>
      <link>https://forem.com/boringcontributor/practical-usage-of-asynchronous-context-tracking-in-nodejs-and-aws-lambda-1gff</link>
      <guid>https://forem.com/boringcontributor/practical-usage-of-asynchronous-context-tracking-in-nodejs-and-aws-lambda-1gff</guid>
      <description>&lt;p&gt;Asynchronous context tracking in NodeJS, introduced with version 16, addresses a common challenge in node applications: maintaining context across asynchronous operations. In such environment, where non-blocking I/O operations are the norm, it can be difficult to preserve a "context" or "state" across callbacks, promises, or async/await operations. This is crucial for tasks like tracking user sessions, handling transactions, or implementing logging that depends on knowing the sequence of operations that led to a particular state.&lt;/p&gt;

&lt;p&gt;It provides a way for developers to preserve this context without resorting to complex workarounds. It builds on the &lt;strong&gt;AsyncLocalStorage API&lt;/strong&gt;, part of the &lt;a href="https://nodejs.org/docs/latest-v18.x/api/async_hooks.html#class-asynclocalstorage" rel="noopener noreferrer"&gt;async_hooks module&lt;/a&gt;, which lets developers store and access data specific to a particular sequence of asynchronous operations. This makes it possible to pass context through the many layers of asynchronous calls, improving monitoring, debugging, and maintainability. Essentially, it lets developers keep track of the execution flow even in the inherently asynchronous environment of NodeJS.&lt;/p&gt;
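&lt;p&gt;Before looking at Lambda specifically, here is a minimal standalone sketch of the API (the function names are my own): a store bound with &lt;code&gt;run()&lt;/code&gt; remains readable via &lt;code&gt;getStore()&lt;/code&gt; after any number of awaits, without threading it through parameters:&lt;/p&gt;

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Holds a per-request context object for the current async call chain.
const als = new AsyncLocalStorage();

function log(message: string) {
  // getStore() returns the object bound by the closest enclosing run(),
  // or undefined when called outside any context.
  const store = als.getStore() as { requestId: string } | undefined;
  return `[${store ? store.requestId : 'no-context'}] ${message}`;
}

async function handle(requestId: string) {
  // Everything inside this callback, including code after awaits,
  // sees the same store.
  return als.run({ requestId }, async () => {
    await new Promise((resolve) => setTimeout(resolve, 5));
    return log('finished');
  });
}

handle('req-42').then((line) => console.log(line));
```

&lt;p&gt;The middleware below uses &lt;code&gt;enterWith()&lt;/code&gt; instead of &lt;code&gt;run()&lt;/code&gt;, which binds the store for the remainder of the current execution context. That is convenient in middleware chains, where you cannot easily wrap the downstream handler in a callback.&lt;/p&gt;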

&lt;h2&gt;
  
  
  AsyncLocalStorage within AWS Lambda Environments
&lt;/h2&gt;

&lt;p&gt;In the dynamic world of AWS Lambda, where functions respond to events in isolated invocations, managing context across asynchronous operations can be a juggling act. &lt;strong&gt;AsyncLocalStorage&lt;/strong&gt; is designed to handle exactly this scenario. It offers a seamless way to maintain context without the cumbersome need to pass state through function parameters. As long as each invocation establishes its context at the start of the handler (as the middleware below does), context does not leak between invocations, even in warm containers that are reused for efficiency. This gives every function run a clean slate for its asynchronous context, making code more readable and maintainable, and significantly reducing the likelihood of bugs caused by improper context management.&lt;/p&gt;

&lt;p&gt;Let's consider the following example, which tracks user claims by introducing a middy middleware:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;AsyncLocalStorage&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;node:async_hooks&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;middy&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@middy/core&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;zod&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;zod&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;Claims&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;zod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;Claims&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;zod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;infer&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;Claims&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;claimsStorage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;AsyncLocalStorage&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;Claims&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;useClaims&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;claimsStorage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getStore&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;invalid claims&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;withUserStoredInContext&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nx"&gt;middy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;MiddlewareObj&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;APIGatewayProxyEventV2WithJWTAuthorizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;APIGatewayProxyStructuredResultV2&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;before&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;middy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;APIGatewayProxyEventV2WithJWTAuthorizer&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;claims&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;Claims&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;requestContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;authorizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;jwt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;claims&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="nx"&gt;claimsStorage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enterWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;claims&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We implement a function &lt;strong&gt;useClaims&lt;/strong&gt; and can now access the claims throughout the whole request lifecycle:&lt;br&gt;
&lt;/p&gt;
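&lt;p&gt;The &lt;strong&gt;useClaims&lt;/strong&gt; implementation is little more than a typed wrapper around &lt;strong&gt;getStore&lt;/strong&gt;. A minimal sketch, assuming a module-level &lt;strong&gt;claimsStorage&lt;/strong&gt; shared with the middleware above (the Claims type here is a placeholder for the parsed schema):&lt;/p&gt;

```typescript
import { AsyncLocalStorage } from 'node:async_hooks'

// Placeholder claims shape; in the real code this is the result of
// the Claims schema parsed in the middleware above.
type Claims = { sub?: string; [key: string]: unknown }

// Module-level store, shared with the middleware that calls enterWith.
export const claimsStorage = new AsyncLocalStorage<Claims>()

// Returns the claims of the current request, or an empty object when
// the store was never populated (e.g. in tests or outside a request).
export const useClaims = (): Claims => claimsStorage.getStore() ?? {}
```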

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;useClaims&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;DB&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./db&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;isEmpty&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;remeda&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;sub&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useClaims&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;allPromotions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;listBySub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;allPromotions&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nf"&gt;isEmpty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;allPromotions&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;transformPromotions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;allPromotions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Pitfalls
&lt;/h2&gt;

&lt;p&gt;Using &lt;strong&gt;AsyncLocalStorage&lt;/strong&gt; with AWS Lambda offers many benefits for context management across asynchronous operations. However, there are some considerations and potential pitfalls to be aware of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;enterWith&lt;/strong&gt; is still experimental. The only stable way to populate the store is to use the &lt;strong&gt;run&lt;/strong&gt; method. See the related Stack Overflow discussion.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Understanding Context Propagation:&lt;/strong&gt; Developers must have a clear understanding of how context is propagated across asynchronous calls to effectively use AsyncLocalStorage. Misunderstandings can lead to context loss or incorrect assumptions about the availability of context, resulting in bugs that are difficult to diagnose.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡&lt;br&gt;
I have run into this quite often, e.g. calling &lt;strong&gt;getStore&lt;/strong&gt; and wondering why it returns undefined even though I thought I had populated the store.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Potential for Context Leaks&lt;/strong&gt;: Incorrect usage of AsyncLocalStorage, especially not properly entering and exiting the context, can lead to context information leaking across Lambda invocations in warm containers. Although AWS Lambda provides isolation between invocations, improper management of context can introduce subtle bugs related to context contamination.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cold Start &amp;amp; Memory Consumption:&lt;/strong&gt; I have not verified this yet, but there are quite a few older discussions about performance degradation when using these hooks. I can imagine that async hooks have some impact on the performance of an application, be it container startup time or memory consumption.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
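&lt;p&gt;If you want to stay on stable API only, the same effect can be achieved by wrapping the handler body with &lt;strong&gt;run&lt;/strong&gt; instead of &lt;strong&gt;enterWith&lt;/strong&gt;. A minimal sketch (the names are illustrative, not the middleware from above):&lt;/p&gt;

```typescript
import { AsyncLocalStorage } from 'node:async_hooks'

// Illustrative names; the shape mirrors the claims example above.
const storage = new AsyncLocalStorage<{ sub: string }>()

// run() scopes the store strictly to the callback's (possibly async)
// call tree, so nothing can leak into the next warm invocation.
export const withClaims = <T>(claims: { sub: string }, fn: () => T): T =>
  storage.run(claims, fn)

export const currentSub = (): string | undefined => storage.getStore()?.sub
```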

&lt;p&gt;How do you use AsyncLocalStorage? Let's discuss!&lt;/p&gt;

&lt;p&gt;Otherwise, give it a try. Cheers!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>lambda</category>
      <category>node</category>
    </item>
    <item>
      <title>Scaling Notification Systems: How a Single Timestamp Improved Our DynamoDB Performance</title>
      <dc:creator>Sebastian</dc:creator>
      <pubDate>Wed, 14 May 2025 08:02:51 +0000</pubDate>
      <link>https://forem.com/epilot/scaling-notification-systems-how-a-single-timestamp-improved-our-dynamodb-performance-5c84</link>
      <guid>https://forem.com/epilot/scaling-notification-systems-how-a-single-timestamp-improved-our-dynamodb-performance-5c84</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;The epilot platform contains a comprehensive notification system. Users receive notifications about ongoing tasks, such as new assignments or overdue tasks. They can also get notified about incoming emails or when someone mentions them in notes, and the list goes on. Users can choose to receive these notifications via email or as in-app notifications. This article focuses on the latter.&lt;br&gt;
Initially, in-app notifications were stored in Aurora (AWS's managed relational database service). This setup soon became a major pain point, prompting us to migrate to DynamoDB. The simplicity of the notification data structure and the volume of read and write operations we expected made DynamoDB the perfect choice to scale.&lt;br&gt;
However, if you don't think carefully about how you design access patterns in DynamoDB, more problems arise than you'd expect.&lt;br&gt;
Let's dive into why a bad implementation of a &lt;strong&gt;markAllAsRead&lt;/strong&gt; feature caused us some headaches and how we reduced its complexity from &lt;strong&gt;O(n)&lt;/strong&gt; to &lt;strong&gt;O(1)&lt;/strong&gt; with a timestamp-based approach for unread notifications.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;The initial design was straightforward. Every user gets a new notification item in the DynamoDB table. The partition key (pk) was a combination of user_id and organization_id, while the sort key (sk) contains the notification_id. The access patterns were simple: &lt;strong&gt;fetch all notifications for a given user&lt;/strong&gt;, &lt;strong&gt;mark a notification as read&lt;/strong&gt;, and &lt;strong&gt;mark all notifications as read&lt;/strong&gt; for the lazy ones. The latter is the origin of this article.&lt;br&gt;
An attribute &lt;strong&gt;read_state&lt;/strong&gt; indicates whether a notification has already been read by a user. Marking a single notification as read was as simple as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;markAsRead&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ddb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;TableName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NOTIFICATIONS_TABLE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;Key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;toUserNotificationSK&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;UpdateExpression&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;SET read_state = :read_state&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;ExpressionAttributeValues&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;:read_state&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// binary 1 is true&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once a notification is read, the item is updated and &lt;strong&gt;read_state&lt;/strong&gt; is set to &lt;strong&gt;1&lt;/strong&gt;. A Global Secondary Index (GSI) called &lt;strong&gt;byReadState&lt;/strong&gt; then allows us to read all unread notifications for a given tenant (org + user). This created two operations that performed poorly:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;A bad implementation of the &lt;strong&gt;markAllAsRead&lt;/strong&gt; feature. It first queried all unread notifications and then performed a batch operation to update them all to read. As shown in the graph below, DynamoDB began to throttle under load when users with lots of unread notifications used the mark-all-as-read feature. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To indicate that a user has unread messages, a &lt;strong&gt;getTotalUnreadCount&lt;/strong&gt; endpoint is exposed. This allows us to render a notification bell in the UI to show the unread count. &lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz3taktv37h743x7thcfu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz3taktv37h743x7thcfu.png" alt="dynamodb throttles" width="800" height="393"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The naive implementation to batch update all unread notifications worked surprisingly well in the beginning. However, as the volume of notifications increased, we started experiencing more and more throttling events in DynamoDB. What started as occasional hiccups became a serious bottleneck in our notification service's performance.&lt;/p&gt;

&lt;p&gt;The issue was multi-faceted. &lt;strong&gt;First&lt;/strong&gt;, DynamoDB has limits on batch operations, requiring us to split large batches into multiple smaller operations. This not only added complexity to our code but also increased the probability of partial failures. &lt;br&gt;
&lt;strong&gt;Second&lt;/strong&gt;, each notification update consumed Write Capacity Units (WCUs) from our table's provisioned capacity. For users with hundreds or thousands of unread notifications, a single "Mark All as Read" action would consume a significant portion of our available WCUs, causing other notification operations to be throttled.&lt;br&gt;
&lt;strong&gt;Importantly&lt;/strong&gt;, these issues didn't affect the entire epilot platform, but were isolated to the notification service itself. Users would see timeouts or delayed responses specifically when interacting with notifications, while the rest of the platform continued to function normally. &lt;br&gt;
However, this created a frustrating user experience, especially for power users who relied heavily on notifications to manage their workflows.&lt;br&gt;
The problem was particularly severe for organizations with large teams, where notification counts could grow rapidly, and the "Mark All as Read" feature was used frequently to manage notification overload. &lt;/p&gt;
&lt;h2&gt;
  
  
  The Solution: Last Read Timestamp
&lt;/h2&gt;

&lt;p&gt;After evaluating several options, we settled on a timestamp-based approach that would fundamentally change how we track read states while maintaining backward compatibility with our existing system.&lt;br&gt;
Instead of updating each notification individually when a user clicks "Mark All as Read," we simply record the timestamp of when this action occurred. Any notification created before this timestamp is considered "read," while notifications arriving after it are "unread." This solution transforms what was an O(n) operation into an O(1) operation, regardless of how many notifications a user has.&lt;/p&gt;
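&lt;p&gt;The read/unread decision thereby becomes a pure string comparison. A minimal sketch of such a predicate (attribute names mirror the tables described in this article; ISO 8601 timestamps compare correctly as plain strings):&lt;/p&gt;

```typescript
// A notification counts as read if a readmark newer than it exists,
// or if it was individually marked as read (read_state = 1).
// ISO 8601 timestamps sort lexicographically, so <= works on strings.
export const isRead = (
  notification: { timestamp: string; read_state?: number },
  lastReadAt?: string,
): boolean =>
  (lastReadAt !== undefined && notification.timestamp <= lastReadAt) ||
  notification.read_state === 1
```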

&lt;p&gt;&lt;strong&gt;The New Table Structure&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We created a new DynamoDB table called notifications-read-state with the following structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;pk&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`ORG#&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;orgId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;#USER#&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// Partition key&lt;/span&gt;
  &lt;span class="nx"&gt;sk&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`READMARK#&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;         &lt;span class="c1"&gt;// Sort key&lt;/span&gt;
  &lt;span class="nx"&gt;read_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ISO8601Timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;           &lt;span class="c1"&gt;// When the user marked all as read&lt;/span&gt;
  &lt;span class="nx"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ISO8601Timestamp&lt;/span&gt;         &lt;span class="c1"&gt;// When this record was created&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The primary key design allows us to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Efficiently lookup the most recent "mark all as read" timestamp for any user&lt;/li&gt;
&lt;li&gt;Support multiple organizations per user&lt;/li&gt;
&lt;li&gt;Maintain a history of read events if needed for analytics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;strong&gt;read_at&lt;/strong&gt; attribute stores an ISO-formatted timestamp that serves as our "high water mark" for read notifications. This single attribute is the cornerstone of our solution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;markAllAsRead&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;orgId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Step 1: Query for all unread notifications&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;unreadNotifications&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getUnreadNotifications&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;orgId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Step 2: Prepare batch updates (25 items per batch due to DynamoDB limits)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;batches&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createBatches&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;unreadNotifications&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Step 3: Execute all batch updates&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;batch&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;batches&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ddb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;batchWrite&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;RequestItems&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;NOTIFICATIONS_TABLE&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="nx"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
          &lt;span class="na"&gt;PutRequest&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;Item&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;read_state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}))&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Complexity: O(n) - As the number of notifications increases, both processing time and database load increase linearly.&lt;/strong&gt;&lt;/p&gt;
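&lt;p&gt;The &lt;strong&gt;createBatches&lt;/strong&gt; helper referenced above is nothing more than a chunker honoring DynamoDB's 25-item BatchWriteItem limit. A minimal sketch:&lt;/p&gt;

```typescript
// Splits items into chunks of `size` elements each
// (25 is DynamoDB's BatchWriteItem limit per request).
export const createBatches = <T>(items: T[], size: number): T[][] => {
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}
```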

&lt;p&gt;&lt;strong&gt;After&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;markAllAsRead&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;orgId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="c1"&gt;// Single write operation&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ddb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;TableName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;NOTIFICATIONS_READ_STATE_TABLE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;Item&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;pk&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`ORG#&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;orgId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;#USER#&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;sk&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`READMARK#&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;now&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;read_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Complexity: O(1) - Constant time operation regardless of notification count.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This simple change drastically improved our system's performance. The "Mark All as Read" operation now completes in milliseconds instead of potentially seconds, uses a predictable amount of database capacity, and never times out, even for users with thousands of unread notifications.&lt;br&gt;
What makes this approach particularly powerful is that we don't need to modify any existing notifications. Instead, we're recording a state transition that implicitly affects all notifications for a user at once.&lt;/p&gt;

&lt;p&gt;Given the following pseudo-code, the &lt;strong&gt;byReadState&lt;/strong&gt; index can be removed completely. All you need is to fetch the latest read timestamp via &lt;strong&gt;getLastReadTimestamp&lt;/strong&gt; for a given user and calculate whether a notification has already been seen.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;lastReadAt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getLastReadTimestamp&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;orgId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;orgId&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Items&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;LastEvaluatedKey&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ddb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="p"&gt;...&lt;/span&gt;
  &lt;span class="na"&gt;TableName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NOTIFICATIONS_TABLE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;IndexName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;byTimestamp&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;ExclusiveStartKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cursor&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nf"&gt;decodeLastEvaluatedKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;notificationsWithReadStatus&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;Items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// A notification is considered read if:&lt;/span&gt;
  &lt;span class="c1"&gt;// - It was created before the last "mark all as read" time OR&lt;/span&gt;
  &lt;span class="c1"&gt;// - It has been individually marked as read (read_state = 1)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isRead&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="nx"&gt;lastReadAt&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;read_state&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;read_state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;isRead&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
 &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
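&lt;p&gt;For completeness, &lt;strong&gt;getLastReadTimestamp&lt;/strong&gt; itself boils down to a single newest-first query on the read-state table. A sketch of the query input, kept SDK-agnostic (table and key names follow the structure above; the builder name is an illustration, not our actual code):&lt;/p&gt;

```typescript
// Builds the DynamoDB query input for the most recent readmark.
// ISO timestamps in the sort key sort lexicographically, so reading
// the partition backwards with Limit 1 yields the newest entry.
export const buildLastReadQuery = (orgId: string, userId: string) => ({
  TableName: 'notifications-read-state',
  KeyConditionExpression: 'pk = :pk AND begins_with(sk, :sk)',
  ExpressionAttributeValues: {
    ':pk': `ORG#${orgId}#USER#${userId}`,
    ':sk': 'READMARK#',
  },
  ScanIndexForward: false, // newest readmark first
  Limit: 1,
})
```

<p>The result's first item (if any) carries the &lt;strong&gt;read_at&lt;/strong&gt; high water mark.</p>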



&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;p&gt;Our journey from the first version to the optimized solution taught us a lot about designing for scale—especially with DynamoDB.&lt;/p&gt;

&lt;p&gt;At first, updating each notification one by one seemed fine. It was simple, worked great in dev, and handled early traffic just fine. But as usage grew, that approach quickly hit its limits. It was a good reminder: what works now might not work when your data grows 10x.&lt;/p&gt;

&lt;p&gt;The breakthrough came when we stopped trying to optimize the old way and instead rethought the problem. Rather than updating every record, we started recording state changes with timestamps. That shift made things both simpler and faster—and it's a pattern that applies well beyond notifications.&lt;/p&gt;

&lt;p&gt;Most importantly, we learned to play to DynamoDB’s strengths: fast, predictable access with simple operations. Once we aligned our design with that, everything clicked.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;It’s easy to overthink scalability from the start, but the truth is: you won’t know your real problems until users are actually using the system. Our experience reminded us that it's totally fine to start simple and ship. You learn way more from real-world usage than from guessing at edge cases.&lt;/p&gt;

&lt;p&gt;Scalability issues aren’t failures—they’re signs of growth. When we hit our limits, it forced us to rethink things. And funny enough, the fix—a single timestamp—ended up being both simple and powerful. It made the system faster, more reliable, and easier to reason about.&lt;/p&gt;

&lt;p&gt;So if you’re torn between shipping something basic now or building for every possible future, go with the simple version. Ship it, learn from it, and improve as you go.&lt;/p&gt;

&lt;p&gt;Do you want to work on features like this? Check out our &lt;a href="https://www.epilot.cloud/en/company/careers#Offene-Stellenangebote" rel="noopener noreferrer"&gt;career page&lt;/a&gt; or reach out to me on &lt;a href="https://x.com/boingCntributor" rel="noopener noreferrer"&gt;X&lt;/a&gt; or &lt;a href="https://www.linkedin.com/in/sauerer/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>dynamodb</category>
      <category>serverless</category>
    </item>
    <item>
      <title>Building a Scalable Audit Log System with AWS and ClickHouse</title>
      <dc:creator>Sebastian</dc:creator>
      <pubDate>Tue, 26 Nov 2024 09:00:05 +0000</pubDate>
      <link>https://forem.com/epilot/building-a-scalable-audit-log-system-with-aws-and-clickhouse-jn5</link>
      <guid>https://forem.com/epilot/building-a-scalable-audit-log-system-with-aws-and-clickhouse-jn5</guid>
      <description>&lt;p&gt;Audit logs might seem like a backend feature that only a few people care about, but they play a crucial role in keeping things running smoothly and securely in any SaaS or tech company. Let me take you through our journey of building a robust and scalable audit log system. Along the way, I’ll share why we needed it, what exactly audit logs are, and how we combined tools like AWS, ClickHouse, and OpenAPI to craft a solution that works like a charm.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Case of the Disappearing Configuration
&lt;/h2&gt;

&lt;p&gt;At epilot, we’ve encountered a frustratingly familiar scenario. A customer reaches out, upset that one of their workflow configurations has mysteriously vanished. Their immediate question? “Who deleted it?”—and the assumption is that someone on our team is responsible.&lt;/p&gt;

&lt;p&gt;Now here’s the tricky part: how do we, as engineers, figure out who did what and when?&lt;/p&gt;

&lt;p&gt;One obvious approach is to dive into the application logs. But here’s the catch: most of the production logs aren’t enabled by default. Even when they are, they’re often sampled, capturing only about 10% of the actual traffic. Additionally, those logs often seem to lack the required information. This means we’re left piecing together incomplete data, like trying to solve a puzzle with half the pieces missing.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Are Audit Logs Anyway?
&lt;/h2&gt;

&lt;p&gt;Audit logs provide clear visibility into system changes, aiding teams in investigations, diagnosing incidents, and tracing unauthorized actions. They empower admins by reducing support reliance and ensuring clarity on actions like role or workflow updates. For enterprise customers, audit logs are a critical, expected feature that supports compliance with standards like ISO 27001. Additionally, they lay the groundwork for enhanced threat detection capabilities in the future. In simple terms, audit logs help answer the following questions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;WHO&lt;/strong&gt; is doing something? Typically a user or a system (API call).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;WHAT&lt;/strong&gt; is that user/system doing?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;WHERE&lt;/strong&gt; is that occurring from? (e.g. an IP address)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;WHEN&lt;/strong&gt; did it occur?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;WHY&lt;/strong&gt; did it occur? (optional) “Why did the user log in?” → we don’t know; “Why is this IP blocked?” → the user logged in 5 times with the wrong password.&lt;/p&gt;
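
&lt;p&gt;To make these questions concrete, here is a sketch of what a single audit log entry could look like in TypeScript. The field names are our illustration, not the exact epilot schema:&lt;/p&gt;

```typescript
// Hypothetical shape of a single audit log entry -- the field names
// are illustrative, not the exact production schema.
interface AuditLogEntry {
  actor: { type: 'user' | 'system'; id: string }; // WHO
  action: string;                                 // WHAT, e.g. "workflow.configuration.deleted"
  sourceIp: string;                               // WHERE
  timestamp: string;                              // WHEN (ISO 8601)
  reason?: string;                                // WHY (optional, often unknown)
}

const example: AuditLogEntry = {
  actor: { type: 'user', id: 'user-123' },
  action: 'workflow.configuration.deleted',
  sourceIp: '203.0.113.42',
  timestamp: '2024-11-26T09:00:05Z',
};
```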

&lt;h2&gt;
  
  
  Key Considerations for a Successful Audit Log System
&lt;/h2&gt;

&lt;p&gt;Before diving into the technical details, it’s crucial to define what makes an audit log system effective. While the exact requirements depend on your company’s domain, there are some universal points worth considering:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Compliance&lt;/strong&gt;: Ensure the system adheres to regulations like GDPR. For example, customers may request the deletion of personal data, so you’ll need a straightforward way to erase all logs tied to a specific customer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sustainability&lt;/strong&gt;: Audit logs grow rapidly, especially in high-traffic systems. Storing them indefinitely may not be feasible. Decide on strategies for archiving or purging logs over time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Permissions&lt;/strong&gt;: Define who is allowed to access audit logs to maintain security and privacy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Format&lt;/strong&gt;: Standardize the structure of your logs to ensure they’re easy to interpret and query.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Selection&lt;/strong&gt;: Carefully determine what actions and events are worth logging to answer critical questions effectively, without unnecessary noise.&lt;/p&gt;




&lt;h2&gt;
  
  
  Making It Happen: How We Built Our Audit Logs
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsp6ixm91x3oeb3ywahek.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsp6ixm91x3oeb3ywahek.png" alt=" " width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At epilot, our APIs are built around serverless components provided by AWS. From the outset, we recognized that AWS API Gateway events provided a rich source of information for building audit logs. These events capture critical details such as user identities, actions performed (through the request payload), IP addresses, headers, and more.&lt;/p&gt;

&lt;p&gt;Given our microservices architecture, where services are organized by domain and accessed through an API Gateway (&lt;a href="https://docs.epilot.io/docs/architecture/overview/#system-architecture-diagram" rel="noopener noreferrer"&gt;see our system architecture&lt;/a&gt;), we needed a solution that seamlessly integrated with this structure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;High-Level Overview&lt;/strong&gt;&lt;br&gt;
Our approach to audit logging can be summarized as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Capturing events asynchronously.&lt;/li&gt;
&lt;li&gt;Validating and transforming raw events into a standard format.&lt;/li&gt;
&lt;li&gt;Persisting the data in a read-only, scalable, and query-friendly storage system.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This design adheres to several key technical principles:&lt;/p&gt;

&lt;p&gt;&lt;u&gt;Asynchronous Event Capture&lt;/u&gt;&lt;br&gt;
We use Amazon SQS to decouple event capture from the main HTTP request flow. For example, when a user creates a new workflow configuration, the relevant API Gateway event is pushed to an SQS queue by middleware wrapping the API. This ensures that audit logging does not introduce latency or affect the performance of the core application logic.&lt;/p&gt;

&lt;p&gt;&lt;u&gt;From Raw to Standardized Events&lt;/u&gt;&lt;br&gt;
Our focus is on capturing system modifications, specifically HTTP methods like POST, PUT, PATCH, and DELETE. These provide meaningful insights into changes occurring within the system. GET requests, on the other hand, generate excessive noise and are generally excluded—though we offer an opt-in mechanism for services where logging GET requests adds value.&lt;/p&gt;

&lt;p&gt;A Lambda function processes raw API Gateway events from the SQS queue, transforming them into a structured and validated format. This includes filtering relevant data, enhancing it using metadata like OpenAPI specifications, and ensuring consistency across all logged events.&lt;/p&gt;
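
&lt;p&gt;A rough sketch of that transformation step: the input fields below exist on the standard API Gateway proxy event, while the output shape and helper name are our illustration:&lt;/p&gt;

```typescript
// Minimal sketch: map a raw API Gateway proxy event to a standardized
// audit event. Types are trimmed to only the fields used here.
interface RawApiGatewayEvent {
  httpMethod: string;
  path: string;
  requestContext: {
    identity: { sourceIp: string };
    requestTimeEpoch: number;              // request time in ms since epoch
    authorizer?: { principalId?: string }; // set when the request was authorized
  };
}

interface StandardAuditEvent {
  actorId: string;
  method: string;
  path: string;
  sourceIp: string;
  occurredAt: string; // ISO 8601
}

function toAuditEvent(raw: RawApiGatewayEvent): StandardAuditEvent {
  return {
    actorId: raw.requestContext.authorizer?.principalId ?? 'anonymous',
    method: raw.httpMethod,
    path: raw.path,
    sourceIp: raw.requestContext.identity.sourceIp,
    occurredAt: new Date(raw.requestContext.requestTimeEpoch).toISOString(),
  };
}
```

The real pipeline additionally enriches events with metadata from the OpenAPI specifications; that part is omitted here.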

&lt;p&gt;&lt;u&gt;Data Persistence&lt;/u&gt;&lt;br&gt;
For storing audit logs, we chose &lt;a href="https://clickhouse.com/" rel="noopener noreferrer"&gt;ClickHouse&lt;/a&gt;, a highly scalable, SQL-based database that aligns with our requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read-only access: Supports immutability to preserve data integrity.&lt;/li&gt;
&lt;li&gt;Scalability: Proven in our data lake setup to handle large volumes of data efficiently.&lt;/li&gt;
&lt;li&gt;Querying: SQL capabilities allow for precise filtering and analysis, which is more complex with alternatives like DynamoDB.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By leveraging ClickHouse, we ensure a robust and scalable foundation for our audit logs, simplifying future integrations and analysis.&lt;/p&gt;
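
&lt;p&gt;For illustration, a table for such entries might look like the following ClickHouse DDL (a sketch with assumed column names, not our production schema):&lt;/p&gt;

```sql
-- Illustrative sketch, not the production schema.
CREATE TABLE audit_logs (
    org_id      String,
    actor_id    String,
    method      LowCardinality(String),
    path        String,
    source_ip   String,
    occurred_at DateTime
)
ENGINE = MergeTree
ORDER BY (org_id, occurred_at);
```

&lt;p&gt;A GDPR erasure request for a single customer can then be served with a mutation such as &lt;code&gt;ALTER TABLE audit_logs DELETE WHERE org_id = '...'&lt;/code&gt;, which addresses the compliance point raised earlier.&lt;/p&gt;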

&lt;p&gt;&lt;u&gt;Integration&lt;/u&gt;&lt;br&gt;
To make audit logging effortless for our microservices, we focused on seamless integration. At epilot, we rely heavily on &lt;a href="https://middy.js.org/" rel="noopener noreferrer"&gt;middy&lt;/a&gt;, a middleware engine used across all our services. Building on this, we introduced a new middleware: &lt;strong&gt;withAuditLog&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { withAuditLog } from '@epilot/audit-log'
import middy from '@middy/core'
import type { Handler } from 'aws-lambda'


export const withMiddlewares = (handler: Handler) =&amp;gt; {
  return middy(handler)
    .use(enableCorrelationIds())
    .use(...)
    .use(
      withAuditLog({
        ignorePaths: ['/v1/webhooks/configs/{configId}/trigger']
      })
    )
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This middleware integrates directly into existing services and simplifies the audit logging process by:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Capturing API Gateway Events&lt;/strong&gt;: It hooks into the request lifecycle to extract the API Gateway event details.&lt;br&gt;
&lt;strong&gt;Omitting GET Requests by Default&lt;/strong&gt;: To reduce noise, it filters out GET requests, with an option to opt them in for specific services where needed.&lt;br&gt;
&lt;strong&gt;Forwarding to SQS&lt;/strong&gt;: Its primary role is to forward the event to an SQS queue for asynchronous processing.&lt;/p&gt;

&lt;p&gt;With this middleware, adding audit logging to any microservice is as simple as including withAuditLog in the service's middleware stack and granting the sqs:SendMessage permission. This ensures consistency, reduces implementation effort, and keeps the integration process dead simple.&lt;/p&gt;
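
&lt;p&gt;The filtering rules described above boil down to a small decision function. The sketch below is illustrative; it is not the actual @epilot/audit-log implementation:&lt;/p&gt;

```typescript
// Illustrative sketch of the middleware's filtering decision --
// not the actual @epilot/audit-log implementation.
interface AuditOptions {
  ignorePaths?: string[]; // exact resource paths to skip entirely
  captureGet?: boolean;   // opt-in for services where GETs add value
}

const MUTATING = new Set(['POST', 'PUT', 'PATCH', 'DELETE']);

function shouldAudit(method: string, resourcePath: string, opts: AuditOptions = {}): boolean {
  if (opts.ignorePaths?.includes(resourcePath)) return false;
  if (method === 'GET') return opts.captureGet ?? false; // GETs are noise by default
  return MUTATING.has(method);
}
```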

&lt;p&gt;&lt;strong&gt;Technical Considerations&lt;/strong&gt;&lt;br&gt;
This article focuses on our high-level approach to building audit logs, as there are numerous ways to tackle the problem, each with its trade-offs. During our research, we explored alternatives like EventBridge for emitting events at the end of each request or Kinesis for streaming data. Ultimately, we chose a solution that met our key requirements: decoupling log emission from the main flow while offering flexibility in managing throughput and batching.&lt;/p&gt;

&lt;p&gt;Here’s why we chose SQS:&lt;/p&gt;

&lt;p&gt;&lt;u&gt;Decoupling from the Main Flow&lt;/u&gt;&lt;br&gt;
SQS allows us to process audit logs asynchronously, ensuring that the main HTTP request flow remains unaffected. This means audit log processing won’t slow down user-facing operations.&lt;/p&gt;

&lt;p&gt;&lt;u&gt;Flexibility with Throughput and Batching&lt;/u&gt;&lt;br&gt;
With SQS, we can fine-tune parameters like long-polling and batch windows to optimize throughput without compromising efficiency. This ensures scalable and reliable processing regardless of traffic spikes.&lt;/p&gt;

&lt;p&gt;&lt;u&gt;Scalability for POST/PUT/PATCH/DELETE Events&lt;/u&gt;&lt;br&gt;
Since we exclude GET requests by default, the system handles fewer, more meaningful events. Capturing GET requests would mean a much higher event volume and could lead to Lambda concurrency issues, as a surge of Lambda environments subscribing to the queue could starve other services that also rely on Lambda.&lt;/p&gt;




&lt;h2&gt;
  
  
  Exposing Audit Logs to Users
&lt;/h2&gt;

&lt;p&gt;To make audit logs accessible and actionable, we introduced a new &lt;a href="https://sst.dev/" rel="noopener noreferrer"&gt;SST&lt;/a&gt;-based microservice that acts as a bridge to query data from ClickHouse. This microservice provides a simple and intuitive interface for users to explore their audit logs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Features:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Search and Filtering: A user-friendly search bar allows users to combine filters effortlessly, enabling them to pinpoint specific events or patterns within the logs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Activity Messages: Each audit log entry includes an activity message, a concise summary of what occurred. This message is dynamically constructed on the API side, tailored to the specific service name, making it customizable and relevant.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By customizing the activity messages for each service, users can quickly understand what happened in their systems without wading through raw data. This tailored approach ensures that the audit logs deliver immediate value and clarity to end users.&lt;/p&gt;
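
&lt;p&gt;As a sketch of how such a per-service activity message could be assembled (the template mapping below is hypothetical; the real one lives on the API side and is richer):&lt;/p&gt;

```typescript
// Hypothetical per-service activity-message templates -- the real
// mapping is richer and maintained on the API side.
const templates: { [service: string]: (method: string, resource: string) => string } = {
  workflow: (m, r) =>
    m === 'DELETE' ? `Workflow configuration ${r} was deleted`
    : m === 'POST' ? `Workflow configuration ${r} was created`
    : `Workflow configuration ${r} was updated`,
};

function activityMessage(service: string, method: string, resource: string): string {
  const template = templates[service];
  // Fall back to a generic summary for services without custom templates.
  return template ? template(method, resource) : `${method} on ${service}/${resource}`;
}
```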

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgat6dm8vy7qv7tt4sr1c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgat6dm8vy7qv7tt4sr1c.png" alt=" " width="800" height="355"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Summary
&lt;/h3&gt;

&lt;p&gt;In this article, we detailed the design and implementation of our audit log system at epilot, highlighting the key decisions and considerations that shaped its architecture. Our approach leverages AWS serverless components to seamlessly integrate audit logging into our microservices, ensuring scalability, efficiency, and ease of use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Capturing Events:&lt;/strong&gt; Using a custom middleware, withAuditLog, we extract API Gateway events asynchronously and forward them to an SQS queue, ensuring the logging process does not block the main application flow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Processing and Storing Logs:&lt;/strong&gt; A Lambda function transforms raw events into a standardized format, focusing on meaningful system modifications (POST, PUT, PATCH, DELETE) and stores them in a scalable, SQL-based ClickHouse database.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User Accessibility:&lt;/strong&gt; A new SST-based microservice provides a simple interface for querying and filtering logs. Tailored activity messages enhance usability, helping users quickly understand what occurred.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Technical Considerations:&lt;/strong&gt; SQS was chosen for its ability to decouple the logging process, optimize throughput, and handle scalability challenges. While other solutions like EventBridge or Kinesis were viable, SQS met our specific requirements effectively.&lt;/p&gt;

&lt;p&gt;This high-level overview provides a flexible, scalable, and user-friendly solution for audit logging while ensuring system integrity and maintaining performance.&lt;/p&gt;

&lt;p&gt;Do you want to work on features like this? Check out our &lt;a href="https://www.epilot.cloud/en/company/careers#Offene-Stellenangebote" rel="noopener noreferrer"&gt;career page&lt;/a&gt; or reach out on &lt;a href="https://x.com/boingCntributor" rel="noopener noreferrer"&gt;X&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>auditlog</category>
      <category>clickhouse</category>
      <category>sqs</category>
    </item>
    <item>
      <title>How Epilot Builds a Powerful Webhook Feature with AWS</title>
      <dc:creator>Sebastian</dc:creator>
      <pubDate>Wed, 17 Jan 2024 13:31:58 +0000</pubDate>
      <link>https://forem.com/epilot/how-epilot-builds-a-powerful-webhook-feature-with-aws-4glo</link>
      <guid>https://forem.com/epilot/how-epilot-builds-a-powerful-webhook-feature-with-aws-4glo</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;At epilot, we're committed to simplifying the sale and technical implementation of renewable energy solutions through a digital foundation, supporting energy suppliers, grid operators, and solution providers in the energy transition.&lt;/p&gt;

&lt;p&gt;One of our features is the integration of webhooks for data synchronization with third-party systems. This allows for timely updates and efficient data exchange, a crucial factor in enhancing our customer service in the energy sector.&lt;/p&gt;

&lt;p&gt;In this blog post, we'll dive into how we have harnessed the power of AWS to build a robust webhook feature, enhancing our service capabilities and offering our clients an even more powerful and reliable platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Evolution of Our Webhook System
&lt;/h2&gt;

&lt;p&gt;The initial version of our webhook feature was developed around the time AWS launched a new product known as &lt;a href="https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-api-destinations.html" rel="noopener noreferrer"&gt;API Destinations&lt;/a&gt;. The concept is simple but powerful: an API Destination, created as the target of an EventBridge rule, seamlessly forwards the request to any configured third-party service. One significant advantage of this approach is the use of EventBridge connections to secure webhook requests. Securing requests is a common challenge on many platforms, where it is either unsupported or only possible through a signing secret. With EventBridge connections, securing a request becomes versatile and robust, offering options like basic authentication (username/password), API keys (e.g., Authorization: ), or OAuth – a feature frequently demanded by larger enterprise customers. This method also eliminates the need for us to manually store client credentials, as the API Destination handles the signing and forwarding of the request.&lt;/p&gt;

&lt;p&gt;The following showcases a sketch of necessary components for our initial webhook architecture&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdqgh90gf32q6k5d9i3im.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdqgh90gf32q6k5d9i3im.png" alt="Initial Architecture" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The user creates a webhook configuration through our UI. A Lambda function creates an API Destination and an EventBridge connection, then attaches the connection to the API Destination. Next, an EventBridge rule is created with the API Destination as its target. Whenever this rule matches, the target is invoked. The API Destination forwards failed requests to a Dead Letter Queue (DLQ); a Lambda function picks up messages from the queue and stores these events in a table so failed events can be displayed to the user.&lt;/p&gt;

&lt;h3&gt;
  
  
  Caveats
&lt;/h3&gt;

&lt;p&gt;As our platform scaled with increased traffic and users, it surfaced unforeseen issues. The architecture we initially implemented revealed deficiencies in areas we hadn't anticipated:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Frequent Timeouts&lt;/strong&gt;: Our customers often synchronise data generated by epilot with their systems, some of which may be slower and unable to handle requests asynchronously. A notable limitation of API Destination is its strict 5-second timeout on requests. This constraint is frequently encountered when syncing data with third-party systems, as their response times can easily exceed this duration.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Payload Size&lt;/strong&gt;: EventBridge has a hard event size limit of 256kb. While this is a substantial data allowance, we occasionally reach this limit due to extensive data usage. In serverless environments, a typical solution to circumvent such limitations is the &lt;a href="https://serverlessland.com/event-driven-architecture/visuals/claim-check-pattern" rel="noopener noreferrer"&gt;Claim-Check-Pattern&lt;/a&gt;. However, this approach is not supported by API Destination.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;No Analytics&lt;/strong&gt;: Monitoring within EventBridge remains a complex issue, particularly in determining the success of requests and reflecting this in the user interface. While Dead Letter Queue (DLQ) setups enable us to capture failed events, the challenge lies in effectively tracking and displaying successful events.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Was the request successful?&lt;/strong&gt;: In our platform, webhooks can be triggered by automations. An automation is a set of predefined actions, such as triggering a webhook. We often received feedback from customers who found it confusing when webhook actions appeared to be successful but ultimately failed. Given the 'fire-and-forget' nature of webhooks, a challenge arises: How can we promptly display a failure when a request doesn't go through successfully?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;No static IP support&lt;/strong&gt;: Larger enterprise customers often require the support of static IPs for using webhook features, which poses a challenge as API Destinations currently do not offer this capability.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How AWS Step Functions Fulfilled Our Requirements
&lt;/h2&gt;

&lt;p&gt;The lack of the features mentioned above made the case for a new webhook architecture.&lt;br&gt;
The AWS Step Functions team recently published a new &lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/connect-thirdy-party-apis.html" rel="noopener noreferrer"&gt;HTTP task&lt;/a&gt;, which is very similar to API Destinations. One can reuse the EventBridge connection to authorize the request, and the HTTP task forwards it to a third-party system. It has no CDK support yet, though, and needs to be stable for a few months before we adopt it. The announcement, however, gave us the idea of using a Step Function to implement our webhook architecture. With Step Functions we can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;remove any timeout issues&lt;/li&gt;
&lt;li&gt;call them synchronously (30s API GW timeout) and asynchronously (no timeout)&lt;/li&gt;
&lt;li&gt;create a lambda task that forwards the request manually:

&lt;ul&gt;
&lt;li&gt;allows us to use the Claim-Check-Pattern and send larger payloads&lt;/li&gt;
&lt;li&gt;can run within a VPC, i.e. a static IP is easy to add&lt;/li&gt;
&lt;li&gt;complete control over how we fetch and store the HTTP response&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;store all http responses in a new event table&lt;/li&gt;

&lt;li&gt;easily extend the Step Function with new features when necessary&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Playing around with the awesome Step Function builder gives us the following output:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3kzp5613x0ycxxxdic28.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3kzp5613x0ycxxxdic28.png" alt="Webhook Step Function" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The goal is to use as few Lambda functions as possible to mitigate &lt;a href="https://www.researchgate.net/figure/Cascading-cold-starts-in-AWS-Step-Functions-ASF-and-Azure-Durable-functions-ADF_fig3_347698258" rel="noopener noreferrer"&gt;cascading cold starts&lt;/a&gt;. The Step Function architecture itself is straightforward and consists of the following tasks:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;GetItemTask&lt;/strong&gt; Fetch the webhook configuration to know where and how to send the event&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PutItemTask&lt;/strong&gt; Persist an event to DynamoDB with some initial data and an 'in_progress' state&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LambdaInvokeTask&lt;/strong&gt; Call the 3rd party with the input of the state machine. When the input contains an s3_key, hydrate the payload first.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LambdaInvokeTask&lt;/strong&gt;  Set the event to 'failed' or 'succeeded' based on the HTTP response.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LambdaInvokeTask&lt;/strong&gt; (exceptions): Catch unknown exceptions, which raises alerts and sets the event to 'failed' as well.&lt;/li&gt;
&lt;/ol&gt;
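
&lt;p&gt;In Amazon States Language, a stripped-down sketch of that state machine could look like the following. State names match the list above; resource ARNs, table names, and parameters are illustrative, not our production definition:&lt;/p&gt;

```json
{
  "Comment": "Illustrative sketch of the webhook state machine",
  "StartAt": "GetItemTask",
  "States": {
    "GetItemTask": {
      "Type": "Task",
      "Resource": "arn:aws:states:::dynamodb:getItem",
      "Parameters": {
        "TableName": "webhook-configs",
        "Key": { "config_id": { "S.$": "$.config_id" } }
      },
      "ResultPath": "$.config",
      "Next": "PutItemTask"
    },
    "PutItemTask": {
      "Type": "Task",
      "Resource": "arn:aws:states:::dynamodb:putItem",
      "Parameters": {
        "TableName": "webhook-events",
        "Item": { "status": { "S": "in_progress" } }
      },
      "ResultPath": null,
      "Next": "ForwardRequest"
    },
    "ForwardRequest": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": { "FunctionName": "forward-webhook", "Payload.$": "$" },
      "Catch": [ { "ErrorEquals": ["States.ALL"], "Next": "HandleException" } ],
      "Next": "PersistResult"
    },
    "PersistResult": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": { "FunctionName": "persist-result", "Payload.$": "$" },
      "End": true
    },
    "HandleException": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": { "FunctionName": "handle-exception", "Payload.$": "$" },
      "End": true
    }
  }
}
```

&lt;p&gt;Note how the two DynamoDB states use direct service integrations rather than Lambda invocations, which is exactly the cold-start mitigation mentioned above.&lt;/p&gt;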

&lt;p&gt;This results in the following (high level) architecture sketch:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fom5xx1niubfh4y75w1tg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fom5xx1niubfh4y75w1tg.png" alt="High Level Architecture" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We're updating our system to use AWS Step Functions instead of API Destinations with EventBridge. This change is pretty straightforward, so we don't need any complex migration scripts. We can still use the EventBridge connections we already have, but we'll need to attach them manually to our Lambda tasks for now. We're hoping to automate this attachment by using the new HTTP task mentioned above soon.&lt;/p&gt;

&lt;p&gt;For event publishing, we're using a new API endpoint &lt;br&gt;
&lt;code&gt;/webhook/{config_id}/trigger?sync=true|false&lt;/code&gt;. The endpoint checks if the data is bigger than 256kb and, if so, stores it on S3. After that, it triggers the Step Function either in the background or synchronously. This setup is great because it means consumers don't have to worry about permissions; they just need to set up our &lt;a href="https://github.com/epilot-dev/sdk-js/tree/main/clients/webhooks-client" rel="noopener noreferrer"&gt;webhook client&lt;/a&gt;. Of course, the consumer can still use the old method of just sending an EventBridge event to trigger the webhook like before.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;API Destination proved to be an excellent service for creating a basic webhook feature, but its limitations led us to transition to AWS Step Functions. This shift has enabled us to offer our customers enhanced capabilities, including static IP support, improved analytics, handling of larger payloads, and the elimination of timeout issues. With Step Functions, we now have the flexibility to scale and evolve our architecture to meet our growing needs and those of our customers.&lt;/p&gt;

&lt;p&gt;Do you want to work on features like this? Check out &lt;a href="https://epilot.recruitee.com/" rel="noopener noreferrer"&gt;our career page&lt;/a&gt; or reach out on &lt;a href="https://twitter.com/boingCntributor" rel="noopener noreferrer"&gt;X&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>webhook</category>
      <category>stepfunctions</category>
      <category>eventbridge</category>
    </item>
  </channel>
</rss>
