<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Sabarish Sathasivan</title>
    <description>The latest articles on Forem by Sabarish Sathasivan (@thedeveloperjournal).</description>
    <link>https://forem.com/thedeveloperjournal</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2347611%2F690ffe9f-3b11-48b6-a1f9-f6411532ed4a.png</url>
      <title>Forem: Sabarish Sathasivan</title>
      <link>https://forem.com/thedeveloperjournal</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/thedeveloperjournal"/>
    <language>en</language>
    <item>
      <title>AWS Vector Databases – Part 3 : Choosing the Right Vector Database on AWS</title>
      <dc:creator>Sabarish Sathasivan</dc:creator>
      <pubDate>Mon, 30 Mar 2026 18:29:12 +0000</pubDate>
      <link>https://forem.com/aws-builders/aws-vector-databases-part-3-choosing-the-right-vector-database-on-aws-375m</link>
      <guid>https://forem.com/aws-builders/aws-vector-databases-part-3-choosing-the-right-vector-database-on-aws-375m</guid>
      <description>&lt;p&gt;This is where everything comes together.&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;In case you missed it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Part 1 → &lt;a href="https://dev.to/thedeveloperjournal/aws-vector-databases-part-1-embeddings-dimensions-similarity-9ph"&gt;Embeddings, Dimensions, and Similarity Search&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Part 2 → &lt;a href="https://dev.to/thedeveloperjournal/aws-vector-databases-part-2-search-filtering-and-chunking-3lbe"&gt;Search Patterns, Filtering, and Chunking&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By now, you understand the fundamentals and how retrieval works. The real question is: &lt;strong&gt;which AWS service should you actually use?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There's no single "best" option. Each service was designed for a different primary workload and inherited vector search as a capability. That origin story shapes each service's strengths, limitations, and the cost you'll pay.&lt;/p&gt;

&lt;p&gt;Let's break them down.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Services
&lt;/h2&gt;

&lt;h3&gt;
  
  
  OpenSearch Serverless
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A distributed search engine with native vector, keyword, and hybrid search.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why choose it:&lt;/strong&gt; OpenSearch is the only AWS service that handles &lt;strong&gt;full-text search and vector search natively in one engine&lt;/strong&gt;. Its Neural Search feature automates the entire hybrid pipeline — you send a query, and it runs keyword + semantic search, then merges results using normalization and combination techniques. No manual score merging required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key details:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Supports HNSW engine&lt;/li&gt;
&lt;li&gt;Distance metrics: Euclidean, Cosine, Inner Product (Dot Product)&lt;/li&gt;
&lt;li&gt;Hybrid search with Neural Search pipeline (keyword + vector, merged automatically)&lt;/li&gt;
&lt;li&gt;GPU-accelerated vector indexing (launched Dec 2025) for faster large-scale ingestion&lt;/li&gt;
&lt;li&gt;Bedrock Knowledge Bases: &lt;strong&gt;Yes&lt;/strong&gt; (most commonly used vector store for Bedrock KB)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The catch — cost:&lt;/strong&gt;&lt;br&gt;
OpenSearch Serverless bills by OCU-hours (OpenSearch Compute Units). The minimum is &lt;strong&gt;2 OCUs for production&lt;/strong&gt; (1 indexing + 1 search, each with HA redundancy) — roughly &lt;strong&gt;$350/month&lt;/strong&gt; before you store a single vector. A dev/test mode drops this to ~$174/month with 0.5 OCUs each. Vector collections also require their own dedicated OCUs — they can't share with search/time-series collections.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For small projects or prototypes, this minimum cost is the biggest friction point. But at scale, the automatic scaling and mature feature set make it the go-to for production RAG.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Scaling:&lt;/strong&gt; Automatic. OCUs scale up based on workload and scale back down. You set a maximum OCU limit to cap costs.&lt;/p&gt;
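&lt;p&gt;As a rough sketch, the hybrid request described above looks like this. The index, field names, and &lt;code&gt;model_id&lt;/code&gt; are placeholders, not values from a real deployment:&lt;/p&gt;

```python
# Sketch of an OpenSearch hybrid query body (keyword + vector).
# A "hybrid" query holds sub-queries whose scores a search pipeline
# (normalization-processor) merges into one ranked list.

def hybrid_query(text, model_id, k=10):
    """Build a hybrid query combining BM25 keyword match and neural (semantic) search."""
    return {
        "size": k,
        "query": {
            "hybrid": {
                "queries": [
                    # Lexical leg: classic BM25 over the raw text field
                    {"match": {"body": {"query": text}}},
                    # Semantic leg: OpenSearch embeds the query text with
                    # model_id and runs k-NN against the stored vectors
                    {"neural": {"body_embedding": {
                        "query_text": text,
                        "model_id": model_id,
                        "k": k,
                    }}},
                ]
            }
        },
    }

q = hybrid_query("lambda timeout issue nodejs", model_id="my-embedding-model")
# Sent via opensearch-py against a search pipeline that merges the scores:
# client.search(index="docs", body=q, params={"search_pipeline": "hybrid-pipeline"})
```

&lt;p&gt;The search pipeline, configured once with a normalization processor, is what performs the automatic score merging; the application only builds the query.&lt;/p&gt;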

&lt;h3&gt;
  
  
  Aurora PostgreSQL (pgvector)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A relational database with the open-source pgvector extension for vector search.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why choose it:&lt;/strong&gt; If your application already runs on PostgreSQL, pgvector lets you add vector search &lt;strong&gt;without introducing a new service&lt;/strong&gt;. Your vectors live alongside your relational data — same transactions, same SQL, same backups. This is powerful when your queries combine traditional filters (WHERE category = 'shoes' AND price &amp;lt; 100) with vector similarity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key details:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pgvector 0.8.0 (April 2025) brought major improvements: up to &lt;strong&gt;9x faster filtered queries&lt;/strong&gt; with iterative index scans, and significantly better recall on filtered searches&lt;/li&gt;
&lt;li&gt;Supports HNSW and IVFFlat indexing&lt;/li&gt;
&lt;li&gt;Distance metrics: Euclidean, Cosine, Inner Product&lt;/li&gt;
&lt;li&gt;Hybrid search: Manual — combine &lt;code&gt;tsvector&lt;/code&gt; (keyword) and pgvector (semantic) in a single SQL query&lt;/li&gt;
&lt;li&gt;Bedrock Knowledge Bases: &lt;strong&gt;Yes&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The catch — you own the tuning:&lt;/strong&gt;&lt;br&gt;
pgvector gives you control, but that means you're responsible for index parameter tuning (&lt;code&gt;ef_construction&lt;/code&gt;, &lt;code&gt;m&lt;/code&gt;, &lt;code&gt;ef_search&lt;/code&gt;), choosing between &lt;code&gt;relaxed_order&lt;/code&gt; and &lt;code&gt;strict_order&lt;/code&gt; for iterative scans, and managing the trade-off between recall and latency. It's not "plug and play" like OpenSearch Neural Search.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scaling:&lt;/strong&gt; Aurora Serverless v2 scales compute in fine-grained ACU increments. Read replicas handle query scale-out. I/O-Optimized configuration helps with cost predictability for vector workloads.&lt;/p&gt;
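&lt;p&gt;The tuning knobs named above, kept as SQL strings so the trade-offs are visible in one place. Table and column names (&lt;code&gt;items&lt;/code&gt;, &lt;code&gt;embedding&lt;/code&gt;) are placeholders; the setting names follow pgvector 0.8.x:&lt;/p&gt;

```python
# pgvector index DDL and query-time knobs, as SQL strings.

CREATE_INDEX = """
CREATE INDEX ON items
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);  -- build-time graph width / build quality
"""

# Query-time: higher ef_search raises recall at the cost of latency.
TUNE_RECALL = "SET hnsw.ef_search = 100;"

# pgvector 0.8.x iterative scans for filtered queries:
# relaxed_order is faster; strict_order preserves exact distance ordering.
ITERATIVE = "SET hnsw.iterative_scan = relaxed_order;"

# Relational filter + vector similarity in one statement (psycopg named param).
QUERY = """
SELECT id, embedding <=> %(q)s AS distance
FROM items
WHERE category = 'shoes' AND price < 100
ORDER BY embedding <=> %(q)s
LIMIT 10;
"""
```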

&lt;h3&gt;
  
  
  Amazon S3 Vectors
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; The first cloud object store with native vector support. Purpose-built for storing and querying vectors at massive scale and minimal cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why choose it:&lt;/strong&gt; When cost is the primary concern and you don't need millisecond latencies. S3 Vectors can reduce the total cost of storing and querying vectors by &lt;strong&gt;up to 90%&lt;/strong&gt; compared to traditional vector databases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key details:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Up to &lt;strong&gt;2 billion vectors per index&lt;/strong&gt;, up to 10,000 indexes per vector bucket&lt;/li&gt;
&lt;li&gt;Distance metrics: &lt;strong&gt;Cosine and Euclidean only&lt;/strong&gt; (Inner Product not supported)&lt;/li&gt;
&lt;li&gt;Metadata filtering applied during the vector search itself (not purely pre- or post-filter)&lt;/li&gt;
&lt;li&gt;Fully serverless — no infrastructure to provision or manage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Does not support hybrid search&lt;/strong&gt; (semantic search only)&lt;/li&gt;
&lt;li&gt;Bedrock Knowledge Bases: &lt;strong&gt;Yes&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The catch — latency and throughput:&lt;/strong&gt;&lt;br&gt;
S3 Vectors is designed for infrequent-to-moderate query patterns. Infrequent queries return in under 1 second; more frequent queries get down to ~100ms. Write throughput caps at ~2,500 vectors/second per index, and query throughput is in the hundreds of requests/second per index. This is not the right choice for real-time, high-QPS applications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost example:&lt;/strong&gt;&lt;br&gt;
For 250K vectors across 40 indexes with 1M queries/month: approximately &lt;strong&gt;$11/month&lt;/strong&gt;. Compare that to OpenSearch Serverless's $350/month minimum.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scaling:&lt;/strong&gt; Fully elastic. No capacity planning required. Costs scale linearly with storage and queries.&lt;/p&gt;
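&lt;p&gt;A sketch of what a query request looks like, built as plain kwargs so it can be inspected without AWS access. The parameter names follow my reading of the QueryVectors API; verify against the current &lt;code&gt;s3vectors&lt;/code&gt; boto3 client before relying on them:&lt;/p&gt;

```python
# Build an S3 Vectors query request as a plain dict. Bucket, index, and
# filter values are placeholders.

def build_query(bucket, index, embedding, k=5, metadata_filter=None):
    req = {
        "vectorBucketName": bucket,
        "indexName": index,
        "queryVector": {"float32": embedding},  # S3 Vectors stores float32 vectors
        "topK": k,
        "returnDistance": True,
        "returnMetadata": True,
    }
    if metadata_filter is not None:
        # The filter is applied during the vector search itself,
        # not as a separate pre- or post-pass.
        req["filter"] = metadata_filter
    return req

req = build_query("docs-bucket", "docs-index", [0.1, 0.2, 0.3],
                  metadata_filter={"category": "shoes"})
# client = boto3.client("s3vectors"); client.query_vectors(**req)
```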

&lt;h3&gt;
  
  
  Amazon MemoryDB
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A Redis-compatible, durable, in-memory database with vector search.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why choose it:&lt;/strong&gt; When you need &lt;strong&gt;single-digit millisecond&lt;/strong&gt; vector search latency with strong durability. MemoryDB keeps both the vectors and the HNSW index in memory, which is why it's the fastest vector search option on AWS — supporting tens of thousands of queries/second at &amp;gt;99% recall.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key details:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Supports FLAT (exact KNN) and HNSW indexing&lt;/li&gt;
&lt;li&gt;Distance metrics: Euclidean, Cosine, Inner Product&lt;/li&gt;
&lt;li&gt;Single-digit millisecond query and update latency&lt;/li&gt;
&lt;li&gt;Multi-AZ durability (unlike typical in-memory caches)&lt;/li&gt;
&lt;li&gt;Uses &lt;code&gt;FT.SEARCH&lt;/code&gt; and &lt;code&gt;FT.AGGREGATE&lt;/code&gt; commands for vector queries&lt;/li&gt;
&lt;li&gt;Bedrock Knowledge Bases: &lt;strong&gt;No&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The catch — single shard and RAM cost:&lt;/strong&gt;&lt;br&gt;
Vector search is &lt;strong&gt;limited to a single shard&lt;/strong&gt; — no horizontal scaling for vectors. You can scale vertically (bigger instances) and add read replicas, but your total vector dataset must fit in the memory of one node. For a 10M vector dataset with 1024 dimensions, you might need a &lt;code&gt;db.r7g.4xlarge&lt;/code&gt; (~105 GB usable memory). RAM is expensive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Real-time RAG where freshness matters (index updates propagate in milliseconds), fraud detection, and real-time recommendation engines where every millisecond counts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scaling:&lt;/strong&gt; Vertical (bigger nodes) + read replicas for query throughput. No horizontal shard scaling for vector workloads.&lt;/p&gt;
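&lt;p&gt;A sketch of the &lt;code&gt;FT.SEARCH&lt;/code&gt; KNN query mentioned above, assembled as a command-argument list. The index name &lt;code&gt;idx&lt;/code&gt; and field &lt;code&gt;vec&lt;/code&gt; are placeholders; the query-string shape follows the Redis-compatible search syntax MemoryDB exposes:&lt;/p&gt;

```python
import struct

# Build the argument list for a vector KNN query against MemoryDB.
# Vectors are passed as packed float32 bytes via a named parameter.

def knn_search_args(index, field, vector, k=5):
    blob = struct.pack(f"{len(vector)}f", *vector)  # float32 little-endian bytes
    query = f"*=>[KNN {k} @{field} $vec_param]"     # KNN over all docs ("*")
    return ["FT.SEARCH", index, query,
            "PARAMS", "2", "vec_param", blob,
            "DIALECT", "2"]

args = knn_search_args("idx", "vec", [0.1, 0.2, 0.3], k=3)
# r = redis.Redis(host=..., port=6379); r.execute_command(*args)
```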

&lt;h3&gt;
  
  
  Amazon ElastiCache (Valkey)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A managed Valkey (open-source Redis fork) service with vector search, optimized for caching and ephemeral workloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why choose it:&lt;/strong&gt; Valkey is purpose-built for &lt;strong&gt;semantic caching&lt;/strong&gt; and &lt;strong&gt;agent memory&lt;/strong&gt;. If you're building agentic AI systems and need to cache LLM responses, store conversational memory, or implement fast vector lookups in the hot path of every request — this is the service.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key details:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Supports HNSW and FLAT indexing&lt;/li&gt;
&lt;li&gt;Distance metrics: Euclidean, Cosine, Inner Product&lt;/li&gt;
&lt;li&gt;Microsecond-level latency for cached data&lt;/li&gt;
&lt;li&gt;Integrates with &lt;strong&gt;LangGraph and mem0&lt;/strong&gt; for agent memory layers&lt;/li&gt;
&lt;li&gt;Compatible with Amazon Bedrock AgentCore Runtime&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Horizontal scaling supported&lt;/strong&gt; — adding shards gives linear improvement in ingestion and recall&lt;/li&gt;
&lt;li&gt;Bedrock Knowledge Bases: &lt;strong&gt;No&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How it differs from MemoryDB:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ElastiCache Valkey supports &lt;strong&gt;multi-shard horizontal scaling&lt;/strong&gt; for vectors (MemoryDB is single-shard only)&lt;/li&gt;
&lt;li&gt;MemoryDB provides &lt;strong&gt;Multi-AZ durability&lt;/strong&gt; (writes acknowledged only after replication); Valkey is designed more as a cache layer — it's durable but not to the same degree&lt;/li&gt;
&lt;li&gt;Valkey includes mature cache primitives (TTLs, eviction policies, atomic operations) that make it natural for caching use cases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Semantic caching to reduce LLM costs, short-term and long-term agent memory via mem0/LangGraph, and any use case where vectors are in the hot path of a latency-sensitive request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scaling:&lt;/strong&gt; Vertical, horizontal (multi-shard), and replica-based. Most flexible scaling model among the in-memory options.&lt;/p&gt;
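&lt;p&gt;The semantic-caching idea is simple enough to sketch without any Valkey specifics: before calling the LLM, look for a cached answer to a &lt;em&gt;similar&lt;/em&gt; (not identical) query. The store here is a plain list; in production, the lookup would be a Valkey vector search:&lt;/p&gt;

```python
import math

# Minimal semantic cache: return a cached response when a new query's
# embedding is close enough (cosine similarity) to a previously seen one.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class SemanticCache:
    def __init__(self, threshold=0.9):
        self.entries = []          # list of (embedding, response)
        self.threshold = threshold

    def get(self, query_emb):
        best = max(self.entries, key=lambda e: cosine(e[0], query_emb), default=None)
        if best and cosine(best[0], query_emb) >= self.threshold:
            return best[1]         # cache hit: skip the LLM call entirely
        return None

    def put(self, query_emb, response):
        self.entries.append((query_emb, response))

cache = SemanticCache(threshold=0.95)
cache.put([1.0, 0.0], "Lambda timeouts are configured per function.")
hit = cache.get([0.99, 0.05])   # nearly the same direction: cache hit
miss = cache.get([0.0, 1.0])    # orthogonal: miss, so call the LLM
```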

&lt;h3&gt;
  
  
  Amazon Neptune Analytics
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A graph analytics engine that also supports vector search, designed to combine graph traversals with vector similarity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why choose it:&lt;/strong&gt; When your data has &lt;strong&gt;explicit relationships&lt;/strong&gt; and you want to combine graph-based reasoning with semantic search. Neptune Analytics powers &lt;strong&gt;GraphRAG&lt;/strong&gt; in Bedrock Knowledge Bases — it automatically extracts entities, facts, and relationships from your documents and stores them as a graph, then combines vector search with graph traversal for more comprehensive, cross-document answers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key details:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stores embeddings directly on graph nodes&lt;/li&gt;
&lt;li&gt;Combines vector similarity search with graph algorithms (PageRank, shortest path, etc.)&lt;/li&gt;
&lt;li&gt;Supports openCypher query language&lt;/li&gt;
&lt;li&gt;GraphRAG integration with Bedrock Knowledge Bases — auto-builds knowledge graphs from your documents&lt;/li&gt;
&lt;li&gt;Bedrock Knowledge Bases: &lt;strong&gt;Yes&lt;/strong&gt; (GraphRAG)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The catch:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pricing is based on memory-optimized compute units (m-NCU), billed per hour&lt;/li&gt;
&lt;li&gt;Autoscaling is &lt;strong&gt;not supported&lt;/strong&gt; — you choose your graph capacity upfront&lt;/li&gt;
&lt;li&gt;You can pause graphs when not in use (pay 10% of compute cost while paused)&lt;/li&gt;
&lt;li&gt;Best suited for analytical / batch workloads rather than high-QPS online serving&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Knowledge graphs for compliance/regulatory data, entity-relationship analysis combined with semantic search, and use cases where understanding connections between documents matters more than raw search speed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scaling:&lt;/strong&gt; Provisioned (memory-optimized). Choose capacity at creation. No autoscaling.&lt;/p&gt;
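&lt;p&gt;To make the GraphRAG pattern concrete, here is an openCypher sketch of the traversal step. The schema (Chunk/Entity labels, MENTIONS/RELATED_TO edges) is hypothetical, and Neptune Analytics' vector-similarity calls are service-specific, so the vector step is shown only as a comment:&lt;/p&gt;

```python
# Step 1 (service-specific, not shown): a vector search returns the ids of
# the top-k chunks most similar to the query embedding.
# Step 2: expand from those chunks through the knowledge graph to pull in
# related entities and the other chunks that mention them.

RETRIEVE = """
MATCH (c:Chunk)-[:MENTIONS]->(e:Entity)-[:RELATED_TO]->(e2:Entity)<-[:MENTIONS]-(c2:Chunk)
WHERE id(c) IN $topk_chunk_ids
RETURN DISTINCT c2.text, e.name, e2.name
LIMIT 25
"""
```

&lt;p&gt;This is the step a pure vector store cannot do: pulling in chunks that are &lt;em&gt;related&lt;/em&gt; to the retrieved ones through entities, even when they are not semantically similar to the query.&lt;/p&gt;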




&lt;h3&gt;
  
  
  Amazon DocumentDB
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A MongoDB-compatible document database with vector search support.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why choose it:&lt;/strong&gt; If you're already on DocumentDB (or have a MongoDB-based application) and want to add vector search without a new service. Similar logic to Aurora pgvector — keep vectors alongside your document data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key details:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Available on DocumentDB 5.0+ instance-based clusters&lt;/li&gt;
&lt;li&gt;Supports HNSW and IVFFlat indexing&lt;/li&gt;
&lt;li&gt;Distance metrics: Euclidean, Cosine, Dot Product&lt;/li&gt;
&lt;li&gt;Up to 2,000 dimensions with an index (16,000 without index)&lt;/li&gt;
&lt;li&gt;Bedrock Knowledge Bases: &lt;strong&gt;No&lt;/strong&gt; (not a supported Bedrock KB vector store)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The catch:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instance-based scaling (no serverless option for vector workloads)&lt;/li&gt;
&lt;li&gt;I/O costs can add up significantly&lt;/li&gt;
&lt;li&gt;Smaller vector ecosystem and less community tooling compared to pgvector&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Existing DocumentDB/MongoDB workloads that need to add semantic search alongside existing JSON document queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scaling:&lt;/strong&gt; Instance-based. Vertical scaling + read replicas.&lt;/p&gt;
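&lt;p&gt;A sketch of a DocumentDB vector search as an aggregation pipeline, built as a plain dict. Field names are placeholders, and the operator shape follows my reading of DocumentDB's vector search syntax; verify against your 5.0+ cluster:&lt;/p&gt;

```python
# Vector search inside a MongoDB-style aggregation pipeline.

def vector_search_pipeline(query_vector, k=5):
    return [
        {"$search": {
            "vectorSearch": {
                "vector": query_vector,   # the query embedding
                "path": "embedding",      # field holding stored vectors
                "similarity": "cosine",
                "k": k,
                "probes": 1,              # IVFFlat knob; HNSW uses efSearch instead
            }
        }},
        {"$project": {"title": 1}},       # return only what the app needs
    ]

pipeline = vector_search_pipeline([0.1, 0.2, 0.3])
# results = collection.aggregate(pipeline)  # via pymongo against DocumentDB
```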

&lt;h3&gt;
  
  
  Amazon Kendra (GenAI Index)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A fully managed enterprise search service. Not a vector database — it's an end-to-end search solution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why choose it:&lt;/strong&gt; When you need &lt;strong&gt;enterprise search with built-in connectors&lt;/strong&gt; to 43+ data sources (SharePoint, Salesforce, Google Drive, Confluence, etc.) and don't want to build a RAG pipeline from scratch. The GenAI Index uses hybrid search (vector + keyword) with pre-optimized parameters — no tuning required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key details:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;43+ built-in data source connectors with metadata and permission filtering&lt;/li&gt;
&lt;li&gt;Hybrid index combining vector and keyword search, pre-optimized&lt;/li&gt;
&lt;li&gt;Integrates with both &lt;strong&gt;Bedrock Knowledge Bases&lt;/strong&gt; and &lt;strong&gt;Amazon Q Business&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;A single Kendra GenAI Index can serve multiple Q Business apps and Bedrock KBs&lt;/li&gt;
&lt;li&gt;Bedrock Knowledge Bases: &lt;strong&gt;Yes&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The catch — pricing:&lt;/strong&gt;&lt;br&gt;
Kendra is expensive for what it offers. The GenAI Enterprise Edition base index starts at &lt;strong&gt;$0.32/hour&lt;/strong&gt; (~$230/month) for up to 20,000 documents, while the classic Enterprise Edition runs &lt;strong&gt;$1.40/hour&lt;/strong&gt; (~$1,008/month). This is enterprise pricing; it makes sense when you're connecting to many data sources and need managed connectors, permissions, and relevance tuning out of the box.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Enterprise search across dozens of data sources where you need managed connectors, user-level access control, and a fully managed experience. Not for custom RAG pipelines where you want control over chunking, embeddings, and retrieval logic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scaling:&lt;/strong&gt; Fully managed. Add storage units and query units as needed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Decision Matrix
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Recommended&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;th&gt;Alternative&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;General-purpose RAG&lt;/td&gt;
&lt;td&gt;OpenSearch Serverless&lt;/td&gt;
&lt;td&gt;Native hybrid search, most mature Bedrock KB integration&lt;/td&gt;
&lt;td&gt;Aurora pgvector&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Already on PostgreSQL&lt;/td&gt;
&lt;td&gt;Aurora pgvector&lt;/td&gt;
&lt;td&gt;Add vectors without a new service, SQL + vector in one query&lt;/td&gt;
&lt;td&gt;OpenSearch Serverless&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost-sensitive / massive scale&lt;/td&gt;
&lt;td&gt;S3 Vectors&lt;/td&gt;
&lt;td&gt;90% cheaper, 2B vectors/index, fully serverless&lt;/td&gt;
&lt;td&gt;OpenSearch Serverless&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ultra-low latency (real-time)&lt;/td&gt;
&lt;td&gt;MemoryDB&lt;/td&gt;
&lt;td&gt;Single-digit ms queries, &amp;gt;99% recall, durable&lt;/td&gt;
&lt;td&gt;ElastiCache Valkey&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Semantic caching / reduce LLM cost&lt;/td&gt;
&lt;td&gt;ElastiCache Valkey&lt;/td&gt;
&lt;td&gt;Cache primitives + vector search, microsecond latency&lt;/td&gt;
&lt;td&gt;MemoryDB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent memory (short-term + long-term)&lt;/td&gt;
&lt;td&gt;ElastiCache Valkey&lt;/td&gt;
&lt;td&gt;LangGraph/mem0 integration, horizontal scaling&lt;/td&gt;
&lt;td&gt;MemoryDB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Knowledge graph + vectors&lt;/td&gt;
&lt;td&gt;Neptune Analytics&lt;/td&gt;
&lt;td&gt;GraphRAG, entity-relationship reasoning&lt;/td&gt;
&lt;td&gt;OpenSearch Serverless&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise search (managed)&lt;/td&gt;
&lt;td&gt;Kendra GenAI Index&lt;/td&gt;
&lt;td&gt;43+ connectors, permissions, zero tuning&lt;/td&gt;
&lt;td&gt;Bedrock KB + S3 Vectors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Already on MongoDB/DocumentDB&lt;/td&gt;
&lt;td&gt;DocumentDB&lt;/td&gt;
&lt;td&gt;Add vectors alongside existing JSON data&lt;/td&gt;
&lt;td&gt;Aurora pgvector&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Cost Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Pricing Model&lt;/th&gt;
&lt;th&gt;Minimum Monthly Cost&lt;/th&gt;
&lt;th&gt;Best Cost Scenario&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;S3 Vectors&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Storage + PUT + queries&lt;/td&gt;
&lt;td&gt;~$11 (250K vectors, 1M queries)&lt;/td&gt;
&lt;td&gt;Cheapest at any scale&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Aurora pgvector&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Instance hours + storage + I/O&lt;/td&gt;
&lt;td&gt;~$50+ (Serverless v2 min)&lt;/td&gt;
&lt;td&gt;Cheap if DB already exists&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenSearch Serverless&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;OCU-hours + S3 storage&lt;/td&gt;
&lt;td&gt;~$174 (dev/test), ~$350 (prod)&lt;/td&gt;
&lt;td&gt;Good at scale, painful for prototypes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DocumentDB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Instance hours + I/O&lt;/td&gt;
&lt;td&gt;~$100+&lt;/td&gt;
&lt;td&gt;Reasonable if already on DocumentDB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MemoryDB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Node hours (in-memory)&lt;/td&gt;
&lt;td&gt;~$200+ (r7g.large)&lt;/td&gt;
&lt;td&gt;Expensive — RAM is the bottleneck&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ElastiCache Valkey&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Node hours (in-memory)&lt;/td&gt;
&lt;td&gt;~$100+ (r7g.large)&lt;/td&gt;
&lt;td&gt;Similar to MemoryDB, scales better&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Neptune Analytics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;m-NCU hours&lt;/td&gt;
&lt;td&gt;Varies by graph size&lt;/td&gt;
&lt;td&gt;Pausable (10% cost when idle)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Kendra GenAI Index&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hourly base + units&lt;/td&gt;
&lt;td&gt;~$230 (GenAI), ~$1,008 (Enterprise)&lt;/td&gt;
&lt;td&gt;Enterprise pricing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  When NOT to Use a Vector Database
&lt;/h2&gt;

&lt;p&gt;Before building anything, ask yourself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Small dataset (&amp;lt;10K items)?&lt;/strong&gt; → Use in-memory search (FAISS, numpy) with no infrastructure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exact match queries only?&lt;/strong&gt; → Use a traditional database or search index&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured filtering only?&lt;/strong&gt; → A regular database with indexes will be faster and cheaper&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Static FAQ or lookup table?&lt;/strong&gt; → Don't overcomplicate it. A simple key-value store works&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time transactional workload?&lt;/strong&gt; → Use a relational or NoSQL database; vector search is a read-optimized pattern&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Near-zero budget?&lt;/strong&gt; → Use FAISS locally or with S3 for persistence&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;That's a wrap on the series. If you're building with RAG or semantic search on AWS, you now have both the conceptual foundation and practical guidance to choose the right architecture.&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Missed the earlier parts?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Part 1 → &lt;a href="https://dev.to/thedeveloperjournal/aws-vector-databases-part-1-embeddings-dimensions-similarity-9ph"&gt;Embeddings, Dimensions, and Similarity Search&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Part 2 → &lt;a href="https://dev.to/thedeveloperjournal/aws-vector-databases-part-2-search-filtering-and-chunking-3lbe"&gt;Search Patterns, Filtering, and Chunking&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>vectordatabase</category>
      <category>rag</category>
      <category>genai</category>
    </item>
    <item>
      <title>AWS Vector Databases – Part 2: Search, Filtering, and Chunking</title>
      <dc:creator>Sabarish Sathasivan</dc:creator>
      <pubDate>Mon, 30 Mar 2026 18:29:03 +0000</pubDate>
      <link>https://forem.com/aws-builders/aws-vector-databases-part-2-search-filtering-and-chunking-3lbe</link>
      <guid>https://forem.com/aws-builders/aws-vector-databases-part-2-search-filtering-and-chunking-3lbe</guid>
      <description>&lt;p&gt;This is Part 2 of the AWS vector database series.&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Missed Part 1?&lt;/strong&gt; Start here: &lt;em&gt;&lt;a href="https://dev.to/thedeveloperjournal/aws-vector-databases-part-1-embeddings-dimensions-similarity-9ph"&gt;Embeddings, Dimensions, and Similarity Search&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In Part 1, we covered the fundamentals of embeddings and how similarity is measured. Now we move into how retrieval actually works in practice.&lt;/p&gt;

&lt;p&gt;In this part, we’ll look at search patterns (KNN vs ANN), hybrid search, metadata filtering, and chunking strategies — the building blocks of effective RAG systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vector Search Types
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;How it works&lt;/th&gt;
&lt;th&gt;When to use&lt;/th&gt;
&lt;th&gt;AWS services&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;KNN — Exact Nearest Neighbor Search&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Check every single item, compare it to your query, return the best matches. Perfectly accurate, but slow.&lt;/td&gt;
&lt;td&gt;Small datasets (under 100K vectors) or situations where you absolutely cannot afford to miss a result — like medical diagnostics or legal compliance checks.&lt;/td&gt;
&lt;td&gt;All vector services support KNN as a fallback, but it's not practical at scale.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ANN — Approximate Nearest Neighbor Search&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Uses a smart index structure (graph or cluster) to find &lt;em&gt;very likely&lt;/em&gt; nearest neighbors without checking everything&lt;/td&gt;
&lt;td&gt;Almost everything in production. If you're building a RAG chatbot, semantic search, or recommendation engine, this is your default.&lt;/td&gt;
&lt;td&gt;OpenSearch Serverless, Aurora pgvector, MemoryDB, ElastiCache Valkey, DocumentDB, S3 Vectors, Neptune Analytics.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
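&lt;p&gt;Exact KNN from the table above fits in a few lines: score every stored vector against the query and keep the best &lt;em&gt;k&lt;/em&gt;. This linear scan is exactly why ANN indexes exist at scale:&lt;/p&gt;

```python
import math

# Brute-force exact nearest neighbors: O(n * d) work per query.

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def knn(query, vectors, k=2):
    """Score every vector, sort by similarity, return the top-k indices."""
    scored = [(cosine_sim(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]

data = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
top = knn([1.0, 0.0], data, k=2)   # the two vectors closest in direction
```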

&lt;h3&gt;
  
  
  ANN Index Structures
&lt;/h3&gt;

&lt;p&gt;To avoid checking every vector, ANN uses smart indexing. The two most common types on AWS are:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Index&lt;/th&gt;
&lt;th&gt;Simple idea&lt;/th&gt;
&lt;th&gt;Trade-off&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;HNSW&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Connects similar vectors like a network and “walks” through it to find matches&lt;/td&gt;
&lt;td&gt;Uses more memory and takes longer to build, but gives faster and more accurate results. &lt;strong&gt;Default in most AWS services.&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;IVFFlat&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Groups vectors into clusters and only searches the closest groups&lt;/td&gt;
&lt;td&gt;Faster to build and uses less memory, but needs tuning and may miss some results&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Intuitive way to think about it
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;HNSW — like navigating a city with highways&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Start with &lt;strong&gt;highways&lt;/strong&gt; to get close
&lt;/li&gt;
&lt;li&gt;Then use &lt;strong&gt;local roads&lt;/strong&gt; to find the exact place
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;HNSW does the same:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Moves from broad → detailed search
&lt;/li&gt;
&lt;li&gt;Finds results quickly and accurately
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;IVFFlat — like searching in neighborhoods&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First pick a few &lt;strong&gt;likely neighborhoods&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Then search inside them
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;IVFFlat works similarly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduces search space
&lt;/li&gt;
&lt;li&gt;But can miss results if the right cluster isn’t picked
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Which one should you use?
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Go with &lt;strong&gt;HNSW&lt;/strong&gt; → best performance and accuracy (default choice)
&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;IVFFlat&lt;/strong&gt; → faster to build, lower memory, but slightly less accurate
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Hybrid Search
&lt;/h2&gt;

&lt;p&gt;Hybrid search runs two searches at the same time—one that understands meaning (vector search) and one that looks for exact words (keyword search)—and then combines the results.&lt;/p&gt;

&lt;p&gt;For example, a user might search: “lambda timeout issue nodejs.” &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;Vector Search&lt;/strong&gt; understands the intent (performance/debugging)&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;Keyword Search&lt;/strong&gt; ensures exact terms like &lt;em&gt;lambda&lt;/em&gt; and &lt;em&gt;nodejs&lt;/em&gt; are matched.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The scoring method used to combine these two result sets is called &lt;strong&gt;Reciprocal Rank Fusion (RRF)&lt;/strong&gt;. It doesn’t simply add scores; it prioritizes documents that rank highly in both searches. For example, if a document ranks #1 in keyword search and #2 in vector search, RRF will push it to the top of the final results.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is especially useful for enterprise RAG. Users rarely search with purely natural language or purely exact keywords—they usually mix both.&lt;/p&gt;
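&lt;p&gt;RRF itself is only a few lines. Each result list contributes &lt;code&gt;1 / (k + rank)&lt;/code&gt; per document, so a document ranked highly in both lists accumulates the largest score (&lt;code&gt;k = 60&lt;/code&gt; is the commonly used smoothing constant):&lt;/p&gt;

```python
# Reciprocal Rank Fusion over any number of ranked lists of document ids.

def rrf(rankings, k=60):
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

keyword = ["doc_a", "doc_b", "doc_c"]   # lexical ranking
vector  = ["doc_d", "doc_a", "doc_e"]   # semantic ranking
fused = rrf([keyword, vector])
# doc_a (#1 keyword, #2 vector) beats documents ranked high in only one list
```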

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Implementation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenSearch Serverless&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Native (RRF supported). The most robust option; its "Neural Search" feature handles the hybrid merging automatically.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Aurora pgvector&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;SQL-based and best for relational data; you manually combine &lt;code&gt;tsvector&lt;/code&gt; (keywords) and &lt;code&gt;vector&lt;/code&gt; (meaning) in one query.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Metadata Filtering
&lt;/h2&gt;

&lt;p&gt;Metadata filtering narrows down results using structured data like date, category, or user ID—before or after the vector search runs.&lt;/p&gt;

&lt;p&gt;Think of it like this: a vector search finds books similar to &lt;em&gt;Harry Potter&lt;/em&gt;. But you only want books published after 2010 and available in English. Metadata filtering ensures you don’t waste time on the wrong results.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pre-filtering vs Post-filtering
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;How it works&lt;/th&gt;
&lt;th&gt;Trade-offs&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pre-filtering&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Applies filters first, then runs vector search on the remaining data&lt;/td&gt;
&lt;td&gt;Accurate and secure, but can be slower depending on the engine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Post-filtering&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Runs vector search first, then filters the results&lt;/td&gt;
&lt;td&gt;Fast, but may return zero results if none match the filters&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt;  &lt;strong&gt;S3 Vectors&lt;/strong&gt; applies metadata filters during the vector search itself, combining the accuracy of pre-filtering with the performance of post-filtering.&lt;/p&gt;
&lt;/blockquote&gt;
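&lt;p&gt;The trade-off between the two approaches can be sketched in a few lines of Python (toy two-dimensional vectors and made-up metadata, purely for illustration):&lt;/p&gt;

```python
# Toy illustration of pre- vs post-filtering.
docs = [
    {"id": 1, "lang": "en", "vec": (1.0, 0.0)},
    {"id": 2, "lang": "fr", "vec": (0.9, 0.1)},
    {"id": 3, "lang": "fr", "vec": (0.8, 0.2)},
    {"id": 4, "lang": "en", "vec": (0.0, 1.0)},
]
query = (1.0, 0.0)

def dist(a, b):  # squared Euclidean distance, enough for ranking
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pre_filter(lang, k=2):
    # Filter first, then search: always returns matches if any exist.
    pool = [d for d in docs if d["lang"] == lang]
    return sorted(pool, key=lambda d: dist(d["vec"], query))[:k]

def post_filter(lang, k=2):
    # Search first, then filter: the top-k may contain few or no matches.
    top_k = sorted(docs, key=lambda d: dist(d["vec"], query))[:k]
    return [d for d in top_k if d["lang"] == lang]

print([d["id"] for d in pre_filter("en")])   # [1, 4] - both English docs survive
print([d["id"] for d in post_filter("en")])  # [1] - doc 2 crowded doc 4 out of the top-k
```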

&lt;h2&gt;
  
  
  Chunking
&lt;/h2&gt;

&lt;p&gt;Chunking is simply breaking a long document into smaller, meaningful pieces before creating embeddings. If your chunks are &lt;strong&gt;too small&lt;/strong&gt;, you lose context. If they’re &lt;strong&gt;too big&lt;/strong&gt;, the important meaning gets buried in noise. The goal is to find the right balance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Common Chunking Strategies
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;How it works&lt;/th&gt;
&lt;th&gt;Chunk size&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fixed-size&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Split every &lt;em&gt;N&lt;/em&gt; tokens/characters with optional overlap&lt;/td&gt;
&lt;td&gt;256–512 tokens&lt;/td&gt;
&lt;td&gt;Simple content like logs or short descriptions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Recursive&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Split by paragraphs → sentences → words while preserving structure&lt;/td&gt;
&lt;td&gt;Variable&lt;/td&gt;
&lt;td&gt;General-purpose text (default in most tools)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Semantic&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Use an embedding model to split based on topic boundaries&lt;/td&gt;
&lt;td&gt;Variable&lt;/td&gt;
&lt;td&gt;Long-form content like whitepapers or legal docs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Document-structure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Split using headings, sections, or document layout&lt;/td&gt;
&lt;td&gt;Variable&lt;/td&gt;
&lt;td&gt;Structured docs like manuals, HTML, or code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sentence-window&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Store sentences, return surrounding context at query time&lt;/td&gt;
&lt;td&gt;1 sentence (store) / window (return)&lt;/td&gt;
&lt;td&gt;High-precision Q&amp;amp;A&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
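&lt;p&gt;The simplest strategy, fixed-size with overlap, can be sketched in a few lines (character counts stand in for token counts here, to keep the example dependency-free):&lt;/p&gt;

```python
# Fixed-size chunking with overlap. Sizes here are in characters for
# simplicity; production systems usually count tokens instead.
def chunk_fixed(text, size=20, overlap=5):
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

doc = "AWS vector databases store embeddings for semantic search."
for c in chunk_fixed(doc):
    print(repr(c))
```

The overlap means the tail of each chunk is repeated at the head of the next one, so a sentence cut by a chunk boundary still appears intact in at least one chunk.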

&lt;h3&gt;
  
  
  Bedrock Chunking Options
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Bedrock option&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;th&gt;Equivalent concept&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Default&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~300-token chunks that respect sentence boundaries&lt;/td&gt;
&lt;td&gt;Recursive (baseline)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fixed-size&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;You control chunk size and overlap&lt;/td&gt;
&lt;td&gt;Fixed-size&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hierarchical&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Searches small chunks but returns larger context&lt;/td&gt;
&lt;td&gt;Sentence-window&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Semantic&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Splits based on topic boundaries&lt;/td&gt;
&lt;td&gt;Semantic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;None&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No splitting — entire file treated as one chunk&lt;/td&gt;
&lt;td&gt;Document-structure (manual)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;👉 &lt;strong&gt;Continue reading:&lt;/strong&gt; In &lt;a href="https://dev.to/thedeveloperjournal/aws-vector-databases-part-3-choosing-the-right-vector-database-on-aws-375m"&gt;Part 3&lt;/a&gt;, we’ll compare AWS vector database options and build a practical decision framework to help you choose the right one.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>vectordatabase</category>
      <category>rag</category>
      <category>ai</category>
    </item>
    <item>
      <title>AWS Vector Databases Part 1: Embeddings, Dimensions &amp; Similarity</title>
      <dc:creator>Sabarish Sathasivan</dc:creator>
      <pubDate>Mon, 30 Mar 2026 18:28:44 +0000</pubDate>
      <link>https://forem.com/aws-builders/aws-vector-databases-part-1-embeddings-dimensions-similarity-9ph</link>
      <guid>https://forem.com/aws-builders/aws-vector-databases-part-1-embeddings-dimensions-similarity-9ph</guid>
      <description>&lt;p&gt;This is Part 1 of a series exploring vector databases on AWS.&lt;/p&gt;

&lt;p&gt;We recently evaluated multiple AWS vector database options to understand their trade-offs, performance characteristics, and real-world use cases. Before comparing services, it’s important to understand the core concepts that power vector search.&lt;/p&gt;

&lt;p&gt;In this part, we’ll cover embeddings, dimensions, and similarity search — the foundation of every RAG and semantic search system.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Are Embeddings?
&lt;/h2&gt;

&lt;p&gt;Let’s say you're building a customer support chatbot.&lt;/p&gt;

&lt;p&gt;A user asks: &lt;strong&gt;“How do I change my login info?”&lt;/strong&gt;&lt;br&gt;
Your FAQ has: &lt;strong&gt;“Resetting your password.”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A keyword search might miss this. But as humans, we know they mean the same thing. That’s the gap embeddings solve.&lt;/p&gt;

&lt;p&gt;An &lt;strong&gt;embedding&lt;/strong&gt; is a numerical representation of content (text, image, code) where similar meaning leads to similar numbers. So even if the words differ, the intent stays close.&lt;/p&gt;
&lt;h3&gt;
  
  
  How Embeddings Are Created
&lt;/h3&gt;

&lt;p&gt;Here's what happens under the hood when you pass a sentence to an embedding model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"How do I reset my password?"
        │
        ▼
   Tokenization         →  ["How", "do", "I", "reset", "my", "password", "?"]
        │
        ▼
   Embedding Model       →  Neural network (e.g., Titan v2)
        │
        ▼
   Vector Output         →  [0.021, -0.438, 0.712, ..., 0.155] (1,024 floats)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important part isn’t the numbers themselves—it’s that similar sentences produce vectors that are close to each other. &lt;/p&gt;

&lt;p&gt;On AWS, you can generate embeddings using models like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Titan Text Embeddings V2&lt;/li&gt;
&lt;li&gt;Titan Embeddings G1 - Text&lt;/li&gt;
&lt;li&gt;Cohere Embed English v3&lt;/li&gt;
&lt;li&gt;Cohere Embed Multilingual v3&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're getting started, &lt;strong&gt;Amazon Titan Embeddings V2&lt;/strong&gt; is a solid default—simple, cost-effective, and good enough for most use cases.&lt;/p&gt;
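&lt;p&gt;A sketch of generating an embedding with Titan V2 through the Bedrock runtime API (the model ID and body fields below reflect the Titan v2 request format at the time of writing; verify them against the current Bedrock documentation before relying on them):&lt;/p&gt;

```python
# Sketch of calling Titan Text Embeddings V2 via Bedrock (boto3).
import json

def build_request(text, dimensions=256):
    # Titan v2 accepts 1024 (default), 512, or 256 dimensions.
    return {
        "modelId": "amazon.titan-embed-text-v2:0",
        "body": json.dumps({"inputText": text,
                            "dimensions": dimensions,
                            "normalize": True}),
    }

def get_embedding(client, text, dimensions=256):
    req = build_request(text, dimensions)
    resp = client.invoke_model(modelId=req["modelId"], body=req["body"])
    return json.loads(resp["body"].read())["embedding"]

# Usage (requires AWS credentials and Bedrock model access):
#   import boto3
#   client = boto3.client("bedrock-runtime")
#   vec = get_embedding(client, "How do I reset my password?")

print(build_request("hello")["modelId"])
```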

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The model you use for &lt;strong&gt;ingestion&lt;/strong&gt; (storing data) must be the exact same model you use for &lt;strong&gt;inference&lt;/strong&gt; (querying). If you embed your database using Amazon Titan but try to query it using an OpenAI embedding, the math won't align, and your search results will be complete gibberish.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Dimensions
&lt;/h2&gt;

&lt;p&gt;So far, we’ve seen that embeddings are just lists of numbers representing meaning. The next question is: how many numbers are in that list? That’s where &lt;strong&gt;dimensions&lt;/strong&gt; come in.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;dimension&lt;/strong&gt; is simply the number of values in an embedding list (vector). Different models produce different dimensions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cohere Embed English v3, Cohere Embed Multilingual v3 → 1,024&lt;/li&gt;
&lt;li&gt;Amazon Titan Embeddings → 1,024 (default), 512, 256&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Historically, more dimensions meant better accuracy but higher storage costs and slower searches. However, the game changed with Amazon Titan Text Embeddings V2. Titan v2 supports "flexible" dimensions: you can generate a 1024-dimension vector and "truncate" it down to 512 or 256.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1024 Dimensions: Maximum "nuance" and accuracy.&lt;/li&gt;
&lt;li&gt;256 Dimensions: Up to 4x less storage cost and faster search speeds, with only a marginal hit to accuracy.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
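&lt;p&gt;The truncate-and-renormalize idea behind flexible dimensions can be illustrated in plain Python (this is the standard approach for shortening normalized embeddings; Titan's internal handling may differ):&lt;/p&gt;

```python
# Truncating a normalized embedding: keep the first N values, then
# re-normalize so cosine-similarity math stays valid.
import math

def truncate(vec, dims):
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

vec_1024 = [0.03] * 1024              # stand-in for a real embedding
vec_256 = truncate(vec_1024, 256)

print(len(vec_256))                                      # 256
print(round(sum(x * x for x in vec_256), 9))             # 1.0 (unit length)
```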

&lt;h2&gt;
  
  
  Distance Metrics: Measure Similarity
&lt;/h2&gt;

&lt;p&gt;Once you have thousands of embeddings stored, the database uses a distance metric to find the "nearest neighbors" to your query.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Logic&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cosine&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Angle between vectors&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;The Standard.&lt;/strong&gt; Best for text and RAG. Ignores document length.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Euclidean (L2)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Straight-line distance&lt;/td&gt;
&lt;td&gt;Best for &lt;strong&gt;Images&lt;/strong&gt; or fixed-size data where magnitude matters.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Inner Product&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Direction + Magnitude&lt;/td&gt;
&lt;td&gt;Best for &lt;strong&gt;Recommendations&lt;/strong&gt; where popularity or "weight" matters.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Why Metrics Matter: The "Wordiness" Problem
&lt;/h3&gt;

&lt;p&gt;Let’s compare two vectors representing the same topic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vector A:&lt;/strong&gt; &lt;code&gt;[1, 2, 3]&lt;/code&gt; (A short, concise help article)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector B:&lt;/strong&gt; &lt;code&gt;[10, 20, 30]&lt;/code&gt; (A very long, detailed whitepaper on the same topic)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even though they cover the exact same intent, different metrics interpret them wildly differently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cosine Similarity (The Compass):&lt;/strong&gt; Sees that both arrows point at the exact same target in space. It gives them a &lt;strong&gt;perfect match score&lt;/strong&gt;. This is why it’s the standard for RAG—you want your short question to match a long document.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Euclidean Distance (The Ruler):&lt;/strong&gt; Measures the physical distance between the "tips" of the arrows. Because Vector B is so much longer, the ruler sees them as &lt;strong&gt;miles apart&lt;/strong&gt; and may treat them as unrelated.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inner Product (The Spotlight):&lt;/strong&gt; It sees that both point the same way, but it gives Vector B a "higher" score because it is stronger/longer. This is perfect for recommendation engines where you want to highlight "heavy-hitting" content.&lt;/li&gt;
&lt;/ul&gt;
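&lt;p&gt;Running the three metrics over Vector A and Vector B makes the difference concrete:&lt;/p&gt;

```python
# The three metrics applied to Vector A = [1, 2, 3] and B = [10, 20, 30].
import math

a, b = [1, 2, 3], [10, 20, 30]

dot = sum(x * y for x, y in zip(a, b))                     # inner product
cos = dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
l2 = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))    # Euclidean

print(round(cos, 4))  # 1.0   -> perfect match: same direction
print(round(l2, 2))   # 33.67 -> "miles apart" by the ruler
print(dot)            # 140   -> the longer vector scores higher
```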

&lt;p&gt;For example,&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Query A:&lt;/strong&gt; "How do I reset my password?"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Doc B:&lt;/strong&gt; "A guide to password resets for new users."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They mean the same thing, but Doc B is 1,000 words long. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cosine&lt;/strong&gt; correctly identifies them as a match because it ignores the extra "fluff" and focuses on the intent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Euclidean&lt;/strong&gt; might fail because the sheer volume of words in Doc B pushes its vector too far away from the short query.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key Takeaway:&lt;/strong&gt; For 95% of AWS text-based applications (Chatbots, Q&amp;amp;A, Knowledge Bases), &lt;strong&gt;use Cosine Similarity&lt;/strong&gt;. It is the default in Aurora pgvector, OpenSearch, and S3 Vectors for a reason: it focuses on &lt;em&gt;meaning&lt;/em&gt; over &lt;em&gt;length&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;👉 &lt;strong&gt;Continue reading:&lt;/strong&gt; In &lt;a href="https://dev.to/thedeveloperjournal/aws-vector-databases-part-2-search-filtering-and-chunking-3lbe"&gt;Part 2&lt;/a&gt;, we’ll explore vector search patterns (KNN vs ANN), hybrid search, metadata filtering, and chunking — and how they impact real-world systems.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>vectordatabase</category>
      <category>rag</category>
      <category>genai</category>
    </item>
    <item>
      <title>AWS Serverless Payload Limits Expand to 1 MB: What It Means for Event-Driven Architectures</title>
      <dc:creator>Sabarish Sathasivan</dc:creator>
      <pubDate>Fri, 30 Jan 2026 16:43:23 +0000</pubDate>
      <link>https://forem.com/thedeveloperjournal/aws-serverless-payload-limits-expand-to-1-mb-what-it-means-for-event-driven-architectures-2j99</link>
      <guid>https://forem.com/thedeveloperjournal/aws-serverless-payload-limits-expand-to-1-mb-what-it-means-for-event-driven-architectures-2j99</guid>
      <description>&lt;p&gt;AWS has been steadily increasing the maximum payload size from &lt;strong&gt;256 KB to 1 MB&lt;/strong&gt; across key serverless services. This is a meaningful improvement for event-driven architectures that rely on richer event data and reduced fragmentation.&lt;/p&gt;

&lt;p&gt;Modern cloud applications no longer pass around small strings or simple messages. LLM prompts, telemetry signals, personalization context, ML outputs, and user interaction data are often nested JSON objects carrying real state and meaning. Until recently, fitting this data into serverless workflows required careful design tradeoffs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recent Announcements
&lt;/h2&gt;

&lt;p&gt;AWS first introduced the &lt;strong&gt;1 MB&lt;/strong&gt; payload limit for individual services:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Amazon SQS&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/08/amazon-sqs-max-payload-size-1mib/" rel="noopener noreferrer"&gt;Amazon SQS increases maximum message payload size to 1 MiB&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AWS Lambda (Asynchronous Invocations)&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/10/aws-lambda-payload-size-256-kb-1-mb-invocations/" rel="noopener noreferrer"&gt;AWS Lambda increases maximum payload size from 256 KB to 1 MB for asynchronous invocations&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  1 MB Limit for Event-Driven Lambda
&lt;/h2&gt;

&lt;p&gt;With the announcement &lt;a href="https://aws.amazon.com/blogs/compute/more-room-to-build-serverless-services-now-support-payloads-up-to-1-mb/" rel="noopener noreferrer"&gt;More room to build: serverless services now support payloads up to 1 MB&lt;/a&gt; on January 29, 2026, AWS confirmed that the 1 MB payload size limit also applies consistently to asynchronous Lambda invocations originating from Amazon SQS and from event buses in Amazon EventBridge. &lt;/p&gt;

&lt;p&gt;This removes the need for several common architectural workarounds, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chunking large payloads into multiple events&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;S3 Claim Check Pattern&lt;/strong&gt; - Storing payloads in Amazon S3 and passing object references in the event&lt;/li&gt;
&lt;li&gt;Compressing data to fit within size limits&lt;/li&gt;
&lt;/ul&gt;
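&lt;p&gt;For reference, the claim-check pattern looks roughly like this (an in-memory dict stands in for S3, and the key scheme is hypothetical; a real implementation would use boto3's &lt;code&gt;put_object&lt;/code&gt;/&lt;code&gt;get_object&lt;/code&gt;):&lt;/p&gt;

```python
# Claim-check sketch: store the large payload, send only a reference.
# A dict stands in for S3 here, purely for illustration.
import json, uuid

fake_s3 = {}

def publish(payload, limit=256 * 1024):
    body = json.dumps(payload)
    if len(body) > limit:
        key = f"payloads/{uuid.uuid4()}.json"      # hypothetical key scheme
        fake_s3[key] = body                        # claim check: store ...
        return {"s3_ref": key}                     # ... and pass the reference
    return {"inline": body}                        # small enough: send as-is

def consume(event):
    body = fake_s3[event["s3_ref"]] if "s3_ref" in event else event["inline"]
    return json.loads(body)

big = {"data": "x" * 500_000}
event = publish(big)
print("s3_ref" in event)        # True: too big, went through the claim check
print(consume(event) == big)    # True: consumer resolves the reference
```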

&lt;h2&gt;
  
  
  Considerations
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Lambda memory and performance&lt;/strong&gt; &lt;br&gt;
Parsing larger JSON payloads can increase both memory consumption and execution time, especially in high-throughput event-driven workloads.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Lambda timeouts still apply&lt;/strong&gt;&lt;br&gt;
The payload size may now be 1 MB, but Lambda timeouts (up to 15 minutes) and memory limits remain unchanged and should factor into your design decisions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;S3 Claim Check Pattern still matters&lt;/strong&gt;&lt;br&gt;
Storing large payloads in Amazon S3 and passing lightweight references through events remains a good choice when data exceeds size limits, is shared across consumers, or requires strong governance and traceability.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Messaging Cost (SQS &amp;amp; EventBridge)&lt;/strong&gt; &lt;br&gt;
While payload limits have increased, billing is still based on &lt;strong&gt;64 KB chunks&lt;/strong&gt;. Both Amazon SQS and Amazon EventBridge meter usage per 64 KB unit. As a result, a single &lt;strong&gt;1 MB (1,024 KB)&lt;/strong&gt; event is billed as 16 requests, not one. In high-volume systems, this can significantly increase messaging costs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Compute Cost (Lambda Async)&lt;/strong&gt; &lt;br&gt;
For async Lambda invocations, the first &lt;strong&gt;256 KB&lt;/strong&gt; is billed as one request, with an additional request charged for every &lt;strong&gt;64 KB&lt;/strong&gt; beyond that. This means a &lt;strong&gt;1 MB (1,024 KB) async event is billed as 13 requests&lt;/strong&gt; (1 base + 12 additional chunks). At scale, these hidden request multipliers can quickly erode your compute budget.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This table provides a quick reference for calculating the real cost of your payloads.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Payload Size&lt;/th&gt;
&lt;th&gt;Billable Requests (SQS / EventBridge)&lt;/th&gt;
&lt;th&gt;Billable Requests (Lambda Async)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Up to 64 KB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;256 KB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;4 (4 * 64 KB)&lt;/td&gt;
&lt;td&gt;1 (1 * 256 KB)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;512 KB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8 (8 * 64 KB)&lt;/td&gt;
&lt;td&gt;5 (1 * 256 KB + 4 * 64 KB)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;1 MB (1024 KB)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;16 (16 * 64 KB)&lt;/td&gt;
&lt;td&gt;13 (1 * 256 KB + 12 * 64 KB)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
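&lt;p&gt;The billing math behind the table can be expressed directly (sizes in KB, assuming the 64 KB metering and the 256 KB async base described above):&lt;/p&gt;

```python
# Billable-request math for payload sizes (in KB).
import math

def sqs_requests(kb):
    # SQS / EventBridge: metered per 64 KB unit.
    return math.ceil(kb / 64)

def lambda_async_requests(kb):
    # Lambda async: the first 256 KB is one request,
    # then one more request per additional 64 KB.
    return 1 + math.ceil(max(0, kb - 256) / 64)

for kb in (64, 256, 512, 1024):
    print(kb, sqs_requests(kb), lambda_async_requests(kb))
```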

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>eventdriven</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Beyond the 29-Second Limit: 4 Patterns for Serverless GenAI on AWS</title>
      <dc:creator>Sabarish Sathasivan</dc:creator>
      <pubDate>Thu, 15 Jan 2026 15:50:42 +0000</pubDate>
      <link>https://forem.com/thedeveloperjournal/beating-the-29-second-timeout-practical-serverless-patterns-for-genai-apis-on-aws-2j8a</link>
      <guid>https://forem.com/thedeveloperjournal/beating-the-29-second-timeout-practical-serverless-patterns-for-genai-apis-on-aws-2j8a</guid>
      <description>&lt;p&gt;When teams start building GenAI-powered APIs on AWS, the initial architecture often looks straightforward:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0dkiv38fkit5o2b6uwmr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0dkiv38fkit5o2b6uwmr.png" alt="Serverless GEN API"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It works well for demos and early prototypes. But as soon as prompts grow larger, models get heavier, or agent-style workflows are introduced, many teams hit the same invisible wall: &lt;strong&gt;the 29-second API Gateway integration timeout.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Over the past year, AWS has introduced several ways to address this problem. This article walks through those options, based on what actually works when you’re trying to keep GenAI APIs stable, scalable, and usable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Option 1: Increasing the API Gateway integration timeout
&lt;/h2&gt;

&lt;p&gt;In mid-2024, AWS finally allowed REST API integration timeouts to be increased beyond the long-standing 29-second limit. &lt;a href="https://aws.amazon.com/about-aws/whats-new/2024/06/amazon-api-gateway-integration-timeout-limit-29-seconds/" rel="noopener noreferrer"&gt;Amazon API Gateway integration timeout limit increase beyond 29 seconds&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This sounds like the obvious fix. It requires no code changes and keeps the synchronous request-response model intact. &lt;/p&gt;

&lt;h3&gt;
  
  
  The Trade-offs:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The "Spinner" Problem:&lt;/strong&gt; From a user experience perspective, you’re simply extending how long someone stares at a loading spinner.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Availability:&lt;/strong&gt; Works only for Regional REST APIs and private REST APIs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Throttling:&lt;/strong&gt; It might lead to a reduction in your account-level throttle quota limit&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Option 2: The Asynchronous "Job" Pattern (Polling)
&lt;/h2&gt;

&lt;p&gt;Sometimes, streaming isn't the right fit. If your GenAI application is generating images, creating PDFs, or running complex "Agentic" workflows that involve 2 minutes of silent "reasoning" before producing an answer, streaming text chunks provides no value.&lt;/p&gt;

&lt;p&gt;In this pattern, API Gateway acts as a dispatcher rather than a waiter.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Request: The client sends a POST request.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dispatch: API Gateway triggers an asynchronous process (via SQS or AWS Step Functions) and immediately returns a 202 Accepted response with a jobId.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Processing: The backend (Lambda/Bedrock) processes the request offline, unaffected by API Gateway timeouts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Retrieval: The client polls a status endpoint (GET /jobs/{jobId}) every few seconds to check if the work is complete.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
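&lt;p&gt;The flow above can be sketched in-process (dicts stand in for API Gateway, the queue, and the job store; the endpoint and field names are hypothetical):&lt;/p&gt;

```python
# In-process sketch of the asynchronous "job" pattern.
import uuid

jobs = {}

def post_job(prompt):
    # POST /jobs -> dispatch work, return 202 Accepted + jobId immediately
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "RUNNING", "result": None}
    return 202, {"jobId": job_id}

def worker_finish(job_id, result):
    # The backend (Lambda/Bedrock) completes offline, then records the result.
    jobs[job_id] = {"status": "DONE", "result": result}

def get_job(job_id):
    # GET /jobs/{jobId} -> the client polls this every few seconds
    return 200, jobs[job_id]

status, body = post_job("summarize this document")
job_id = body["jobId"]
print(status, get_job(job_id)[1]["status"])   # 202 RUNNING
worker_finish(job_id, "summary text")
print(get_job(job_id)[1]["status"])           # DONE
```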

&lt;h3&gt;
  
  
  The Trade-offs:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Client Complexity: The frontend client must implement polling logic (e.g., "check status every 3 seconds").&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Latency: There is inherently a small delay between the job actually finishing and the client's next poll interval catching it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cost: You pay for the initial request plus every subsequent polling request, which can add up if thousands of users are polling frequently.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Option 3: Use API Gateway WebSocket APIs
&lt;/h2&gt;

&lt;p&gt;API Gateway WebSocket APIs remove the request-response timeout entirely by switching to a persistent, stateful connection. However, there are a few things to consider. &lt;/p&gt;

&lt;p&gt;WebSockets work best when the system is designed to communicate progress (e.g., "Thinking...", "Searching knowledge base..."), not just deliver a final answer.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Trade-offs:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Idle Timeout:&lt;/strong&gt; There is still a 10-minute idle timeout on WebSocket connections. If no data is sent during that window, the connection is closed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complexity:&lt;/strong&gt; They introduce additional complexity in connection management, retries, and client-side state handling.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Option 4: REST API response streaming
&lt;/h2&gt;

&lt;p&gt;AWS added response streaming for REST APIs in late 2025: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/11/api-gateway-response-streaming-rest-apis/" rel="noopener noreferrer"&gt;Amazon API Gateway now supports response streaming for REST APIs&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;This allows Lambda to stream chunks of data back to the client as soon as they are available, rather than waiting for the entire response to be generated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this is the preferred pattern for LLMs:&lt;/strong&gt; Users see immediate feedback instead of a blank screen, even if the total execution time remains the same. This drastically improves the "Time to First Token" (TTFT) metric.&lt;/p&gt;

&lt;p&gt;There’s a great walkthrough in Serverless Office Hours:&lt;br&gt;


  &lt;iframe src="https://www.youtube.com/embed/OOyPRuIuA5w"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;h3&gt;
  
  
  The Trade-offs:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Runtime Restrictions:&lt;/strong&gt; Native streaming support is currently strongest in Node.js managed runtimes. If you use Python, Java, or .NET, you often need to implement a custom runtime or use the AWS Lambda Web Adapter to proxy the stream.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Lost API Gateway Features:&lt;/strong&gt; Because API Gateway no longer buffers the response, it cannot modify it. You lose support for Endpoint Caching, Content Encoding (automatic GZIP compression), and VTL transformations on the response body.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Error Handling Complexity:&lt;/strong&gt; Once your Lambda sends the first byte (usually a 200 OK header), you cannot change the status code. If the LLM hallucinates or crashes mid-stream, the API will still report "Success" HTTP status, so your client must be smart enough to parse error messages inside the text stream.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bandwidth Throttling:&lt;/strong&gt; For very large responses, the first 6-10 MB bursts at full speed, but remaining data is often throttled (e.g., 2 MB/s).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Summary: Which Pattern Should You Choose?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Complexity&lt;/th&gt;
&lt;th&gt;Best Use Case&lt;/th&gt;
&lt;th&gt;Key Constraint&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Timeout Increase&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Internal tools / MVPs&lt;/td&gt;
&lt;td&gt;Users stare at a loading spinner (High TTFT).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;The Asynchronous "Job" Pattern (Polling)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Image gen / Silent agents&lt;/td&gt;
&lt;td&gt;Polling cost &amp;amp; delayed completion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;WebSockets&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Bi-directional Agents&lt;/td&gt;
&lt;td&gt;Requires managing connection state &amp;amp; heartbeats.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Response Streaming&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Chatbots &amp;amp; Text Gen&lt;/td&gt;
&lt;td&gt;Node.js preferred; No API Caching or VTL.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

</description>
      <category>serverless</category>
      <category>genai</category>
      <category>apigateway</category>
      <category>aws</category>
    </item>
    <item>
      <title>Using RDS Proxy with a .NET Lambda for Efficient Database Connections</title>
      <dc:creator>Sabarish Sathasivan</dc:creator>
      <pubDate>Sat, 04 Jan 2025 19:59:55 +0000</pubDate>
      <link>https://forem.com/thedeveloperjournal/using-rds-proxy-with-a-net-lambda-for-efficient-database-connections-2a65</link>
      <guid>https://forem.com/thedeveloperjournal/using-rds-proxy-with-a-net-lambda-for-efficient-database-connections-2a65</guid>
      <description>&lt;p&gt;Efficient database connection management is critical in serverless architectures like AWS Lambda.Each Lambda invocation opens a new database connection, and under high traffic, this can quickly exceed the database's connection limits, causing performance bottlenecks and resource exhaustion. &lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction to RDS Proxy
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Amazon RDS Proxy&lt;/strong&gt;, a fully managed AWS service, optimizes database connections for RDS databases. Acting as an intermediary between your application and the database, it pools and reuses connections, significantly improving efficiency and scalability. RDS Proxy supports a wide range of database engines, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; MySQL&lt;/li&gt;
&lt;li&gt; PostgreSQL&lt;/li&gt;
&lt;li&gt;MariaDB&lt;/li&gt;
&lt;li&gt;Microsoft SQL Server&lt;/li&gt;
&lt;li&gt;Amazon Aurora (MySQL and PostgreSQL) &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For more details, including limitations, refer to the User Guide: &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/rds-proxy.html" rel="noopener noreferrer"&gt;Amazon RDS Proxy&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up RDS Proxy
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Sign in to the AWS Management Console and navigate to &lt;a href="https://console.aws.amazon.com/rds/" rel="noopener noreferrer"&gt;Amazon RDS Console&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;In the navigation pane, choose &lt;strong&gt;Proxies&lt;/strong&gt; and click &lt;strong&gt;Create proxy&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Configure the following settings:&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Engine family&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Determines the database engine family: MySQL, PostgreSQL, MariaDB, Microsoft SQL Server, or Amazon Aurora (MySQL and PostgreSQL).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Proxy identifier&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Name of the Proxy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Idle client connection timeout&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Specify how long a client connection can stay idle before the proxy closes it. By default, this is 30 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Database&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;List of RDS DB Instances that can be accessed through the proxy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Connection pool maximum connections&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The percentage of the database's maximum concurrent connections that RDS Proxy can use. If a single proxy is associated with the RDS instance, this value can be 100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;IAM role&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Name of the IAM role the proxy will use to access the AWS Secrets Manager secrets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Secrets Manager secrets&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;List of Secrets that contains the database credentials for the proxy to access the database. A proxy can have up to 200 associated Secrets Manager secrets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Client authentication type&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Defines the proxy's authentication method for client connections and associated Secrets Manager secrets.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;IAM authentication&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Specifies whether the proxy requires, disallows, or allows IAM authentication. Applicable only for RDS for Microsoft SQL Server&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Require Transport Layer Security&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Enable this setting to enforce TLS/SSL for all client connections&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For a detailed explanation of the parameters, refer to the User Guide: &lt;br&gt;
&lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/rds-proxy-creating.html" rel="noopener noreferrer"&gt;Creating an RDS Proxy&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click on the &lt;strong&gt;Create Proxy&lt;/strong&gt; button to complete the setup.&lt;/li&gt;
&lt;li&gt;Once created, the proxy appears under the Proxies section in the navigation pane.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2z5o9jwyjmi1zzphlivp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2z5o9jwyjmi1zzphlivp.png" alt="List of proxies" width="800" height="111"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Click on the proxy's name to view its details. &lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foiskeghwsico5jmf6w1r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foiskeghwsico5jmf6w1r.png" alt="Proxy details" width="800" height="323"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use the &lt;strong&gt;Endpoint&lt;/strong&gt; listed in the &lt;strong&gt;Proxy Endpoints&lt;/strong&gt; section as the host in your database connection string instead of the database hostname to leverage RDS Proxy.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
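Switching to RDS Proxy usually requires no change beyond the host name in the connection string. As an illustration (both endpoints below are placeholders, not real values):

```text
# Before: connecting directly to the RDS instance
Server=mydatabase.abcdefghij.us-east-1.rds.amazonaws.com;Database=mydb;Uid=appuser;Pwd=***;

# After: connecting through the RDS Proxy endpoint
Server=myproxy.proxy-abcdefghij.us-east-1.rds.amazonaws.com;Database=mydb;Uid=appuser;Pwd=***;
```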

&lt;h3&gt;
  
  
  Creating a Database User and Adding It to the RDS Proxy
&lt;/h3&gt;

&lt;p&gt;To ensure the RDS Proxy can interact with your database, follow these steps to create a database user, store the credentials, and associate them with the proxy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a database user with the necessary privileges
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE USER 'newuser'@'%' IDENTIFIED BY 'password';
GRANT ALL PRIVILEGES ON database.* TO 'newuser'@'%';
FLUSH PRIVILEGES;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Store the database credentials in AWS Secrets Manager

&lt;ul&gt;
&lt;li&gt;Navigate to AWS Secrets Manager in the AWS Management Console.&lt;/li&gt;
&lt;li&gt;Create a new secret containing the database username and password.&lt;/li&gt;
&lt;li&gt;Note the ARN of the secret for use in Lambda.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
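RDS Proxy expects each associated secret to store the credentials as a JSON object with `username` and `password` keys, matching the database user created above (the values below are placeholders):

```json
{
  "username": "newuser",
  "password": "password"
}
```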

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fenaaqsu69paxr5y7jorp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fenaaqsu69paxr5y7jorp.png" alt="Secret Details" width="800" height="470"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add the secret to the RDS Proxy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftivor0bw9orpelnn3tek.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftivor0bw9orpelnn3tek.png" alt="Add Secret to RDS Proxy" width="800" height="113"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Attach the following policy to the IAM role associated with the RDS Proxy to read the secret:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue"
      ],
      "Resource": "arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:&amp;lt;secret-arn&amp;gt;"
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Integrating RDS Proxy in the .NET Lambda
&lt;/h2&gt;

&lt;p&gt;There are multiple approaches to integrating RDS Proxy into your .NET Lambda function to access an RDS for MySQL database.&lt;/p&gt;

&lt;h3&gt;
  
  
  Database Credentials
&lt;/h3&gt;

&lt;p&gt;This approach lets your Lambda function authenticate through the RDS Proxy using the proxy endpoint and database credentials stored in a secret. Follow these steps to use database credentials:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; If the database credentials are stored in a secret, attach the following policy to the Lambda Execution Role to read the secret:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue"
      ],
      "Resource": "arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:&amp;lt;secret-arn&amp;gt;"
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt; Update the Lambda code to connect to the database using the RDS Proxy endpoint and database credentials
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;using MySql.Data.MySqlClient;

namespace RDSProxyTest
{
    internal class DBConnection
    {
        public async Task&amp;lt;MySqlConnection&amp;gt; OpenConnectionAsync()
        {
            var connectionString = new MySqlConnectionStringBuilder
            {
                Server = "&amp;lt;rds_proxy_endpoint&amp;gt;",
                UserID = "&amp;lt;database_user_name&amp;gt;",
                Password = "&amp;lt;database_password&amp;gt;",
                Database = "&amp;lt;database_name&amp;gt;"
            }.ToString();

            var connection = new MySqlConnection(connectionString);
            await connection.OpenAsync();
            return connection;
        }
    }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
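Once opened, the connection can be used like any direct database connection; the proxy transparently handles pooling behind the scenes. A minimal, hypothetical usage sketch (the table and query below are illustrative, not from the article):

```csharp
// Hypothetical caller of the DBConnection class shown above.
var db = new DBConnection();
using var connection = await db.OpenConnectionAsync();

// Run a simple query through the pooled proxy connection.
using var command = new MySqlCommand("SELECT COUNT(*) FROM orders", connection);
var orderCount = await command.ExecuteScalarAsync();
```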



&lt;h3&gt;
  
  
  IAM Authentication
&lt;/h3&gt;

&lt;p&gt;This approach enables your Lambda function to authenticate to the RDS Proxy using short-lived IAM tokens instead of database credentials. Follow these steps to enable and use IAM Authentication:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Enable IAM Authentication for your RDS proxy&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcdgmhmlpw4bewa0i7kkf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcdgmhmlpw4bewa0i7kkf.png" alt="Enable IAM Proxy Authentication" width="800" height="244"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Attach the following policy to the Lambda Execution Role for the Lambda to access the RDS Proxy:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "rds-db:connect"
      ],
      "Resource": "arn:aws:rds-db:REGION:ACCOUNT_ID:dbuser:RDS_PROXY_IDENTIFIER/DB_USERNAME"
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace the following placeholders:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;REGION&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Your AWS region (e.g., us-east-1)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ACCOUNT_ID&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS Account ID&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RDS_PROXY_IDENTIFIER&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The unique identifier for the RDS Proxy. For example, if the ARN of the RDS Proxy is &lt;strong&gt;arn:aws:rds:REGION:ACCOUNT_ID:db-proxy:prx-XXXXXXXXXXXXXXXXX&lt;/strong&gt;, the value of &lt;strong&gt;RDS_PROXY_IDENTIFIER&lt;/strong&gt; is &lt;strong&gt;prx-XXXXXXXXXXXXXXXXX&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DB_USERNAME&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A database username, configured in one of the secrets attached to the RDS Proxy&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt; Update the Lambda code to connect to the database using the RDS Proxy endpoint and an IAM authentication token
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;using Amazon;
using Amazon.RDS.Util;
using Amazon.Runtime;

using MySql.Data.MySqlClient;

namespace RDSProxyTest
{
    internal class DBConnection
    {
        private static string GenerateAuthToken(string endpoint, string username, string region)
        {
            var regionEndpoint = RegionEndpoint.GetBySystemName(region);

            // Use the default credentials of the Lambda environment
            var credentials = FallbackCredentialsFactory.GetCredentials();

            // Generate a short-lived IAM authentication token for the proxy endpoint
            return RDSAuthTokenGenerator.GenerateAuthToken(
                credentials,
                regionEndpoint,
                endpoint,
                3306,
                username);
        }

        public async Task&amp;lt;MySqlConnection&amp;gt; OpenConnectionAsync()
        {
            var token = GenerateAuthToken("&amp;lt;rds_proxy_endpoint&amp;gt;", "&amp;lt;database_user_name&amp;gt;", "&amp;lt;aws_region&amp;gt;");
            var connectionString = new MySqlConnectionStringBuilder
            {
                Server = "&amp;lt;rds_proxy_endpoint&amp;gt;",
                Database = "&amp;lt;database_name&amp;gt;",
                UserID = "&amp;lt;database_user_name&amp;gt;",
                Password = token,
                SslMode = MySqlSslMode.Required
            }.ToString();

            var connection = new MySqlConnection(connectionString);
            await connection.OpenAsync();
            return connection;
        }
    }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Database Credentials versus IAM Authentication
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Database Credentials&lt;/th&gt;
&lt;th&gt;IAM Authentication&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Authentication Method&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Uses database username and password&lt;/td&gt;
&lt;td&gt;Uses short-lived IAM tokens (15 minutes) generated by AWS for secure authentication.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Database Support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Works with all databases supported by RDS Proxy.&lt;/td&gt;
&lt;td&gt;Supported for Amazon Aurora (MySQL/PostgreSQL), MySQL, PostgreSQL, and MariaDB. Not supported for Amazon RDS for Oracle or Amazon RDS for SQL Server&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Performance Impact&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Minimal Overhead&lt;/td&gt;
&lt;td&gt;Slightly higher overhead due to the need to generate and validate IAM tokens.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

</description>
      <category>lambda</category>
      <category>aws</category>
      <category>dotnet</category>
      <category>rds</category>
    </item>
    <item>
      <title>Fine-Tune Your Serverless REST APIs with AWS Lambda Power Tuning</title>
      <dc:creator>Sabarish Sathasivan</dc:creator>
      <pubDate>Thu, 28 Nov 2024 13:30:55 +0000</pubDate>
      <link>https://forem.com/thedeveloperjournal/optimizing-rest-api-performance-tuning-aws-lambda-with-aws-lambda-power-tuning-1obp</link>
      <guid>https://forem.com/thedeveloperjournal/optimizing-rest-api-performance-tuning-aws-lambda-with-aws-lambda-power-tuning-1obp</guid>
      <description>&lt;p&gt;Developing serverless REST APIs with API Gateway and AWS Lambda is now a common practice. With Amazon extending the &lt;a href="https://aws.amazon.com/about-aws/whats-new/2024/06/amazon-api-gateway-integration-timeout-limit-29-seconds" rel="noopener noreferrer"&gt;API Gateway timeout beyond 29 seconds&lt;/a&gt;, serverless REST APIs can now handle complex workflows like long-running machine learning predictions and Generative AI tasks.&lt;/p&gt;

&lt;p&gt;In this article, we’ll explore how to leverage the &lt;a href="https://github.com/alexcasalboni/aws-lambda-power-tuning" rel="noopener noreferrer"&gt;AWS Lambda Power Tuning&lt;/a&gt; tool to optimize serverless &lt;strong&gt;REST APIs (API Gateway configured with proxy integration + AWS Lambda)&lt;/strong&gt; for both performance and cost efficiency.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Firtqa4s42zfpvp5prs9l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Firtqa4s42zfpvp5prs9l.png" alt="Sample Architecture - API Gateway + AWS Lambda + RDS" width="800" height="211"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS Lambda Power Tuning
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://github.com/alexcasalboni/aws-lambda-power-tuning" rel="noopener noreferrer"&gt;AWS Lambda Power Tuning&lt;/a&gt; tool is an AWS Step Functions-based state machine designed to test Lambda performance under various memory configurations. It helps optimize for cost or performance (or a balance of both) and is compatible with any Lambda runtime.&lt;/p&gt;

&lt;h3&gt;
  
  
  Getting Started
&lt;/h3&gt;

&lt;p&gt;To deploy the AWS Lambda Power Tuning tool, follow the instructions in the deployment &lt;a href="https://github.com/alexcasalboni/aws-lambda-power-tuning/blob/master/README-DEPLOY.md" rel="noopener noreferrer"&gt;guide&lt;/a&gt;. Once deployed, the state machine will appear in AWS Step Functions, as shown below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpif70obu3mocu2w35z63.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpif70obu3mocu2w35z63.png" alt="AWS Lambda Power Tuning Tool State machine in AWS Console" width="800" height="136"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Executing the Tool
&lt;/h3&gt;

&lt;p&gt;When executing the state machine, you can customize several parameters. Below is a summary:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;lambdaARN&lt;/td&gt;
&lt;td&gt;Required. ARN of the Lambda function to optimize.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;num&lt;/td&gt;
&lt;td&gt;Required. Number of invocations per power configuration (min: 5, recommended: 10–100).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;powerValues&lt;/td&gt;
&lt;td&gt;Optional. Memory values to test (128MB–10,240MB). Defaults to values set at deployment.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;payload&lt;/td&gt;
&lt;td&gt;Optional. Request payload for the API. Supports either a static payload used for every invocation or weighted payloads drawn from a list&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;payloadS3&lt;/td&gt;
&lt;td&gt;S3 object location for payloads &amp;gt;256KB.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;parallelInvocation&lt;/td&gt;
&lt;td&gt;Runs all invocations in parallel if set to true (default: false).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;strategy&lt;/td&gt;
&lt;td&gt;Optimization strategy: "cost", "speed", or "balanced". With "cost" the tool suggests the cheapest option and with "speed" the fastest; with "balanced" it chooses a compromise between the two according to the "balancedWeight" parameter.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;balancedWeight&lt;/td&gt;
&lt;td&gt;Represents the trade-off between cost and speed. Value is between 0 and 1, where 0.0 is equivalent to the "speed" strategy and 1.0 to the "cost" strategy. Default: 0.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;preProcessorARN&lt;/td&gt;
&lt;td&gt;ARN of a Lambda function to run before each invocation of the target function.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;postProcessorARN&lt;/td&gt;
&lt;td&gt;ARN of a Lambda function to run after each invocation of the target function.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;includeOutputResults&lt;/td&gt;
&lt;td&gt;Includes average cost and duration for each configuration in the final output (default: false).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;onlyColdStarts&lt;/td&gt;
&lt;td&gt;Forces all invocations to be cold starts&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Refer to the &lt;a href="https://github.com/alexcasalboni/aws-lambda-power-tuning/blob/master/README-EXECUTE.md#state-machine-input-at-execution-time" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt; for detailed explanations.&lt;/p&gt;

&lt;h4&gt;
  
  
  Example Input
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "lambdaARN": "&amp;lt;arn of the function being executed&amp;gt;",
  "powerValues": [ 128, 256, 512, 1024, 1536, 2048, 2560, 3072],
  "num": 10,
  "strategy": "speed",
  "payload": {...},
  "parallelInvocation": true,
  "includeOutputResults": true,
  "onlyColdStarts": true
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Input Payloads for Proxy Integration
&lt;/h4&gt;

&lt;p&gt;Inputs to test Lambda functions behind API Gateway can vary by HTTP method. Below are sample payload links for common methods:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;HTTP Method&lt;/th&gt;
&lt;th&gt;GitHub URL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;POST&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ssathasivan/AWSLambaPowerTuningToolPayloads/blob/main/Post-singleInput.json" rel="noopener noreferrer"&gt;Click for sample input&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PUT&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ssathasivan/AWSLambaPowerTuningToolPayloads/blob/main/Put.json" rel="noopener noreferrer"&gt;Click for sample input&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GET&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ssathasivan/AWSLambaPowerTuningToolPayloads/blob/main/Get.json" rel="noopener noreferrer"&gt;Click for sample input&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GET (With Path Parameters)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ssathasivan/AWSLambaPowerTuningToolPayloads/blob/main/GetByPathParameter.json" rel="noopener noreferrer"&gt;Click for sample input&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GET (With QueryString)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ssathasivan/AWSLambaPowerTuningToolPayloads/blob/main/GetByQueryStringParameter.json" rel="noopener noreferrer"&gt;Click for sample input&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DELETE&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ssathasivan/AWSLambaPowerTuningToolPayloads/blob/main/Delete.json" rel="noopener noreferrer"&gt;Click for sample input&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PATCH&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ssathasivan/AWSLambaPowerTuningToolPayloads/blob/main/Patch.json" rel="noopener noreferrer"&gt;Click for sample input&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Weighted Payloads
&lt;/h4&gt;

&lt;p&gt;The tool also offers the option to define multiple payloads for HTTP methods, making it suitable for scenarios where payload structures vary significantly and can impact performance. Refer to &lt;a href="https://github.com/alexcasalboni/aws-lambda-power-tuning/blob/master/README-ADVANCED.md#weighted-payloads" rel="noopener noreferrer"&gt;Weighted Payloads&lt;/a&gt; in the official documentation to understand how weighted payloads work.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;HTTP Method&lt;/th&gt;
&lt;th&gt;GitHub URL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;POST (With Weighted Payloads)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ssathasivan/AWSLambaPowerTuningToolPayloads/blob/main/Post-weightedinput.json" rel="noopener noreferrer"&gt;Click for sample input&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Pre/post-processing functions
&lt;/h4&gt;

&lt;p&gt;The tool also provides the ability to run custom logic before and after each invocation of the target Lambda function. This logic should be implemented as separate Lambda functions. Refer to &lt;a href="https://github.com/alexcasalboni/aws-lambda-power-tuning/blob/master/README-ADVANCED.md#prepost-processing-functions" rel="noopener noreferrer"&gt;Pre/Post-processing functions&lt;/a&gt; in the official documentation to understand how they work.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;HTTP Method&lt;/th&gt;
&lt;th&gt;GitHub URL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Post (With Pre/Post functions)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ssathasivan/AWSLambaPowerTuningToolPayloads/blob/main/Post-post-pre-functions.json" rel="noopener noreferrer"&gt;Click for sample input&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Output
&lt;/h3&gt;

&lt;p&gt;A sample execution output is shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "output": {
    "power": 2048,
    "cost": 0.0000018816000000000001,
    "duration": 54.95933333333334,
    "stateMachine": {
      "executionCost": 0.00075,
      "lambdaCost": 0.0013002423000000002,
      "visualization": "https://lambda-power-tuning.show/#encodeddata"
    },
    "stats": [
      {"value": 128, "averagePrice": 9.345e-7, "averageDuration": 443.8995},
      {"value": 2048, "averagePrice": 0.0000018816, "averageDuration": 54.9593}
    ]
  }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A brief description of the output is given below:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Key&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Description&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;output.power&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The optimal memory configuration (RAM).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;output.cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The corresponding average cost (per invocation).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;output.duration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The corresponding average duration (per invocation).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;output.stateMachine.executionCost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The AWS Step Functions cost corresponding to this state machine execution (fixed value for "worst" case).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;output.stateMachine.lambdaCost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The AWS Lambda cost corresponding to this state machine execution (depending on number of executions and average execution time).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;output.stateMachine.visualization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A URL to visualize and inspect average statistics about cost and performance. Note: Average statistics are NOT shared with the server, as all data is encoded in the URL hash, client-side only.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;output.stats&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The average duration and cost for every tested power value configuration. Only included if &lt;code&gt;includeOutputResults&lt;/code&gt; is set to a truthy value.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Visualizing the output
&lt;/h4&gt;

&lt;p&gt;The element - &lt;strong&gt;output.stateMachine.visualization&lt;/strong&gt; provides a visualization URL - &lt;a href="https://lambda-power-tuning.show/#encodeddata" rel="noopener noreferrer"&gt;https://lambda-power-tuning.show/#encodeddata&lt;/a&gt; that can be used to visualize the result. &lt;/p&gt;

&lt;p&gt;The source code of the UI is also open source - &lt;a href="https://github.com/matteo-ronchetti/aws-lambda-power-tuning-ui" rel="noopener noreferrer"&gt;https://github.com/matteo-ronchetti/aws-lambda-power-tuning-ui&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The visualizations for the POST and GET (With Path Parameters) executions are shown below&lt;/p&gt;

&lt;h5&gt;
  
  
  POST
&lt;/h5&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkdo3oibiti3cjg8n75cr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkdo3oibiti3cjg8n75cr.png" alt="Results for POST" width="800" height="361"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h5&gt;
  
  
  GET (With Path Parameters)
&lt;/h5&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgmg8cnnq2oneexf8860i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgmg8cnnq2oneexf8860i.png" alt="Results for GET (With Path Parameters) " width="800" height="358"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The tool includes a feature to compare the results of function invocations. To see how this functionality is applied in practice, check out the article &lt;a href="https://dev.to/techtrailwithsab/secrets-management-in-net-lambda-secretsmanager-sdk-vs-aws-parameters-and-secrets-lambda-extension-40pk"&gt;Secrets Management in .NET Lambda&lt;/a&gt;, where we demonstrate its use to compare the performance of reading secrets in a .NET Lambda.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/about-aws/whats-new/2024/06/amazon-api-gateway-integration-timeout-limit-29-seconds/" rel="noopener noreferrer"&gt;Amazon API Gateway integration timeout limit increase beyond 29 seconds &lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/alexcasalboni/aws-lambda-power-tuning" rel="noopener noreferrer"&gt;AWS Lambda Power Tuning&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/alexcasalboni/aws-lambda-power-tuning/blob/master/README-DEPLOY.md" rel="noopener noreferrer"&gt;AWS Lambda Power Tuning - Deployment Guide&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/alexcasalboni/aws-lambda-power-tuning/blob/master/README-EXECUTE.md#state-machine-input-at-execution-time" rel="noopener noreferrer"&gt;AWS Lambda Power Tuning - Execution Parameters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/apigateway/latest/developerguide/set-up-lambda-proxy-integrations.html" rel="noopener noreferrer"&gt;Input format for API Gateway Proxy Integrations
&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>lambda</category>
      <category>aws</category>
      <category>apigateway</category>
      <category>performance</category>
    </item>
    <item>
      <title>Secrets Management in .NET Lambda</title>
      <dc:creator>Sabarish Sathasivan</dc:creator>
      <pubDate>Tue, 12 Nov 2024 22:06:55 +0000</pubDate>
      <link>https://forem.com/thedeveloperjournal/secrets-management-in-net-lambda-secretsmanager-sdk-vs-aws-parameters-and-secrets-lambda-extension-40pk</link>
      <guid>https://forem.com/thedeveloperjournal/secrets-management-in-net-lambda-secretsmanager-sdk-vs-aws-parameters-and-secrets-lambda-extension-40pk</guid>
      <description>&lt;p&gt;Typically, an application must deal with sensitive information like API keys and database credentials. A recommended approach is to store this sensitive information as secrets using the Secrets Manager service in AWS. This article explores multiple approaches to retrieving and caching secrets in a .NET-based Lambda.&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS SDK Based Approach
&lt;/h2&gt;

&lt;p&gt;AWS provides the &lt;strong&gt;AWSSDK.SecretsManager.Caching&lt;/strong&gt; package, which can be used to retrieve a secret and cache it for future use.&lt;/p&gt;

&lt;p&gt;To use the caching library, add the &lt;strong&gt;AWSSDK.SecretsManager.Caching&lt;/strong&gt; package to your .NET Lambda:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;dotnet add package AWSSDK.SecretsManager.Caching&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The following code snippet shows a method that retrieves a secret and caches it for 15 minutes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;using Amazon.SecretsManager.Extensions.Caching;

namespace Lambda.Secrets.AWSSDK
{
    public class SecretsProvider : ISecretsProvider
    {

        // Set the cache item TTL to 15 minutes (900,000 ms)
        private static readonly SecretCacheConfiguration _cacheConfiguration = new SecretCacheConfiguration
        {
            CacheItemTTL = 900000
        };

        private readonly SecretsManagerCache _cache = new SecretsManagerCache(_cacheConfiguration);


        public async Task&amp;lt;string&amp;gt; GetSecretAsync(string secretName)
        {
            return await _cache.GetSecretString(secretName);
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
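
&lt;p&gt;The snippets in this article implement an &lt;code&gt;ISecretsProvider&lt;/code&gt; interface that the article itself does not define. A minimal definition might look like the following (the &lt;code&gt;Lambda.Secrets&lt;/code&gt; namespace is an assumption; the actual interface may differ in the linked repository):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;namespace Lambda.Secrets
{
    // Hypothetical interface; the article's classes implement it but do not show it.
    public interface ISecretsProvider
    {
        Task&amp;lt;string&amp;gt; GetSecretAsync(string secretName);
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;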



&lt;p&gt;To reuse a secret across the lifetime of a Lambda execution environment (i.e., while the container is warm), inject the secret-retrieving class as a singleton. &lt;/p&gt;

&lt;p&gt;Using AddSingleton ensures the class is instantiated only once per warm environment, allowing the secret to persist across multiple invocations without re-fetching. This approach reduces overhead, improves performance, and minimizes calls to AWS Secrets Manager.&lt;/p&gt;
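
&lt;p&gt;A minimal sketch of that registration, assuming a typical minimal-API style &lt;code&gt;builder&lt;/code&gt; and the &lt;code&gt;ISecretsProvider&lt;/code&gt;/&lt;code&gt;SecretsProvider&lt;/code&gt; pair from the snippet above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Registered as a singleton so the SecretsManagerCache instance
// (and the secrets it caches) survives across warm invocations.
builder.Services.AddSingleton&amp;lt;ISecretsProvider, SecretsProvider&amp;gt;();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;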

&lt;h2&gt;
  
  
  AWS Parameters and Secrets Lambda Extension
&lt;/h2&gt;

&lt;p&gt;AWS provides an extension - &lt;a href="https://aws.amazon.com/about-aws/whats-new/2022/10/aws-parameters-secrets-lambda-extension/#:~:text=Today%2C%20AWS%20launched%20the%20AWS,secrets%20from%20AWS%20Secrets%20Manager." rel="noopener noreferrer"&gt;AWS Parameters and Secrets Lambda Extension&lt;/a&gt;. This extension can be installed as a Lambda layer and acts as an in-memory cache for parameters and secrets.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lambda Layers
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/chapter-layers.html" rel="noopener noreferrer"&gt;Lambda layers&lt;/a&gt; provide a convenient way to manage reusable code and dependencies across multiple Lambda functions. A layer is essentially a ZIP archive that can include libraries, custom runtimes, or other necessary files. By using layers, you can avoid bundling these resources directly in your function's deployment package, reducing its size and improving maintainability. &lt;/p&gt;

&lt;h3&gt;
  
  
  How the Extension Works
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The extension exposes an HTTP endpoint (&lt;a href="http://localhost:2773" rel="noopener noreferrer"&gt;http://localhost:2773&lt;/a&gt;) to the Lambda function&lt;/li&gt;
&lt;li&gt;When a secret is requested

&lt;ul&gt;
&lt;li&gt;The extension first checks the cache for an existing entry.&lt;/li&gt;
&lt;li&gt;If no entry is found, the value is retrieved from Secrets Manager, cached with a TTL (default 300 seconds), and returned.&lt;/li&gt;
&lt;li&gt;If an entry is found, the extension checks how long ago it was cached. If the elapsed time is within the configured TTL (time-to-live), the cached entry is returned; otherwise, fresh data is fetched from Secrets Manager, cached, and returned.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Refer to the article &lt;a href="https://aws.amazon.com/blogs/compute/using-the-aws-parameter-and-secrets-lambda-extension-to-cache-parameters-and-secrets/" rel="noopener noreferrer"&gt;Using the AWS Parameter and Secrets Lambda extension to cache parameters and secrets&lt;/a&gt; to understand the architecture of the extension.&lt;/p&gt;

&lt;h3&gt;
  
  
  Add the Layer to Your Lambda Function
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt; Open the AWS Lambda Console and navigate to your function.&lt;/li&gt;
&lt;li&gt;Under the Layers section, click Add a layer.

&lt;ul&gt;
&lt;li&gt;Select the option AWS layers &lt;/li&gt;
&lt;li&gt;Select "AWS-Parameters-and-Secrets-Lambda-Extension" and the latest version&lt;/li&gt;
&lt;li&gt;Click Add. 
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj74f4syvmxz1e4m862xo.png" alt="Select the latest version of the layer" width="800" height="648"&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;The layer will be added to your Lambda&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff4vh52bi6gm1wv67asdw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff4vh52bi6gm1wv67asdw.png" alt="Layer is added to the lambda" width="800" height="72"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuring the Extension
&lt;/h3&gt;

&lt;p&gt;The extension can be configured using environment variables. Some important configurations are listed below:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;SECRETS_MANAGER_TTL: TTL of a secret in the cache, in seconds. Must be a value between 0 and 300. The default is 300.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;PARAMETERS_SECRETS_EXTENSION_CACHE_SIZE: The maximum number of secrets and parameters to cache. Must be a value from 0 to 1000. The default is 1000.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Refer to the section &lt;strong&gt;AWS Parameters and Secrets Lambda Extension environment variables&lt;/strong&gt; in the following &lt;a href="https://docs.aws.amazon.com/secretsmanager/latest/userguide/retrieving-secrets_lambda.html#:~:text=SECRETS_MANAGER_TTL%20TTL%20of%20a%20secret%20in%20the%20cache,if%20PARAMETERS_SECRETS_EXTENSION_CACHE_SIZE%20is%200.%20Default%20is%20300%20seconds." rel="noopener noreferrer"&gt;article&lt;/a&gt; for the complete list of environment variables.&lt;/p&gt;

&lt;p&gt;The following code snippet shows a method that retrieves a secret using the extension:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;namespace Lambda.Secrets.Extension
{
    public class SecretsProvider : ISecretsProvider
    {
        private readonly HttpClient _httpClient;

        private const string GetSecretsEndpoint = "/secretsmanager/get?secretId=";

        public SecretsProvider(HttpClient httpClient)
        {
            _httpClient = httpClient;
        }

        public async Task&amp;lt;string&amp;gt; GetSecretAsync(string secretName)
        {
            var httpRequest = new HttpRequestMessage(
                HttpMethod.Get,
                new Uri($"{GetSecretsEndpoint}{HttpUtility.UrlEncode(secretName)}", UriKind.Relative));

            //Pass X-Aws-Parameters-Secrets-Token as a header. This is a required header that uses the AWS_SESSION_TOKEN value,
            //which is present in the Lambda execution environment by default. 
            httpRequest.Headers.Add("X-Aws-Parameters-Secrets-Token",
                Environment.GetEnvironmentVariable("AWS_SESSION_TOKEN")
            );
            var response = await _httpClient
                .SendAsync(httpRequest)
                .ConfigureAwait(false);

            response.EnsureSuccessStatusCode();
            var responseAsJson = await response.Content.ReadFromJsonAsync&amp;lt;GetSecretValueResponse&amp;gt;();
            return responseAsJson!.SecretString;
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The class can be injected into the Lambda as shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;builder.Services.AddAWSLambdaHosting(LambdaEventSource.RestApi)
           .AddHttpClient&amp;lt;ISecretsProvider, SecretsProvider&amp;gt;(c =&amp;gt;
        {
            c.BaseAddress = new Uri("http://localhost:2773");
        });
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


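
&lt;p&gt;Once a provider is registered with the DI container, it can be consumed in the same way regardless of which approach backs it. As a sketch (the class shape and the secret name &lt;code&gt;"my-app/db-credentials"&lt;/code&gt; are placeholders, not from the article):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;public class MyFunction
{
    private readonly ISecretsProvider _secretsProvider;

    // ISecretsProvider is resolved from the DI container.
    public MyFunction(ISecretsProvider secretsProvider)
    {
        _secretsProvider = secretsProvider;
    }

    public async Task HandleAsync()
    {
        // "my-app/db-credentials" is a placeholder secret name.
        var secret = await _secretsProvider.GetSecretAsync("my-app/db-credentials");
        // Use the secret (e.g., to build a database connection string).
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;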

&lt;h2&gt;
  
  
  Testing Approach
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/alexcasalboni/aws-lambda-power-tuning" rel="noopener noreferrer"&gt;AWS Lambda Power Tuning&lt;/a&gt; was used to test the performance of the Lambda functions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lambda functions were tested with the following memory configurations: 128 MB, 256 MB, 512 MB, 1024 MB, 1536 MB, 2048 MB, 2560 MB, and 3072 MB.&lt;/li&gt;
&lt;li&gt; Each configuration was invoked 100 times.&lt;/li&gt;
&lt;li&gt; Lambda invocations were done in parallel and had a combination of cold starts and warm starts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS Lambda Power Tuning measured execution time and calculated the associated costs, providing insights into performance improvements.&lt;/p&gt;

&lt;h3&gt;
  
  
  Test Results
&lt;/h3&gt;

&lt;p&gt;The tests indicate that the Lambda that retrieves the secret using the Lambda extension is consistently more performant and cheaper than the Lambda that retrieves the secret using the AWS SDK.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foqxgw184f2vinuj48w4s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foqxgw184f2vinuj48w4s.png" alt="Test Results" width="800" height="410"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Execution Time
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Memory Allocation&lt;/th&gt;
&lt;th&gt;AWS Secrets SDK (ms)&lt;/th&gt;
&lt;th&gt;Secrets Manager Extension (ms)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;128 MB&lt;/td&gt;
&lt;td&gt;10738&lt;/td&gt;
&lt;td&gt;6171&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;256 MB&lt;/td&gt;
&lt;td&gt;5155&lt;/td&gt;
&lt;td&gt;3189&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;512 MB&lt;/td&gt;
&lt;td&gt;1762&lt;/td&gt;
&lt;td&gt;1391&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1024 MB&lt;/td&gt;
&lt;td&gt;934&lt;/td&gt;
&lt;td&gt;716&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1536 MB&lt;/td&gt;
&lt;td&gt;657&lt;/td&gt;
&lt;td&gt;191&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2048 MB&lt;/td&gt;
&lt;td&gt;398&lt;/td&gt;
&lt;td&gt;285&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2560 MB&lt;/td&gt;
&lt;td&gt;370&lt;/td&gt;
&lt;td&gt;310&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3072 MB&lt;/td&gt;
&lt;td&gt;446&lt;/td&gt;
&lt;td&gt;115&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Cost
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Memory Allocation&lt;/th&gt;
&lt;th&gt;AWS Secrets SDK ($)&lt;/th&gt;
&lt;th&gt;Secrets Manager Extension  ($)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;128 MB&lt;/td&gt;
&lt;td&gt;0.00002127&lt;/td&gt;
&lt;td&gt;0.00001172&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;256 MB&lt;/td&gt;
&lt;td&gt;0.00001919&lt;/td&gt;
&lt;td&gt;0.00001089&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;512 MB&lt;/td&gt;
&lt;td&gt;0.00001136&lt;/td&gt;
&lt;td&gt;0.00000797&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1024 MB&lt;/td&gt;
&lt;td&gt;0.00000955&lt;/td&gt;
&lt;td&gt;0.00000609&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1536 MB&lt;/td&gt;
&lt;td&gt;0.00000856&lt;/td&gt;
&lt;td&gt;0.00000204&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2048 MB&lt;/td&gt;
&lt;td&gt;0.00000658&lt;/td&gt;
&lt;td&gt;0.00000369&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2560 MB&lt;/td&gt;
&lt;td&gt;0.00000764&lt;/td&gt;
&lt;td&gt;0.00000512&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3072 MB&lt;/td&gt;
&lt;td&gt;0.00001083&lt;/td&gt;
&lt;td&gt;0.00000246&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The source code for the Lambdas is shared at &lt;a href="https://github.com/ssathasivan/SecretManagementInDotnetLambda" rel="noopener noreferrer"&gt;SecretManagementInDotnetLambda&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>lambda</category>
      <category>dotnet</category>
      <category>serverless</category>
    </item>
  </channel>
</rss>
