<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Sabarish Sathasivan</title>
    <description>The latest articles on Forem by Sabarish Sathasivan (@thedeveloperjournal).</description>
    <link>https://forem.com/thedeveloperjournal</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2347611%2F690ffe9f-3b11-48b6-a1f9-f6411532ed4a.png</url>
      <title>Forem: Sabarish Sathasivan</title>
      <link>https://forem.com/thedeveloperjournal</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/thedeveloperjournal"/>
    <language>en</language>
    <item>
      <title>AWS Vector Databases – Part 3 : Choosing the Right Vector Database on AWS</title>
      <dc:creator>Sabarish Sathasivan</dc:creator>
      <pubDate>Mon, 30 Mar 2026 18:29:12 +0000</pubDate>
      <link>https://forem.com/aws-builders/aws-vector-databases-part-3-choosing-the-right-vector-database-on-aws-375m</link>
      <guid>https://forem.com/aws-builders/aws-vector-databases-part-3-choosing-the-right-vector-database-on-aws-375m</guid>
      <description>&lt;p&gt;This is where everything comes together.&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;In case you missed it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Part 1 → &lt;a href="https://dev.to/thedeveloperjournal/aws-vector-databases-part-1-embeddings-dimensions-similarity-9ph"&gt;Embeddings, Dimensions, and Similarity Search&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Part 2 → &lt;a href="https://dev.to/thedeveloperjournal/aws-vector-databases-part-2-search-filtering-and-chunking-3lbe"&gt;Search Patterns, Filtering, and Chunking&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By now, you understand the fundamentals and how retrieval works. The real question is: &lt;strong&gt;which AWS service should you actually use?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There's no single "best" option. Each service was designed for a different primary workload and inherited vector search as a capability. That origin story shapes each service's strengths, limitations, and the cost you'll pay.&lt;/p&gt;

&lt;p&gt;Let's break them down.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Services
&lt;/h2&gt;

&lt;h3&gt;
  
  
  OpenSearch Serverless
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A distributed search engine with native vector, keyword, and hybrid search.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why choose it:&lt;/strong&gt; OpenSearch is the only AWS service that handles &lt;strong&gt;full-text search and vector search natively in one engine&lt;/strong&gt;. Its Neural Search feature automates the entire hybrid pipeline — you send a query, and it runs keyword + semantic search, then merges results using normalization and combination techniques. No manual score merging required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key details:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Supports HNSW engine&lt;/li&gt;
&lt;li&gt;Distance metrics: Euclidean, Cosine, Inner Product (Dot Product)&lt;/li&gt;
&lt;li&gt;Hybrid search with Neural Search pipeline (keyword + vector, merged automatically)&lt;/li&gt;
&lt;li&gt;GPU-accelerated vector indexing (launched Dec 2025) for faster large-scale ingestion&lt;/li&gt;
&lt;li&gt;Bedrock Knowledge Bases: &lt;strong&gt;Yes&lt;/strong&gt; (most commonly used vector store for Bedrock KB)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The catch — cost:&lt;/strong&gt;&lt;br&gt;
OpenSearch Serverless bills by OCU-hours (OpenSearch Compute Units). The minimum is &lt;strong&gt;2 OCUs for production&lt;/strong&gt; (1 indexing + 1 search, each with HA redundancy) — roughly &lt;strong&gt;$350/month&lt;/strong&gt; before you store a single vector. A dev/test mode drops this to ~$174/month with 0.5 OCUs each. Vector collections also require their own dedicated OCUs — they can't share with search/time-series collections.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For small projects or prototypes, this minimum cost is the biggest friction point. But at scale, the automatic scaling and mature feature set make it the go-to for production RAG.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Scaling:&lt;/strong&gt; Automatic. OCUs scale up based on workload and scale back down. You set a maximum OCU limit to cap costs.&lt;/p&gt;
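&lt;p&gt;As a rough sketch, the hybrid request described above looks like this. The index, field names, and &lt;code&gt;model_id&lt;/code&gt; are placeholders, not values from a real deployment:&lt;/p&gt;

```python
# Sketch of an OpenSearch hybrid query body (keyword + vector).
# A "hybrid" query holds sub-queries whose scores a search pipeline
# (normalization-processor) merges into one ranked list.

def hybrid_query(text, model_id, k=10):
    """Build a hybrid query combining BM25 keyword match and neural (semantic) search."""
    return {
        "size": k,
        "query": {
            "hybrid": {
                "queries": [
                    # Lexical leg: classic BM25 over the raw text field
                    {"match": {"body": {"query": text}}},
                    # Semantic leg: OpenSearch embeds the query text with
                    # model_id and runs k-NN against the stored vectors
                    {"neural": {"body_embedding": {
                        "query_text": text,
                        "model_id": model_id,
                        "k": k,
                    }}},
                ]
            }
        },
    }

q = hybrid_query("lambda timeout issue nodejs", model_id="my-embedding-model")
# Sent via opensearch-py against a search pipeline that merges the scores:
# client.search(index="docs", body=q, params={"search_pipeline": "hybrid-pipeline"})
```

&lt;p&gt;The search pipeline, configured once with a normalization processor, is what performs the automatic score merging; the application only builds the query.&lt;/p&gt;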

&lt;h3&gt;
  
  
  Aurora PostgreSQL (pgvector)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A relational database with the open-source pgvector extension for vector search.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why choose it:&lt;/strong&gt; If your application already runs on PostgreSQL, pgvector lets you add vector search &lt;strong&gt;without introducing a new service&lt;/strong&gt;. Your vectors live alongside your relational data — same transactions, same SQL, same backups. This is powerful when your queries combine traditional filters (WHERE category = 'shoes' AND price &amp;lt; 100) with vector similarity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key details:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pgvector 0.8.0 (April 2025) brought major improvements: up to &lt;strong&gt;9x faster filtered queries&lt;/strong&gt; with iterative index scans, and significantly better recall on filtered searches&lt;/li&gt;
&lt;li&gt;Supports HNSW and IVFFlat indexing&lt;/li&gt;
&lt;li&gt;Distance metrics: Euclidean, Cosine, Inner Product&lt;/li&gt;
&lt;li&gt;Hybrid search: Manual — combine &lt;code&gt;tsvector&lt;/code&gt; (keyword) and pgvector (semantic) in a single SQL query&lt;/li&gt;
&lt;li&gt;Bedrock Knowledge Bases: &lt;strong&gt;Yes&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The catch — you own the tuning:&lt;/strong&gt;&lt;br&gt;
pgvector gives you control, but that means you're responsible for index parameter tuning (&lt;code&gt;ef_construction&lt;/code&gt;, &lt;code&gt;m&lt;/code&gt;, &lt;code&gt;ef_search&lt;/code&gt;), choosing between &lt;code&gt;relaxed_order&lt;/code&gt; and &lt;code&gt;strict_order&lt;/code&gt; for iterative scans, and managing the trade-off between recall and latency. It's not "plug and play" like OpenSearch Neural Search.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scaling:&lt;/strong&gt; Aurora Serverless v2 scales compute in fine-grained ACU increments. Read replicas handle query scale-out. I/O-Optimized configuration helps with cost predictability for vector workloads.&lt;/p&gt;
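&lt;p&gt;The tuning knobs named above, kept as SQL strings so the trade-offs are visible in one place. Table and column names (&lt;code&gt;items&lt;/code&gt;, &lt;code&gt;embedding&lt;/code&gt;) are placeholders; the setting names follow pgvector 0.8.x:&lt;/p&gt;

```python
# pgvector index DDL and query-time knobs, as SQL strings.

CREATE_INDEX = """
CREATE INDEX ON items
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);  -- build-time graph width / build quality
"""

# Query-time: higher ef_search raises recall at the cost of latency.
TUNE_RECALL = "SET hnsw.ef_search = 100;"

# pgvector 0.8.x iterative scans for filtered queries:
# relaxed_order is faster; strict_order preserves exact distance ordering.
ITERATIVE = "SET hnsw.iterative_scan = relaxed_order;"

# Relational filter + vector similarity in one statement (psycopg named param).
QUERY = """
SELECT id, embedding <=> %(q)s AS distance
FROM items
WHERE category = 'shoes' AND price < 100
ORDER BY embedding <=> %(q)s
LIMIT 10;
"""
```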

&lt;h3&gt;
  
  
  Amazon S3 Vectors
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; The first cloud object store with native vector support. Purpose-built for storing and querying vectors at massive scale and minimal cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why choose it:&lt;/strong&gt; When cost is the primary concern and you don't need millisecond latencies. S3 Vectors can reduce the total cost of storing and querying vectors by &lt;strong&gt;up to 90%&lt;/strong&gt; compared to traditional vector databases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key details:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Up to &lt;strong&gt;2 billion vectors per index&lt;/strong&gt;, up to 10,000 indexes per vector bucket&lt;/li&gt;
&lt;li&gt;Distance metrics: &lt;strong&gt;Cosine and Euclidean only&lt;/strong&gt; (Inner Product not supported)&lt;/li&gt;
&lt;li&gt;Metadata filtering applied during the vector search itself (not purely pre- or post-filter)&lt;/li&gt;
&lt;li&gt;Fully serverless — no infrastructure to provision or manage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Does not support hybrid search&lt;/strong&gt; (semantic search only)&lt;/li&gt;
&lt;li&gt;Bedrock Knowledge Bases: &lt;strong&gt;Yes&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The catch — latency and throughput:&lt;/strong&gt;&lt;br&gt;
S3 Vectors is designed for infrequent-to-moderate query patterns. Infrequent queries return in under 1 second; more frequent queries get down to ~100ms. Write throughput caps at ~2,500 vectors/second per index, and query throughput is in the hundreds of requests/second per index. This is not the right choice for real-time, high-QPS applications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost example:&lt;/strong&gt;&lt;br&gt;
For 250K vectors across 40 indexes with 1M queries/month: approximately &lt;strong&gt;$11/month&lt;/strong&gt;. Compare that to OpenSearch Serverless's $350/month minimum.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scaling:&lt;/strong&gt; Fully elastic. No capacity planning required. Costs scale linearly with storage and queries.&lt;/p&gt;
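&lt;p&gt;A sketch of what a query request looks like, built as plain kwargs so it can be inspected without AWS access. The parameter names follow my reading of the QueryVectors API; verify against the current &lt;code&gt;s3vectors&lt;/code&gt; boto3 client before relying on them:&lt;/p&gt;

```python
# Build an S3 Vectors query request as a plain dict. Bucket, index, and
# filter values are placeholders.

def build_query(bucket, index, embedding, k=5, metadata_filter=None):
    req = {
        "vectorBucketName": bucket,
        "indexName": index,
        "queryVector": {"float32": embedding},  # S3 Vectors stores float32 vectors
        "topK": k,
        "returnDistance": True,
        "returnMetadata": True,
    }
    if metadata_filter is not None:
        # The filter is applied during the vector search itself,
        # not as a separate pre- or post-pass.
        req["filter"] = metadata_filter
    return req

req = build_query("docs-bucket", "docs-index", [0.1, 0.2, 0.3],
                  metadata_filter={"category": "shoes"})
# client = boto3.client("s3vectors"); client.query_vectors(**req)
```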

&lt;h3&gt;
  
  
  Amazon MemoryDB
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A Redis-compatible, durable, in-memory database with vector search.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why choose it:&lt;/strong&gt; When you need &lt;strong&gt;single-digit millisecond&lt;/strong&gt; vector search latency with strong durability. MemoryDB keeps both the vectors and the HNSW index in memory, which is why it's the fastest vector search option on AWS — supporting tens of thousands of queries/second at &amp;gt;99% recall.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key details:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Supports FLAT (exact KNN) and HNSW indexing&lt;/li&gt;
&lt;li&gt;Distance metrics: Euclidean, Cosine, Inner Product&lt;/li&gt;
&lt;li&gt;Single-digit millisecond query and update latency&lt;/li&gt;
&lt;li&gt;Multi-AZ durability (unlike typical in-memory caches)&lt;/li&gt;
&lt;li&gt;Uses &lt;code&gt;FT.SEARCH&lt;/code&gt; and &lt;code&gt;FT.AGGREGATE&lt;/code&gt; commands for vector queries&lt;/li&gt;
&lt;li&gt;Bedrock Knowledge Bases: &lt;strong&gt;No&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The catch — single shard and RAM cost:&lt;/strong&gt;&lt;br&gt;
Vector search is &lt;strong&gt;limited to a single shard&lt;/strong&gt; — no horizontal scaling for vectors. You can scale vertically (bigger instances) and add read replicas, but your total vector dataset must fit in the memory of one node. For a 10M vector dataset with 1024 dimensions, you might need a &lt;code&gt;db.r7g.4xlarge&lt;/code&gt; (~105 GB usable memory). RAM is expensive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Real-time RAG where freshness matters (index updates propagate in milliseconds), fraud detection, and real-time recommendation engines where every millisecond counts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scaling:&lt;/strong&gt; Vertical (bigger nodes) + read replicas for query throughput. No horizontal shard scaling for vector workloads.&lt;/p&gt;
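&lt;p&gt;A sketch of the &lt;code&gt;FT.SEARCH&lt;/code&gt; KNN query mentioned above, assembled as a command-argument list. The index name &lt;code&gt;idx&lt;/code&gt; and field &lt;code&gt;vec&lt;/code&gt; are placeholders; the query-string shape follows the Redis-compatible search syntax MemoryDB exposes:&lt;/p&gt;

```python
import struct

# Build the argument list for a vector KNN query against MemoryDB.
# Vectors are passed as packed float32 bytes via a named parameter.

def knn_search_args(index, field, vector, k=5):
    blob = struct.pack(f"{len(vector)}f", *vector)  # float32 little-endian bytes
    query = f"*=>[KNN {k} @{field} $vec_param]"     # KNN over all docs ("*")
    return ["FT.SEARCH", index, query,
            "PARAMS", "2", "vec_param", blob,
            "DIALECT", "2"]

args = knn_search_args("idx", "vec", [0.1, 0.2, 0.3], k=3)
# r = redis.Redis(host=..., port=6379); r.execute_command(*args)
```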

&lt;h3&gt;
  
  
  Amazon ElastiCache (Valkey)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A managed Valkey (open-source Redis fork) service with vector search, optimized for caching and ephemeral workloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why choose it:&lt;/strong&gt; Valkey is purpose-built for &lt;strong&gt;semantic caching&lt;/strong&gt; and &lt;strong&gt;agent memory&lt;/strong&gt;. If you're building agentic AI systems and need to cache LLM responses, store conversational memory, or implement fast vector lookups in the hot path of every request — this is the service.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key details:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Supports HNSW and FLAT indexing&lt;/li&gt;
&lt;li&gt;Distance metrics: Euclidean, Cosine, Inner Product&lt;/li&gt;
&lt;li&gt;Microsecond-level latency for cached data&lt;/li&gt;
&lt;li&gt;Integrates with &lt;strong&gt;LangGraph and mem0&lt;/strong&gt; for agent memory layers&lt;/li&gt;
&lt;li&gt;Compatible with Amazon Bedrock AgentCore Runtime&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Horizontal scaling supported&lt;/strong&gt; — adding shards gives linear improvement in ingestion and recall&lt;/li&gt;
&lt;li&gt;Bedrock Knowledge Bases: &lt;strong&gt;No&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How it differs from MemoryDB:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ElastiCache Valkey supports &lt;strong&gt;multi-shard horizontal scaling&lt;/strong&gt; for vectors (MemoryDB is single-shard only)&lt;/li&gt;
&lt;li&gt;MemoryDB provides &lt;strong&gt;Multi-AZ durability&lt;/strong&gt; (writes acknowledged only after replication); Valkey is designed more as a cache layer — it's durable but not to the same degree&lt;/li&gt;
&lt;li&gt;Valkey includes mature cache primitives (TTLs, eviction policies, atomic operations) that make it natural for caching use cases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Semantic caching to reduce LLM costs, short-term and long-term agent memory via mem0/LangGraph, and any use case where vectors are in the hot path of a latency-sensitive request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scaling:&lt;/strong&gt; Vertical, horizontal (multi-shard), and replica-based. Most flexible scaling model among the in-memory options.&lt;/p&gt;
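&lt;p&gt;The semantic-caching idea is simple enough to sketch without any Valkey specifics: before calling the LLM, look for a cached answer to a &lt;em&gt;similar&lt;/em&gt; (not identical) query. The store here is a plain list; in production, the lookup would be a Valkey vector search:&lt;/p&gt;

```python
import math

# Minimal semantic cache: return a cached response when a new query's
# embedding is close enough (cosine similarity) to a previously seen one.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class SemanticCache:
    def __init__(self, threshold=0.9):
        self.entries = []          # list of (embedding, response)
        self.threshold = threshold

    def get(self, query_emb):
        best = max(self.entries, key=lambda e: cosine(e[0], query_emb), default=None)
        if best and cosine(best[0], query_emb) >= self.threshold:
            return best[1]         # cache hit: skip the LLM call entirely
        return None

    def put(self, query_emb, response):
        self.entries.append((query_emb, response))

cache = SemanticCache(threshold=0.95)
cache.put([1.0, 0.0], "Lambda timeouts are configured per function.")
hit = cache.get([0.99, 0.05])   # nearly the same direction: cache hit
miss = cache.get([0.0, 1.0])    # orthogonal: miss, so call the LLM
```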

&lt;h3&gt;
  
  
  Amazon Neptune Analytics
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A graph analytics engine that also supports vector search, designed to combine graph traversals with vector similarity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why choose it:&lt;/strong&gt; When your data has &lt;strong&gt;explicit relationships&lt;/strong&gt; and you want to combine graph-based reasoning with semantic search. Neptune Analytics powers &lt;strong&gt;GraphRAG&lt;/strong&gt; in Bedrock Knowledge Bases — it automatically extracts entities, facts, and relationships from your documents and stores them as a graph, then combines vector search with graph traversal for more comprehensive, cross-document answers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key details:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stores embeddings directly on graph nodes&lt;/li&gt;
&lt;li&gt;Combines vector similarity search with graph algorithms (PageRank, shortest path, etc.)&lt;/li&gt;
&lt;li&gt;Supports openCypher query language&lt;/li&gt;
&lt;li&gt;GraphRAG integration with Bedrock Knowledge Bases — auto-builds knowledge graphs from your documents&lt;/li&gt;
&lt;li&gt;Bedrock Knowledge Bases: &lt;strong&gt;Yes&lt;/strong&gt; (GraphRAG)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The catch:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pricing is based on memory-optimized compute units (m-NCU), billed per hour&lt;/li&gt;
&lt;li&gt;Autoscaling is &lt;strong&gt;not supported&lt;/strong&gt; — you choose your graph capacity upfront&lt;/li&gt;
&lt;li&gt;You can pause graphs when not in use (pay 10% of compute cost while paused)&lt;/li&gt;
&lt;li&gt;Best suited for analytical / batch workloads rather than high-QPS online serving&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Knowledge graphs for compliance/regulatory data, entity-relationship analysis combined with semantic search, and use cases where understanding connections between documents matters more than raw search speed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scaling:&lt;/strong&gt; Provisioned (memory-optimized). Choose capacity at creation. No autoscaling.&lt;/p&gt;
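&lt;p&gt;To make the GraphRAG pattern concrete, here is an openCypher sketch of the traversal step. The schema (Chunk/Entity labels, MENTIONS/RELATED_TO edges) is hypothetical, and Neptune Analytics' vector-similarity calls are service-specific, so the vector step is shown only as a comment:&lt;/p&gt;

```python
# Step 1 (service-specific, not shown): a vector search returns the ids of
# the top-k chunks most similar to the query embedding.
# Step 2: expand from those chunks through the knowledge graph to pull in
# related entities and the other chunks that mention them.

RETRIEVE = """
MATCH (c:Chunk)-[:MENTIONS]->(e:Entity)-[:RELATED_TO]->(e2:Entity)<-[:MENTIONS]-(c2:Chunk)
WHERE id(c) IN $topk_chunk_ids
RETURN DISTINCT c2.text, e.name, e2.name
LIMIT 25
"""
```

&lt;p&gt;This is the step a pure vector store cannot do: pulling in chunks that are &lt;em&gt;related&lt;/em&gt; to the retrieved ones through entities, even when they are not semantically similar to the query.&lt;/p&gt;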




&lt;h3&gt;
  
  
  Amazon DocumentDB
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A MongoDB-compatible document database with vector search support.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why choose it:&lt;/strong&gt; If you're already on DocumentDB (or have a MongoDB-based application) and want to add vector search without a new service. Similar logic to Aurora pgvector — keep vectors alongside your document data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key details:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Available on DocumentDB 5.0+ instance-based clusters&lt;/li&gt;
&lt;li&gt;Supports HNSW and IVFFlat indexing&lt;/li&gt;
&lt;li&gt;Distance metrics: Euclidean, Cosine, Dot Product&lt;/li&gt;
&lt;li&gt;Up to 2,000 dimensions with an index (16,000 without index)&lt;/li&gt;
&lt;li&gt;Bedrock Knowledge Bases: &lt;strong&gt;No&lt;/strong&gt; (not a supported Bedrock KB vector store)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The catch:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instance-based scaling (no serverless option for vector workloads)&lt;/li&gt;
&lt;li&gt;I/O costs can add up significantly&lt;/li&gt;
&lt;li&gt;Smaller vector ecosystem and less community tooling compared to pgvector&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Existing DocumentDB/MongoDB workloads that need to add semantic search alongside existing JSON document queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scaling:&lt;/strong&gt; Instance-based. Vertical scaling + read replicas.&lt;/p&gt;
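&lt;p&gt;A sketch of a DocumentDB vector search as an aggregation pipeline, built as a plain dict. Field names are placeholders, and the operator shape follows my reading of DocumentDB's vector search syntax; verify against your 5.0+ cluster:&lt;/p&gt;

```python
# Vector search inside a MongoDB-style aggregation pipeline.

def vector_search_pipeline(query_vector, k=5):
    return [
        {"$search": {
            "vectorSearch": {
                "vector": query_vector,   # the query embedding
                "path": "embedding",      # field holding stored vectors
                "similarity": "cosine",
                "k": k,
                "probes": 1,              # IVFFlat knob; HNSW uses efSearch instead
            }
        }},
        {"$project": {"title": 1}},       # return only what the app needs
    ]

pipeline = vector_search_pipeline([0.1, 0.2, 0.3])
# results = collection.aggregate(pipeline)  # via pymongo against DocumentDB
```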

&lt;h3&gt;
  
  
  Amazon Kendra (GenAI Index)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; A fully managed enterprise search service. Not a vector database — it's an end-to-end search solution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why choose it:&lt;/strong&gt; When you need &lt;strong&gt;enterprise search with built-in connectors&lt;/strong&gt; to 43+ data sources (SharePoint, Salesforce, Google Drive, Confluence, etc.) and don't want to build a RAG pipeline from scratch. The GenAI Index uses hybrid search (vector + keyword) with pre-optimized parameters — no tuning required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key details:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;43+ built-in data source connectors with metadata and permission filtering&lt;/li&gt;
&lt;li&gt;Hybrid index combining vector and keyword search, pre-optimized&lt;/li&gt;
&lt;li&gt;Integrates with both &lt;strong&gt;Bedrock Knowledge Bases&lt;/strong&gt; and &lt;strong&gt;Amazon Q Business&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;A single Kendra GenAI Index can serve multiple Q Business apps and Bedrock KBs&lt;/li&gt;
&lt;li&gt;Bedrock Knowledge Bases: &lt;strong&gt;Yes&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The catch — pricing:&lt;/strong&gt;&lt;br&gt;
Kendra is expensive for what it offers. The GenAI Enterprise Edition base index starts at &lt;strong&gt;$0.32/hour&lt;/strong&gt; (~$230/month) for up to 20,000 documents, while the classic Enterprise Edition runs &lt;strong&gt;$1.40/hour&lt;/strong&gt; (~$1,008/month). This is enterprise pricing; it makes sense when you're connecting to many data sources and need managed connectors, permissions, and relevance tuning out of the box.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Enterprise search across dozens of data sources where you need managed connectors, user-level access control, and a fully managed experience. Not for custom RAG pipelines where you want control over chunking, embeddings, and retrieval logic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scaling:&lt;/strong&gt; Fully managed. Add storage units and query units as needed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Decision Matrix
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Recommended&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;th&gt;Alternative&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;General-purpose RAG&lt;/td&gt;
&lt;td&gt;OpenSearch Serverless&lt;/td&gt;
&lt;td&gt;Native hybrid search, most mature Bedrock KB integration&lt;/td&gt;
&lt;td&gt;Aurora pgvector&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Already on PostgreSQL&lt;/td&gt;
&lt;td&gt;Aurora pgvector&lt;/td&gt;
&lt;td&gt;Add vectors without a new service, SQL + vector in one query&lt;/td&gt;
&lt;td&gt;OpenSearch Serverless&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost-sensitive / massive scale&lt;/td&gt;
&lt;td&gt;S3 Vectors&lt;/td&gt;
&lt;td&gt;90% cheaper, 2B vectors/index, fully serverless&lt;/td&gt;
&lt;td&gt;OpenSearch Serverless&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ultra-low latency (real-time)&lt;/td&gt;
&lt;td&gt;MemoryDB&lt;/td&gt;
&lt;td&gt;Single-digit ms queries, &amp;gt;99% recall, durable&lt;/td&gt;
&lt;td&gt;ElastiCache Valkey&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Semantic caching / reduce LLM cost&lt;/td&gt;
&lt;td&gt;ElastiCache Valkey&lt;/td&gt;
&lt;td&gt;Cache primitives + vector search, microsecond latency&lt;/td&gt;
&lt;td&gt;MemoryDB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent memory (short-term + long-term)&lt;/td&gt;
&lt;td&gt;ElastiCache Valkey&lt;/td&gt;
&lt;td&gt;LangGraph/mem0 integration, horizontal scaling&lt;/td&gt;
&lt;td&gt;MemoryDB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Knowledge graph + vectors&lt;/td&gt;
&lt;td&gt;Neptune Analytics&lt;/td&gt;
&lt;td&gt;GraphRAG, entity-relationship reasoning&lt;/td&gt;
&lt;td&gt;OpenSearch Serverless&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise search (managed)&lt;/td&gt;
&lt;td&gt;Kendra GenAI Index&lt;/td&gt;
&lt;td&gt;43+ connectors, permissions, zero tuning&lt;/td&gt;
&lt;td&gt;Bedrock KB + S3 Vectors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Already on MongoDB/DocumentDB&lt;/td&gt;
&lt;td&gt;DocumentDB&lt;/td&gt;
&lt;td&gt;Add vectors alongside existing JSON data&lt;/td&gt;
&lt;td&gt;Aurora pgvector&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Cost Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Pricing Model&lt;/th&gt;
&lt;th&gt;Minimum Monthly Cost&lt;/th&gt;
&lt;th&gt;Best Cost Scenario&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;S3 Vectors&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Storage + PUT + queries&lt;/td&gt;
&lt;td&gt;~$11 (250K vectors, 1M queries)&lt;/td&gt;
&lt;td&gt;Cheapest at any scale&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Aurora pgvector&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Instance hours + storage + I/O&lt;/td&gt;
&lt;td&gt;~$50+ (Serverless v2 min)&lt;/td&gt;
&lt;td&gt;Cheap if DB already exists&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenSearch Serverless&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;OCU-hours + S3 storage&lt;/td&gt;
&lt;td&gt;~$174 (dev/test), ~$350 (prod)&lt;/td&gt;
&lt;td&gt;Good at scale, painful for prototypes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DocumentDB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Instance hours + I/O&lt;/td&gt;
&lt;td&gt;~$100+&lt;/td&gt;
&lt;td&gt;Reasonable if already on DocumentDB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MemoryDB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Node hours (in-memory)&lt;/td&gt;
&lt;td&gt;~$200+ (r7g.large)&lt;/td&gt;
&lt;td&gt;Expensive — RAM is the bottleneck&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ElastiCache Valkey&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Node hours (in-memory)&lt;/td&gt;
&lt;td&gt;~$100+ (r7g.large)&lt;/td&gt;
&lt;td&gt;Similar to MemoryDB, scales better&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Neptune Analytics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;m-NCU hours&lt;/td&gt;
&lt;td&gt;Varies by graph size&lt;/td&gt;
&lt;td&gt;Pausable (10% cost when idle)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Kendra GenAI Index&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hourly base + units&lt;/td&gt;
&lt;td&gt;~$230 (GenAI), ~$1,008 (Enterprise)&lt;/td&gt;
&lt;td&gt;Enterprise pricing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  When NOT to Use a Vector Database
&lt;/h2&gt;

&lt;p&gt;Before building anything, ask yourself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Small dataset (&amp;lt;10K items)?&lt;/strong&gt; → Use in-memory search (FAISS, numpy) with no infrastructure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exact match queries only?&lt;/strong&gt; → Use a traditional database or search index&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured filtering only?&lt;/strong&gt; → A regular database with indexes will be faster and cheaper&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Static FAQ or lookup table?&lt;/strong&gt; → Don't overcomplicate it. A simple key-value store works&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time transactional workload?&lt;/strong&gt; → Use a relational or NoSQL database; vector search is a read-optimized pattern&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Near-zero budget?&lt;/strong&gt; → Use FAISS locally or with S3 for persistence&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;That's a wrap on the series. If you're building with RAG or semantic search on AWS, you now have both the conceptual foundation and practical guidance to choose the right architecture.&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Missed the earlier parts?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Part 1 → &lt;a href="https://dev.to/thedeveloperjournal/aws-vector-databases-part-1-embeddings-dimensions-similarity-9ph"&gt;Embeddings, Dimensions, and Similarity Search&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Part 2 → &lt;a href="https://dev.to/thedeveloperjournal/aws-vector-databases-part-2-search-filtering-and-chunking-3lbe"&gt;Search Patterns, Filtering, and Chunking&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>vectordatabase</category>
      <category>rag</category>
      <category>genai</category>
    </item>
    <item>
      <title>AWS Vector Databases – Part 2: Search, Filtering, and Chunking</title>
      <dc:creator>Sabarish Sathasivan</dc:creator>
      <pubDate>Mon, 30 Mar 2026 18:29:03 +0000</pubDate>
      <link>https://forem.com/aws-builders/aws-vector-databases-part-2-search-filtering-and-chunking-3lbe</link>
      <guid>https://forem.com/aws-builders/aws-vector-databases-part-2-search-filtering-and-chunking-3lbe</guid>
      <description>&lt;p&gt;This is Part 2 of the AWS vector database series.&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Missed Part 1?&lt;/strong&gt; Start here: &lt;em&gt;&lt;a href="https://dev.to/thedeveloperjournal/aws-vector-databases-part-1-embeddings-dimensions-similarity-9ph"&gt;Embeddings, Dimensions, and Similarity Search&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In Part 1, we covered the fundamentals of embeddings and how similarity is measured. Now we move into how retrieval actually works in practice.&lt;/p&gt;

&lt;p&gt;In this part, we’ll look at search patterns (KNN vs ANN), hybrid search, metadata filtering, and chunking strategies — the building blocks of effective RAG systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vector Search Types
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;How it works&lt;/th&gt;
&lt;th&gt;When to use&lt;/th&gt;
&lt;th&gt;AWS services&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;KNN — Exact Nearest Neighbor Search&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Check every single item, compare it to your query, return the best matches. Perfectly accurate, but slow.&lt;/td&gt;
&lt;td&gt;Small datasets (under 100K vectors) or situations where you absolutely cannot afford to miss a result — like medical diagnostics or legal compliance checks.&lt;/td&gt;
&lt;td&gt;All vector services support KNN as a fallback, but it's not practical at scale.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ANN — Approximate Nearest Neighbor Search&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Uses a smart index structure (graph or cluster) to find &lt;em&gt;very likely&lt;/em&gt; nearest neighbors without checking everything&lt;/td&gt;
&lt;td&gt;Almost everything in production. If you're building a RAG chatbot, semantic search, or recommendation engine, this is your default.&lt;/td&gt;
&lt;td&gt;OpenSearch Serverless, Aurora pgvector, MemoryDB, ElastiCache Valkey, DocumentDB, S3 Vectors, Neptune Analytics.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
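&lt;p&gt;Exact KNN from the table above fits in a few lines: score every stored vector against the query and keep the best &lt;em&gt;k&lt;/em&gt;. This linear scan is exactly why ANN indexes exist at scale:&lt;/p&gt;

```python
import math

# Brute-force exact nearest neighbors: O(n * d) work per query.

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def knn(query, vectors, k=2):
    """Score every vector, sort by similarity, return the top-k indices."""
    scored = [(cosine_sim(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]

data = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
top = knn([1.0, 0.0], data, k=2)   # the two vectors closest in direction
```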

&lt;h3&gt;
  
  
  ANN Index Structures
&lt;/h3&gt;

&lt;p&gt;To avoid checking every vector, ANN uses smart indexing. The two most common types on AWS are:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Index&lt;/th&gt;
&lt;th&gt;Simple idea&lt;/th&gt;
&lt;th&gt;Trade-off&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;HNSW&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Connects similar vectors like a network and “walks” through it to find matches&lt;/td&gt;
&lt;td&gt;Uses more memory and takes longer to build, but gives faster and more accurate results. &lt;strong&gt;Default in most AWS services.&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;IVFFlat&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Groups vectors into clusters and only searches the closest groups&lt;/td&gt;
&lt;td&gt;Faster to build and uses less memory, but needs tuning and may miss some results&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Intuitive way to think about it
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;HNSW — like navigating a city with highways&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Start with &lt;strong&gt;highways&lt;/strong&gt; to get close
&lt;/li&gt;
&lt;li&gt;Then use &lt;strong&gt;local roads&lt;/strong&gt; to find the exact place
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;HNSW does the same:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Moves from broad → detailed search
&lt;/li&gt;
&lt;li&gt;Finds results quickly and accurately
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;IVFFlat — like searching in neighborhoods&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First pick a few &lt;strong&gt;likely neighborhoods&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Then search inside them
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;IVFFlat works similarly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduces search space
&lt;/li&gt;
&lt;li&gt;But can miss results if the right cluster isn’t picked
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Which one should you use?
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Go with &lt;strong&gt;HNSW&lt;/strong&gt; → best performance and accuracy (default choice)
&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;IVFFlat&lt;/strong&gt; → faster to build, lower memory, but slightly less accurate
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Hybrid Search
&lt;/h2&gt;

&lt;p&gt;Hybrid search runs two searches at the same time—one that understands meaning (vector search) and one that looks for exact words (keyword search)—and then combines the results.&lt;/p&gt;

&lt;p&gt;For example, a user might search: “lambda timeout issue nodejs.” &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;Vector Search&lt;/strong&gt; understands the intent (performance/debugging)&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;Keyword Search&lt;/strong&gt; ensures exact terms like &lt;em&gt;lambda&lt;/em&gt; and &lt;em&gt;nodejs&lt;/em&gt; are matched.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The scoring method used to combine these two result sets is called &lt;strong&gt;Reciprocal Rank Fusion (RRF)&lt;/strong&gt;. It doesn’t simply add scores; it prioritizes documents that rank highly in both searches. For example, if a document ranks #1 in keyword search and #2 in vector search, RRF will push it to the top of the final results.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is especially useful for enterprise RAG. Users rarely search with purely natural language or purely exact keywords—they usually mix both.&lt;/p&gt;
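&lt;p&gt;RRF itself is only a few lines. Each result list contributes &lt;code&gt;1 / (k + rank)&lt;/code&gt; per document, so a document ranked highly in both lists accumulates the largest score (&lt;code&gt;k = 60&lt;/code&gt; is the commonly used smoothing constant):&lt;/p&gt;

```python
# Reciprocal Rank Fusion over any number of ranked lists of document ids.

def rrf(rankings, k=60):
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

keyword = ["doc_a", "doc_b", "doc_c"]   # lexical ranking
vector  = ["doc_d", "doc_a", "doc_e"]   # semantic ranking
fused = rrf([keyword, vector])
# doc_a (#1 keyword, #2 vector) beats documents ranked high in only one list
```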

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Implementation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenSearch Serverless&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Native (RRF supported). The most robust option; its "Neural Search" feature handles the hybrid merging automatically.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Aurora pgvector&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;SQL-based and best for relational data; you manually combine &lt;code&gt;tsvector&lt;/code&gt; (keywords) and &lt;code&gt;vector&lt;/code&gt; (meaning) in one query.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Metadata Filtering
&lt;/h2&gt;

&lt;p&gt;Metadata filtering narrows down results using structured data like date, category, or user ID—before or after the vector search runs.&lt;/p&gt;

&lt;p&gt;Think of it like this: a vector search finds books similar to &lt;em&gt;Harry Potter&lt;/em&gt;. But you only want books published after 2010 and available in English. Metadata filtering ensures you don’t waste time on the wrong results.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pre-filtering vs Post-filtering
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;How it works&lt;/th&gt;
&lt;th&gt;Trade-offs&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pre-filtering&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Applies filters first, then runs vector search on the remaining data&lt;/td&gt;
&lt;td&gt;Accurate and secure, but can be slower depending on the engine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Post-filtering&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Runs vector search first, then filters the results&lt;/td&gt;
&lt;td&gt;Fast, but may return zero results if none match the filters&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt;  &lt;strong&gt;S3 Vectors&lt;/strong&gt; applies metadata filters during the vector search itself, combining the accuracy of pre-filtering with the performance of post-filtering.&lt;/p&gt;
&lt;/blockquote&gt;
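&lt;p&gt;The trade-off between the two approaches can be sketched in a few lines of Python (toy two-dimensional vectors and made-up metadata, purely for illustration):&lt;/p&gt;

```python
# Toy illustration of pre- vs post-filtering.
docs = [
    {"id": 1, "lang": "en", "vec": (1.0, 0.0)},
    {"id": 2, "lang": "fr", "vec": (0.9, 0.1)},
    {"id": 3, "lang": "fr", "vec": (0.8, 0.2)},
    {"id": 4, "lang": "en", "vec": (0.0, 1.0)},
]
query = (1.0, 0.0)

def dist(a, b):  # squared Euclidean distance, enough for ranking
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pre_filter(lang, k=2):
    # Filter first, then search: always returns matches if any exist.
    pool = [d for d in docs if d["lang"] == lang]
    return sorted(pool, key=lambda d: dist(d["vec"], query))[:k]

def post_filter(lang, k=2):
    # Search first, then filter: the top-k may contain few or no matches.
    top_k = sorted(docs, key=lambda d: dist(d["vec"], query))[:k]
    return [d for d in top_k if d["lang"] == lang]

print([d["id"] for d in pre_filter("en")])   # [1, 4] - both English docs survive
print([d["id"] for d in post_filter("en")])  # [1] - doc 2 crowded doc 4 out of the top-k
```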

&lt;h2&gt;
  
  
  Chunking
&lt;/h2&gt;

&lt;p&gt;Chunking is simply breaking a long document into smaller, meaningful pieces before creating embeddings. If your chunks are &lt;strong&gt;too small&lt;/strong&gt;, you lose context. If they’re &lt;strong&gt;too big&lt;/strong&gt;, the important meaning gets buried in noise. The goal is to find the right balance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Common Chunking Strategies
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;How it works&lt;/th&gt;
&lt;th&gt;Chunk size&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fixed-size&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Split every &lt;em&gt;N&lt;/em&gt; tokens/characters with optional overlap&lt;/td&gt;
&lt;td&gt;256–512 tokens&lt;/td&gt;
&lt;td&gt;Simple content like logs or short descriptions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Recursive&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Split by paragraphs → sentences → words while preserving structure&lt;/td&gt;
&lt;td&gt;Variable&lt;/td&gt;
&lt;td&gt;General-purpose text (default in most tools)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Semantic&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Use an embedding model to split based on topic boundaries&lt;/td&gt;
&lt;td&gt;Variable&lt;/td&gt;
&lt;td&gt;Long-form content like whitepapers or legal docs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Document-structure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Split using headings, sections, or document layout&lt;/td&gt;
&lt;td&gt;Variable&lt;/td&gt;
&lt;td&gt;Structured docs like manuals, HTML, or code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sentence-window&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Store sentences, return surrounding context at query time&lt;/td&gt;
&lt;td&gt;1 sentence (store) / window (return)&lt;/td&gt;
&lt;td&gt;High-precision Q&amp;amp;A&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
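&lt;p&gt;The simplest strategy, fixed-size with overlap, can be sketched in a few lines (character counts stand in for token counts here, to keep the example dependency-free):&lt;/p&gt;

```python
# Fixed-size chunking with overlap. Sizes here are in characters for
# simplicity; production systems usually count tokens instead.
def chunk_fixed(text, size=20, overlap=5):
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

doc = "AWS vector databases store embeddings for semantic search."
for c in chunk_fixed(doc):
    print(repr(c))
```

The overlap means the tail of each chunk is repeated at the head of the next one, so a sentence cut by a chunk boundary still appears intact in at least one chunk.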

&lt;h3&gt;
  
  
  Bedrock Chunking Options
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Bedrock option&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;th&gt;Equivalent concept&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Default&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~300-token chunks that respect sentence boundaries&lt;/td&gt;
&lt;td&gt;Recursive (baseline)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fixed-size&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;You control chunk size and overlap&lt;/td&gt;
&lt;td&gt;Fixed-size&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hierarchical&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Searches small chunks but returns larger context&lt;/td&gt;
&lt;td&gt;Sentence-window&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Semantic&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Splits based on topic boundaries&lt;/td&gt;
&lt;td&gt;Semantic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;None&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No splitting — entire file treated as one chunk&lt;/td&gt;
&lt;td&gt;Document-structure (manual)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;👉 &lt;strong&gt;Continue reading:&lt;/strong&gt; In &lt;a href="https://dev.to/thedeveloperjournal/aws-vector-databases-part-3-choosing-the-right-vector-database-on-aws-375m"&gt;Part 3&lt;/a&gt;, we’ll compare AWS vector database options and build a practical decision framework to help you choose the right one.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>vectordatabase</category>
      <category>rag</category>
      <category>ai</category>
    </item>
    <item>
      <title>AWS Vector Databases Part 1: Embeddings, Dimensions &amp; Similarity</title>
      <dc:creator>Sabarish Sathasivan</dc:creator>
      <pubDate>Mon, 30 Mar 2026 18:28:44 +0000</pubDate>
      <link>https://forem.com/aws-builders/aws-vector-databases-part-1-embeddings-dimensions-similarity-9ph</link>
      <guid>https://forem.com/aws-builders/aws-vector-databases-part-1-embeddings-dimensions-similarity-9ph</guid>
      <description>&lt;p&gt;This is Part 1 of a series exploring vector databases on AWS.&lt;/p&gt;

&lt;p&gt;We recently evaluated multiple AWS vector database options to understand their trade-offs, performance characteristics, and real-world use cases. Before comparing services, it’s important to understand the core concepts that power vector search.&lt;/p&gt;

&lt;p&gt;In this part, we’ll cover embeddings, dimensions, and similarity search — the foundation of every RAG and semantic search system.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Are Embeddings?
&lt;/h2&gt;

&lt;p&gt;Let’s say you're building a customer support chatbot.&lt;/p&gt;

&lt;p&gt;A user asks: &lt;strong&gt;“How do I change my login info?”&lt;/strong&gt;&lt;br&gt;
Your FAQ has: &lt;strong&gt;“Resetting your password.”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A keyword search might miss this. But as humans, we know they mean the same thing. That’s the gap embeddings solve.&lt;/p&gt;

&lt;p&gt;An &lt;strong&gt;embedding&lt;/strong&gt; is a numerical representation of content (text, image, code) where similar meaning leads to similar numbers. So even if the words differ, the intent stays close.&lt;/p&gt;
&lt;h3&gt;
  
  
  How Embeddings Are Created
&lt;/h3&gt;

&lt;p&gt;Here's what happens under the hood when you pass a sentence to an embedding model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"How do I reset my password?"
        │
        ▼
   Tokenization         →  ["How", "do", "I", "reset", "my", "password", "?"]
        │
        ▼
   Embedding Model       →  Neural network (e.g., Titan v2)
        │
        ▼
   Vector Output         →  [0.021, -0.438, 0.712, ..., 0.155] (1,024 floats)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important part isn’t the numbers themselves—it’s that similar sentences produce vectors that are close to each other. &lt;/p&gt;

&lt;p&gt;On AWS, you can generate embeddings using models like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Titan Text Embeddings V2&lt;/li&gt;
&lt;li&gt;Titan Embeddings G1 - Text&lt;/li&gt;
&lt;li&gt;Cohere Embed English v3&lt;/li&gt;
&lt;li&gt;Cohere Embed Multilingual v3&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're getting started, &lt;strong&gt;Amazon Titan Embeddings V2&lt;/strong&gt; is a solid default—simple, cost-effective, and good enough for most use cases.&lt;/p&gt;
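&lt;p&gt;A sketch of generating an embedding with Titan V2 through the Bedrock runtime API (the model ID and body fields below reflect the Titan v2 request format at the time of writing; verify them against the current Bedrock documentation before relying on them):&lt;/p&gt;

```python
# Sketch of calling Titan Text Embeddings V2 via Bedrock (boto3).
import json

def build_request(text, dimensions=256):
    # Titan v2 accepts 1024 (default), 512, or 256 dimensions.
    return {
        "modelId": "amazon.titan-embed-text-v2:0",
        "body": json.dumps({"inputText": text,
                            "dimensions": dimensions,
                            "normalize": True}),
    }

def get_embedding(client, text, dimensions=256):
    req = build_request(text, dimensions)
    resp = client.invoke_model(modelId=req["modelId"], body=req["body"])
    return json.loads(resp["body"].read())["embedding"]

# Usage (requires AWS credentials and Bedrock model access):
#   import boto3
#   client = boto3.client("bedrock-runtime")
#   vec = get_embedding(client, "How do I reset my password?")

print(build_request("hello")["modelId"])
```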

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The model you use for &lt;strong&gt;ingestion&lt;/strong&gt; (storing data) must be the exact same model you use for &lt;strong&gt;inference&lt;/strong&gt; (querying). If you embed your database using Amazon Titan but try to query it using an OpenAI embedding, the math won't align, and your search results will be complete gibberish.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Dimensions
&lt;/h2&gt;

&lt;p&gt;So far, we’ve seen that embeddings are just lists of numbers representing meaning. The next question is: how many numbers are in that list? That’s where &lt;strong&gt;dimensions&lt;/strong&gt; come in.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;dimension&lt;/strong&gt; is simply the number of values in an embedding list (vector). Different models produce different dimensions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cohere Embed English v3, Cohere Embed Multilingual v3 → 1,024&lt;/li&gt;
&lt;li&gt;Amazon Titan Embeddings → 1,024 (default), 512, 256&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Historically, more dimensions meant better accuracy but higher storage costs and slower searches. However, the game changed with Amazon Titan Text Embeddings V2. Titan v2 supports "flexible" dimensions: you can generate a 1024-dimension vector and "truncate" it down to 512 or 256.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1024 Dimensions: Maximum "nuance" and accuracy.&lt;/li&gt;
&lt;li&gt;256 Dimensions: Up to 4x less storage cost and faster search speeds, with only a marginal hit to accuracy.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
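&lt;p&gt;The truncate-and-renormalize idea behind flexible dimensions can be illustrated in plain Python (this is the standard approach for shortening normalized embeddings; Titan's internal handling may differ):&lt;/p&gt;

```python
# Truncating a normalized embedding: keep the first N values, then
# re-normalize so cosine-similarity math stays valid.
import math

def truncate(vec, dims):
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

vec_1024 = [0.03] * 1024              # stand-in for a real embedding
vec_256 = truncate(vec_1024, 256)

print(len(vec_256))                                      # 256
print(round(sum(x * x for x in vec_256), 9))             # 1.0 (unit length)
```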

&lt;h2&gt;
  
  
  Distance Metrics: Measure Similarity
&lt;/h2&gt;

&lt;p&gt;Once you have thousands of embeddings stored, the database uses a distance metric to find the "nearest neighbors" to your query.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Logic&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cosine&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Angle between vectors&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;The Standard.&lt;/strong&gt; Best for text and RAG. Ignores document length.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Euclidean (L2)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Straight-line distance&lt;/td&gt;
&lt;td&gt;Best for &lt;strong&gt;Images&lt;/strong&gt; or fixed-size data where magnitude matters.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Inner Product&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Direction + Magnitude&lt;/td&gt;
&lt;td&gt;Best for &lt;strong&gt;Recommendations&lt;/strong&gt; where popularity or "weight" matters.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Why Metrics Matter: The "Wordiness" Problem
&lt;/h3&gt;

&lt;p&gt;Let’s compare two vectors representing the same topic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vector A:&lt;/strong&gt; &lt;code&gt;[1, 2, 3]&lt;/code&gt; (A short, concise help article)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector B:&lt;/strong&gt; &lt;code&gt;[10, 20, 30]&lt;/code&gt; (A very long, detailed whitepaper on the same topic)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even though they cover the exact same intent, different metrics interpret them wildly differently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cosine Similarity (The Compass):&lt;/strong&gt; Sees that both arrows point at the exact same target in space. It gives them a &lt;strong&gt;perfect match score&lt;/strong&gt;. This is why it’s the standard for RAG—you want your short question to match a long document.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Euclidean Distance (The Ruler):&lt;/strong&gt; Measures the physical distance between the "tips" of the arrows. Because Vector B is so much longer, the ruler sees them as &lt;strong&gt;miles apart&lt;/strong&gt; and may treat them as unrelated.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inner Product (The Spotlight):&lt;/strong&gt; It sees that both point the same way, but it gives Vector B a "higher" score because it is stronger/longer. This is perfect for recommendation engines where you want to highlight "heavy-hitting" content.&lt;/li&gt;
&lt;/ul&gt;
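&lt;p&gt;Running the three metrics over Vector A and Vector B makes the difference concrete:&lt;/p&gt;

```python
# The three metrics applied to Vector A = [1, 2, 3] and B = [10, 20, 30].
import math

a, b = [1, 2, 3], [10, 20, 30]

dot = sum(x * y for x, y in zip(a, b))                     # inner product
cos = dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
l2 = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))    # Euclidean

print(round(cos, 4))  # 1.0   -> perfect match: same direction
print(round(l2, 2))   # 33.67 -> "miles apart" by the ruler
print(dot)            # 140   -> the longer vector scores higher
```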

&lt;p&gt;For example,&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Query A:&lt;/strong&gt; "How do I reset my password?"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Doc B:&lt;/strong&gt; "A guide to password resets for new users."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They mean the same thing, but Doc B is 1,000 words long. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cosine&lt;/strong&gt; correctly identifies them as a match because it ignores the extra "fluff" and focuses on the intent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Euclidean&lt;/strong&gt; might fail because the sheer volume of words in Doc B pushes its vector too far away from the short query.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key Takeaway:&lt;/strong&gt; For 95% of AWS text-based applications (Chatbots, Q&amp;amp;A, Knowledge Bases), &lt;strong&gt;use Cosine Similarity&lt;/strong&gt;. It is the default in Aurora pgvector, OpenSearch, and S3 Vectors for a reason: it focuses on &lt;em&gt;meaning&lt;/em&gt; over &lt;em&gt;length&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;👉 &lt;strong&gt;Continue reading:&lt;/strong&gt; In &lt;a href="https://dev.to/thedeveloperjournal/aws-vector-databases-part-2-search-filtering-and-chunking-3lbe"&gt;Part 2&lt;/a&gt;, we’ll explore vector search patterns (KNN vs ANN), hybrid search, metadata filtering, and chunking — and how they impact real-world systems.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>vectordatabase</category>
      <category>rag</category>
      <category>genai</category>
    </item>
    <item>
      <title>AWS Serverless Payload Limits Expand to 1 MB: What It Means for Event-Driven Architectures</title>
      <dc:creator>Sabarish Sathasivan</dc:creator>
      <pubDate>Fri, 30 Jan 2026 16:43:23 +0000</pubDate>
      <link>https://forem.com/thedeveloperjournal/aws-serverless-payload-limits-expand-to-1-mb-what-it-means-for-event-driven-architectures-2j99</link>
      <guid>https://forem.com/thedeveloperjournal/aws-serverless-payload-limits-expand-to-1-mb-what-it-means-for-event-driven-architectures-2j99</guid>
      <description>&lt;p&gt;AWS has been steadily increasing the maximum payload size from &lt;strong&gt;256 KB to 1 MB&lt;/strong&gt; across key serverless services. This is a meaningful improvement for event-driven architectures that rely on richer event data and reduced fragmentation.&lt;/p&gt;

&lt;p&gt;Modern cloud applications no longer pass around small strings or simple messages. LLM prompts, telemetry signals, personalization context, ML outputs, and user interaction data are often nested JSON objects carrying real state and meaning. Until recently, fitting this data into serverless workflows required careful design tradeoffs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recent Announcements
&lt;/h2&gt;

&lt;p&gt;AWS first introduced the &lt;strong&gt;1 MB&lt;/strong&gt; payload limit for individual services:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Amazon SQS&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/08/amazon-sqs-max-payload-size-1mib/" rel="noopener noreferrer"&gt;Amazon SQS increases maximum message payload size to 1 MiB&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AWS Lambda (Asynchronous Invocations)&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/10/aws-lambda-payload-size-256-kb-1-mb-invocations/" rel="noopener noreferrer"&gt;AWS Lambda increases maximum payload size from 256 KB to 1 MB for asynchronous invocations&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  1 MB Limit for Event-Driven Lambda
&lt;/h2&gt;

&lt;p&gt;With the announcement &lt;a href="https://aws.amazon.com/blogs/compute/more-room-to-build-serverless-services-now-support-payloads-up-to-1-mb/" rel="noopener noreferrer"&gt;More room to build: serverless services now support payloads up to 1 MB&lt;/a&gt; on January 29, 2026, AWS confirmed that the 1 MB payload size limit also applies consistently to asynchronous Lambda invocations originating from Amazon SQS and from event buses in Amazon EventBridge. &lt;/p&gt;

&lt;p&gt;This removes the need for several common architectural workarounds, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chunking large payloads into multiple events&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;S3 Claim Check Pattern&lt;/strong&gt; - Storing payloads in Amazon S3 and passing object references in the event&lt;/li&gt;
&lt;li&gt;Compressing data to fit within size limits&lt;/li&gt;
&lt;/ul&gt;
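&lt;p&gt;For reference, the claim-check pattern looks roughly like this (an in-memory dict stands in for S3, and the key scheme is hypothetical; a real implementation would use boto3's &lt;code&gt;put_object&lt;/code&gt;/&lt;code&gt;get_object&lt;/code&gt;):&lt;/p&gt;

```python
# Claim-check sketch: store the large payload, send only a reference.
# A dict stands in for S3 here, purely for illustration.
import json, uuid

fake_s3 = {}

def publish(payload, limit=256 * 1024):
    body = json.dumps(payload)
    if len(body) > limit:
        key = f"payloads/{uuid.uuid4()}.json"      # hypothetical key scheme
        fake_s3[key] = body                        # claim check: store ...
        return {"s3_ref": key}                     # ... and pass the reference
    return {"inline": body}                        # small enough: send as-is

def consume(event):
    body = fake_s3[event["s3_ref"]] if "s3_ref" in event else event["inline"]
    return json.loads(body)

big = {"data": "x" * 500_000}
event = publish(big)
print("s3_ref" in event)        # True: too big, went through the claim check
print(consume(event) == big)    # True: consumer resolves the reference
```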

&lt;h2&gt;
  
  
  Considerations
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Lambda memory and performance&lt;/strong&gt; &lt;br&gt;
Parsing larger JSON payloads can increase both memory consumption and execution time, especially in high-throughput event-driven workloads.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Lambda timeouts still apply&lt;/strong&gt;&lt;br&gt;
The payload size may now be 1 MB, but Lambda timeouts (up to 15 minutes) and memory limits remain unchanged and should factor into your design decisions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;S3 Claim Check Pattern still matters&lt;/strong&gt;&lt;br&gt;
Storing large payloads in Amazon S3 and passing lightweight references through events remains a good choice when data exceeds size limits, is shared across consumers, or requires strong governance and traceability.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Messaging Cost (SQS &amp;amp; EventBridge)&lt;/strong&gt; &lt;br&gt;
While payload limits have increased, billing is still based on &lt;strong&gt;64 KB chunks&lt;/strong&gt;. Both Amazon SQS and Amazon EventBridge meter usage per 64 KB unit. As a result, a single &lt;strong&gt;1 MB (1,024 KB)&lt;/strong&gt; event is billed as 16 requests, not one. In high-volume systems, this can significantly increase messaging costs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Compute Cost (Lambda Async)&lt;/strong&gt; &lt;br&gt;
For async Lambda invocations, the first &lt;strong&gt;256 KB&lt;/strong&gt; is billed as one request, with an additional request charged for every &lt;strong&gt;64 KB&lt;/strong&gt; beyond that. This means a &lt;strong&gt;1 MB (1,024 KB) async event is billed as 13 requests&lt;/strong&gt; (1 base + 12 additional chunks). At scale, these hidden request multipliers can quickly erode your compute budget.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This table provides a quick reference for calculating the real cost of your payloads.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Payload Size&lt;/th&gt;
&lt;th&gt;Billable Requests (SQS / EventBridge)&lt;/th&gt;
&lt;th&gt;Billable Requests (Lambda Async)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Up to 64 KB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;256 KB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;4 (4 * 64 KB)&lt;/td&gt;
&lt;td&gt;1 (1 * 256 KB)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;512 KB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8 (8 * 64 KB)&lt;/td&gt;
&lt;td&gt;5 (1 * 256 KB + 4 * 64 KB)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;1 MB (1024 KB)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;16 (16 * 64 KB)&lt;/td&gt;
&lt;td&gt;13 (1 * 256 KB + 12 * 64 KB)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
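&lt;p&gt;The billing math behind the table can be expressed directly (sizes in KB, assuming the 64 KB metering and the 256 KB async base described above):&lt;/p&gt;

```python
# Billable-request math for payload sizes (in KB).
import math

def sqs_requests(kb):
    # SQS / EventBridge: metered per 64 KB unit.
    return math.ceil(kb / 64)

def lambda_async_requests(kb):
    # Lambda async: the first 256 KB is one request,
    # then one more request per additional 64 KB.
    return 1 + math.ceil(max(0, kb - 256) / 64)

for kb in (64, 256, 512, 1024):
    print(kb, sqs_requests(kb), lambda_async_requests(kb))
```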

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>eventdriven</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Beyond the 29-Second Limit: 4 Patterns for Serverless GenAI on AWS</title>
      <dc:creator>Sabarish Sathasivan</dc:creator>
      <pubDate>Thu, 15 Jan 2026 15:50:42 +0000</pubDate>
      <link>https://forem.com/thedeveloperjournal/beating-the-29-second-timeout-practical-serverless-patterns-for-genai-apis-on-aws-2j8a</link>
      <guid>https://forem.com/thedeveloperjournal/beating-the-29-second-timeout-practical-serverless-patterns-for-genai-apis-on-aws-2j8a</guid>
      <description>&lt;p&gt;When teams start building GenAI-powered APIs on AWS, the initial architecture often looks straightforward:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0dkiv38fkit5o2b6uwmr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0dkiv38fkit5o2b6uwmr.png" alt="Serverless GEN API"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It works well for demos and early prototypes. But as soon as prompts grow larger, models get heavier, or agent-style workflows are introduced, many teams hit the same invisible wall: &lt;strong&gt;the 29-second API Gateway integration timeout.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Over the past year, AWS has introduced several ways to address this problem. This article walks through those options, based on what actually works when you’re trying to keep GenAI APIs stable, scalable, and usable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Option 1: Increasing the API Gateway integration timeout
&lt;/h2&gt;

&lt;p&gt;In mid-2024, AWS finally allowed REST API integration timeouts to be increased beyond the long-standing 29-second limit. &lt;a href="https://aws.amazon.com/about-aws/whats-new/2024/06/amazon-api-gateway-integration-timeout-limit-29-seconds/" rel="noopener noreferrer"&gt;Amazon API Gateway integration timeout limit increase beyond 29 seconds&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This sounds like the obvious fix. It requires no code changes and keeps the synchronous request-response model intact. &lt;/p&gt;

&lt;h3&gt;
  
  
  The Trade-offs:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The "Spinner" Problem:&lt;/strong&gt; From a user experience perspective, you’re simply extending how long someone stares at a loading spinner.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Availability:&lt;/strong&gt; Works only for Regional REST APIs and private REST APIs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Throttling:&lt;/strong&gt; It might lead to a reduction in your account-level throttle quota limit&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Option 2: The Asynchronous "Job" Pattern (Polling)
&lt;/h2&gt;

&lt;p&gt;Sometimes, streaming isn't the right fit. If your GenAI application is generating images, creating PDFs, or running complex "Agentic" workflows that involve 2 minutes of silent "reasoning" before producing an answer, streaming text chunks provides no value.&lt;/p&gt;

&lt;p&gt;In this pattern, API Gateway acts as a dispatcher rather than a waiter.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Request: The client sends a POST request.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dispatch: API Gateway triggers an asynchronous process (via SQS or AWS Step Functions) and immediately returns a 202 Accepted response with a jobId.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Processing: The backend (Lambda/Bedrock) processes the request offline, unaffected by API Gateway timeouts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Retrieval: The client polls a status endpoint (GET /jobs/{jobId}) every few seconds to check if the work is complete.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
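&lt;p&gt;The flow above can be sketched in-process (dicts stand in for API Gateway, the queue, and the job store; the endpoint and field names are hypothetical):&lt;/p&gt;

```python
# In-process sketch of the asynchronous "job" pattern.
import uuid

jobs = {}

def post_job(prompt):
    # POST /jobs -> dispatch work, return 202 Accepted + jobId immediately
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "RUNNING", "result": None}
    return 202, {"jobId": job_id}

def worker_finish(job_id, result):
    # The backend (Lambda/Bedrock) completes offline, then records the result.
    jobs[job_id] = {"status": "DONE", "result": result}

def get_job(job_id):
    # GET /jobs/{jobId} -> the client polls this every few seconds
    return 200, jobs[job_id]

status, body = post_job("summarize this document")
job_id = body["jobId"]
print(status, get_job(job_id)[1]["status"])   # 202 RUNNING
worker_finish(job_id, "summary text")
print(get_job(job_id)[1]["status"])           # DONE
```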

&lt;h3&gt;
  
  
  The Trade-offs:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Client Complexity: The frontend client must implement polling logic (e.g., "check status every 3 seconds").&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Latency: There is inherently a small delay between the job actually finishing and the client's next poll interval catching it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cost: You pay for the initial request plus every subsequent polling request, which can add up if thousands of users are polling frequently.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Option 3: Use API Gateway WebSocket APIs
&lt;/h2&gt;

&lt;p&gt;API Gateway WebSocket APIs remove the request-response timeout entirely by switching to a persistent, stateful connection. However, there are a few things to consider. &lt;/p&gt;

&lt;p&gt;WebSockets work best when the system is designed to communicate progress (e.g., "Thinking...", "Searching knowledge base..."), not just deliver a final answer.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Trade-offs:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Idle Timeout:&lt;/strong&gt; There is still a 10-minute idle timeout on WebSocket connections. If no data is sent during that window, the connection is closed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complexity:&lt;/strong&gt; They introduce additional complexity in connection management, retries, and client-side state handling.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Option 4: REST API response streaming
&lt;/h2&gt;

&lt;p&gt;AWS added response streaming for REST APIs in late 2025: &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/11/api-gateway-response-streaming-rest-apis/" rel="noopener noreferrer"&gt;Amazon API Gateway now supports response streaming for REST APIs&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;This allows Lambda to stream chunks of data back to the client as soon as they are available, rather than waiting for the entire response to be generated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this is the preferred pattern for LLMs:&lt;/strong&gt; Users see immediate feedback instead of a blank screen, even if the total execution time remains the same. This drastically improves the "Time to First Token" (TTFT) metric.&lt;/p&gt;

&lt;p&gt;There’s a great walkthrough in Serverless Office Hours:&lt;br&gt;


  &lt;iframe src="https://www.youtube.com/embed/OOyPRuIuA5w"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;h3&gt;
  
  
  The Trade-offs:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Runtime Restrictions:&lt;/strong&gt; Native streaming support is currently strongest in Node.js managed runtimes. If you use Python, Java, or .NET, you often need to implement a custom runtime or use the AWS Lambda Web Adapter to proxy the stream.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Lost API Gateway Features:&lt;/strong&gt; Because API Gateway no longer buffers the response, it cannot modify it. You lose support for Endpoint Caching, Content Encoding (automatic GZIP compression), and VTL transformations on the response body.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Error Handling Complexity:&lt;/strong&gt; Once your Lambda sends the first byte (usually a 200 OK header), you cannot change the status code. If the LLM hallucinates or crashes mid-stream, the API will still report "Success" HTTP status, so your client must be smart enough to parse error messages inside the text stream.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bandwidth Throttling:&lt;/strong&gt; For very large responses, the first 6-10 MB bursts at full speed, but remaining data is often throttled (e.g., 2 MB/s).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Summary: Which Pattern Should You Choose?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Complexity&lt;/th&gt;
&lt;th&gt;Best Use Case&lt;/th&gt;
&lt;th&gt;Key Constraint&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Timeout Increase&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Internal tools / MVPs&lt;/td&gt;
&lt;td&gt;Users stare at a loading spinner (High TTFT).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;The Asynchronous "Job" Pattern (Polling)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Image gen / Silent agents&lt;/td&gt;
&lt;td&gt;Polling cost &amp;amp; delayed completion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;WebSockets&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Bi-directional Agents&lt;/td&gt;
&lt;td&gt;Requires managing connection state &amp;amp; heartbeats.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Response Streaming&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Chatbots &amp;amp; Text Gen&lt;/td&gt;
&lt;td&gt;Node.js preferred; No API Caching or VTL.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

</description>
      <category>serverless</category>
      <category>genai</category>
      <category>apigateway</category>
      <category>aws</category>
    </item>
    <item>
      <title>Using RDS Proxy with a .NET Lambda for Efficient Database Connections</title>
      <dc:creator>Sabarish Sathasivan</dc:creator>
      <pubDate>Sat, 04 Jan 2025 19:59:55 +0000</pubDate>
      <link>https://forem.com/thedeveloperjournal/using-rds-proxy-with-a-net-lambda-for-efficient-database-connections-2a65</link>
      <guid>https://forem.com/thedeveloperjournal/using-rds-proxy-with-a-net-lambda-for-efficient-database-connections-2a65</guid>
      <description>&lt;p&gt;Efficient database connection management is critical in serverless architectures like AWS Lambda.Each Lambda invocation opens a new database connection, and under high traffic, this can quickly exceed the database's connection limits, causing performance bottlenecks and resource exhaustion. &lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction to RDS Proxy
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Amazon RDS Proxy&lt;/strong&gt;, a fully managed AWS service, optimizes database connections for RDS databases. Acting as an intermediary between your application and the database, it pools and reuses connections, significantly improving efficiency and scalability. RDS Proxy supports a wide range of database engines, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; MySQL&lt;/li&gt;
&lt;li&gt; PostgreSQL&lt;/li&gt;
&lt;li&gt;MariaDB&lt;/li&gt;
&lt;li&gt;Microsoft SQL Server&lt;/li&gt;
&lt;li&gt;Amazon Aurora (MySQL and PostgreSQL) &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For more details, including limitations, refer to the User Guide: &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/rds-proxy.html" rel="noopener noreferrer"&gt;Amazon RDS Proxy&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up RDS Proxy
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Sign in to the AWS Management Console and navigate to &lt;a href="https://console.aws.amazon.com/rds/" rel="noopener noreferrer"&gt;Amazon RDS Console&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;In the navigation pane, choose &lt;strong&gt;Proxies&lt;/strong&gt; and click &lt;strong&gt;Create proxy&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Configure the following settings:&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Engine family&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Determines the database engine family: MySQL, PostgreSQL, MariaDB, Microsoft SQL Server, or Amazon Aurora (MySQL and PostgreSQL).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Proxy identifier&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Name of the Proxy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Idle client connection timeout&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Specify how long a client connection can stay idle before the proxy closes it. By default, this is 30 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Database&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;List of RDS DB Instances that can be accessed through the proxy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Connection pool maximum connections&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The percentage of the database's maximum concurrent connections that RDS Proxy can use. If a single proxy is associated with the RDS instance, this value can be 100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;IAM role&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Name of the IAM role the proxy will use to access the AWS Secrets Manager secrets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Secrets Manager secrets&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;List of Secrets that contains the database credentials for the proxy to access the database. A proxy can have up to 200 associated Secrets Manager secrets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Client authentication type&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Defines the proxy's authentication method for client connections and associated Secrets Manager secrets.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;IAM authentication&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Specifies whether the proxy requires, disallows, or allows IAM authentication. Applicable only for RDS for Microsoft SQL Server&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Require Transport Layer Security&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Enable this setting to enforce TLS/SSL for all client connections&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For a detailed explanation of the parameters, refer to the User Guide: &lt;br&gt;
&lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/rds-proxy-creating.html" rel="noopener noreferrer"&gt;Creating an RDS Proxy&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click on the &lt;strong&gt;Create Proxy&lt;/strong&gt; button to complete the setup.&lt;/li&gt;
&lt;li&gt;Once created, the proxy appears under the Proxies section in the navigation pane.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2z5o9jwyjmi1zzphlivp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2z5o9jwyjmi1zzphlivp.png" alt="List of proxies" width="800" height="111"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Click on the proxy's name to view its details. &lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foiskeghwsico5jmf6w1r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foiskeghwsico5jmf6w1r.png" alt="Proxy details" width="800" height="323"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use the &lt;strong&gt;Endpoint&lt;/strong&gt; listed in the &lt;strong&gt;Proxy Endpoints&lt;/strong&gt; section as the host in your database connection string instead of the database hostname to leverage RDS Proxy.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
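Switching to RDS Proxy usually requires no change beyond the host name in the connection string. As an illustration (both endpoints below are placeholders, not real values):

```text
# Before: connecting directly to the RDS instance
Server=mydatabase.abcdefghij.us-east-1.rds.amazonaws.com;Database=mydb;Uid=appuser;Pwd=***;

# After: connecting through the RDS Proxy endpoint
Server=myproxy.proxy-abcdefghij.us-east-1.rds.amazonaws.com;Database=mydb;Uid=appuser;Pwd=***;
```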

&lt;h3&gt;
  
  
  Creating a Database User and Adding It to the RDS Proxy
&lt;/h3&gt;

&lt;p&gt;To ensure the RDS Proxy can interact with your database, follow these steps to create a database user, store the credentials, and associate them with the proxy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a database user with the necessary privileges
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE USER 'newuser'@'%' IDENTIFIED BY 'password';
GRANT ALL PRIVILEGES ON database.* TO 'newuser'@'%';
FLUSH PRIVILEGES;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Store the database credentials in AWS Secrets Manager

&lt;ul&gt;
&lt;li&gt;Navigate to AWS Secrets Manager in the AWS Management Console.&lt;/li&gt;
&lt;li&gt;Create a new secret containing the database username and password.&lt;/li&gt;
&lt;li&gt;Note the ARN of the secret for use in Lambda.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
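RDS Proxy expects each associated secret to store the credentials as a JSON object with `username` and `password` keys, matching the database user created above (the values below are placeholders):

```json
{
  "username": "newuser",
  "password": "password"
}
```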

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fenaaqsu69paxr5y7jorp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fenaaqsu69paxr5y7jorp.png" alt="Secret Details" width="800" height="470"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add the secret to the RDS Proxy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftivor0bw9orpelnn3tek.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftivor0bw9orpelnn3tek.png" alt="Add Secret to RDS Proxy" width="800" height="113"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Attach the following policy to the IAM role associated with the RDS Proxy to read the secret:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue"
      ],
      "Resource": "arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:&amp;lt;secret-arn&amp;gt;"
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Integrating RDS Proxy in the .NET Lambda
&lt;/h2&gt;

&lt;p&gt;There are multiple approaches to integrating RDS Proxy into your .NET Lambda function to access an RDS for MySQL database.&lt;/p&gt;

&lt;h3&gt;
  
  
  Database Credentials
&lt;/h3&gt;

&lt;p&gt;This approach lets your Lambda function authenticate through the RDS Proxy using the proxy endpoint and database credentials stored in a secret. Follow these steps to use database credentials:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; If the database credentials are stored in a secret, attach the following policy to the Lambda Execution Role to read the secret:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue"
      ],
      "Resource": "arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:&amp;lt;secret-arn&amp;gt;"
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt; Update the Lambda code to connect to the database using the RDS Proxy endpoint and database credentials
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;using MySql.Data.MySqlClient;

namespace RDSProxyTest
{
    internal class DBConnection
    {
        public async Task&amp;lt;MySqlConnection&amp;gt; OpenConnectionAsync()
        {
            var connectionString = new MySqlConnectionStringBuilder
            {
                Server = "&amp;lt;rds_proxy_endpoint&amp;gt;",
                UserID = "&amp;lt;database_user_name&amp;gt;",
                Password = "&amp;lt;database_password&amp;gt;",
                Database = "&amp;lt;database_name&amp;gt;"
            }.ToString();

            var connection = new MySqlConnection(connectionString);
            await connection.OpenAsync();
            return connection;
        }
    }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
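Once opened, the connection can be used like any direct database connection; the proxy transparently handles pooling behind the scenes. A minimal, hypothetical usage sketch (the table and query below are illustrative, not from the article):

```csharp
// Hypothetical caller of the DBConnection class shown above.
var db = new DBConnection();
using var connection = await db.OpenConnectionAsync();

// Run a simple query through the pooled proxy connection.
using var command = new MySqlCommand("SELECT COUNT(*) FROM orders", connection);
var orderCount = await command.ExecuteScalarAsync();
```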



&lt;h3&gt;
  
  
  IAM Authentication
&lt;/h3&gt;

&lt;p&gt;This approach enables your Lambda function to authenticate to the RDS Proxy using short-lived IAM tokens instead of database credentials. Follow these steps to enable and use IAM Authentication:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Enable IAM Authentication for your RDS proxy&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcdgmhmlpw4bewa0i7kkf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcdgmhmlpw4bewa0i7kkf.png" alt="Enable IAM Proxy Authentication" width="800" height="244"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Attach the following policy to the Lambda Execution Role for the Lambda to access the RDS Proxy:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "rds-db:connect"
      ],
      "Resource": "arn:aws:rds-db:REGION:ACCOUNT_ID:dbuser:RDS_PROXY_IDENTIFIER/DB_USERNAME"
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace the following placeholders:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;REGION&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Your AWS region (e.g., us-east-1)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ACCOUNT_ID&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS Account ID&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RDS_PROXY_IDENTIFIER&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The unique identifier for the RDS Proxy. For example, if the ARN of the RDS Proxy is &lt;strong&gt;arn:aws:rds:REGION:ACCOUNT_ID:db-proxy:prx-XXXXXXXXXXXXXXXXX&lt;/strong&gt;, the value of &lt;strong&gt;RDS_PROXY_IDENTIFIER&lt;/strong&gt; is &lt;strong&gt;prx-XXXXXXXXXXXXXXXXX&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DB_USERNAME&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A database username, configured in one of the secrets attached to the RDS Proxy&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt; Update the Lambda code to connect to the database using the RDS Proxy endpoint and an IAM authentication token
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;using Amazon;
using Amazon.RDS.Util;
using Amazon.Runtime;

using MySql.Data.MySqlClient;

namespace RDSProxyTest
{
    internal class DBConnection
    {
        private static string GenerateAuthToken(string endpoint, string username, string region)
        {
            var regionEndpoint = RegionEndpoint.GetBySystemName(region);

            // Use the default credentials of the Lambda environment
            var credentials = FallbackCredentialsFactory.GetCredentials();

            // Generate a short-lived IAM authentication token for the proxy endpoint
            return RDSAuthTokenGenerator.GenerateAuthToken(
                credentials,
                regionEndpoint,
                endpoint,
                3306,
                username);
        }

        public async Task&amp;lt;MySqlConnection&amp;gt; OpenConnectionAsync()
        {
            var token = GenerateAuthToken("&amp;lt;rds_proxy_endpoint&amp;gt;", "&amp;lt;database_user_name&amp;gt;", "&amp;lt;aws_region&amp;gt;");
            var connectionString = new MySqlConnectionStringBuilder
            {
                Server = "&amp;lt;rds_proxy_endpoint&amp;gt;",
                Database = "&amp;lt;database_name&amp;gt;",
                UserID = "&amp;lt;database_user_name&amp;gt;",
                Password = token,
                SslMode = MySqlSslMode.Required
            }.ToString();

            var connection = new MySqlConnection(connectionString);
            await connection.OpenAsync();
            return connection;
        }
    }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Database Credentials versus IAM Authentication
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Database Credentials&lt;/th&gt;
&lt;th&gt;IAM Authentication&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Authentication Method&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Uses database username and password&lt;/td&gt;
&lt;td&gt;Uses short-lived IAM tokens (15 minutes) generated by AWS for secure authentication.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Database Support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Works with all databases supported by RDS Proxy.&lt;/td&gt;
&lt;td&gt;Supported for Amazon Aurora (MySQL/PostgreSQL), MySQL, PostgreSQL, and MariaDB. Not supported for Amazon RDS for Oracle or Amazon RDS for SQL Server&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Performance Impact&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Minimal Overhead&lt;/td&gt;
&lt;td&gt;Slightly higher overhead due to the need to generate and validate IAM tokens.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

</description>
      <category>lambda</category>
      <category>aws</category>
      <category>dotnet</category>
      <category>rds</category>
    </item>
    <item>
      <title>Fine-Tune Your Serverless REST APIs with AWS Lambda Power Tuning</title>
      <dc:creator>Sabarish Sathasivan</dc:creator>
      <pubDate>Thu, 28 Nov 2024 13:30:55 +0000</pubDate>
      <link>https://forem.com/thedeveloperjournal/optimizing-rest-api-performance-tuning-aws-lambda-with-aws-lambda-power-tuning-1obp</link>
      <guid>https://forem.com/thedeveloperjournal/optimizing-rest-api-performance-tuning-aws-lambda-with-aws-lambda-power-tuning-1obp</guid>
      <description>&lt;p&gt;Developing serverless REST APIs with API Gateway and AWS Lambda is now a common practice. With Amazon extending the &lt;a href="https://aws.amazon.com/about-aws/whats-new/2024/06/amazon-api-gateway-integration-timeout-limit-29-seconds" rel="noopener noreferrer"&gt;API Gateway timeout beyond 29 seconds&lt;/a&gt;, serverless REST APIs can now handle complex workflows like long-running machine learning predictions and Generative AI tasks.&lt;/p&gt;

&lt;p&gt;In this article, we’ll explore how to leverage the &lt;a href="https://github.com/alexcasalboni/aws-lambda-power-tuning" rel="noopener noreferrer"&gt;AWS Lambda Power Tuning&lt;/a&gt; tool to optimize serverless &lt;strong&gt;REST APIs (API Gateway configured with proxy integration + AWS Lambda)&lt;/strong&gt; for both performance and cost efficiency.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Firtqa4s42zfpvp5prs9l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Firtqa4s42zfpvp5prs9l.png" alt="Sample Architecture - API Gateway + AWS Lambda + RDS" width="800" height="211"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS Lambda Power Tuning
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://github.com/alexcasalboni/aws-lambda-power-tuning" rel="noopener noreferrer"&gt;AWS Lambda Power Tuning&lt;/a&gt; tool is an AWS Step Functions-based state machine designed to test Lambda performance under various memory configurations. It helps optimize for cost or performance (or a balance of both) and is compatible with any Lambda runtime.&lt;/p&gt;

&lt;h3&gt;
  
  
  Getting Started
&lt;/h3&gt;

&lt;p&gt;To deploy the AWS Lambda Power Tuning tool, follow the instructions in the deployment &lt;a href="https://github.com/alexcasalboni/aws-lambda-power-tuning/blob/master/README-DEPLOY.md" rel="noopener noreferrer"&gt;guide&lt;/a&gt;. Once deployed, the state machine will appear in AWS Step Functions, as shown below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpif70obu3mocu2w35z63.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpif70obu3mocu2w35z63.png" alt="AWS Lambda Power Tuning Tool State machine in AWS Console" width="800" height="136"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Executing the Tool
&lt;/h3&gt;

&lt;p&gt;When executing the state machine, you can customize several parameters. Below is a summary:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;lambdaARN&lt;/td&gt;
&lt;td&gt;Required. ARN of the Lambda function to optimize.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;num&lt;/td&gt;
&lt;td&gt;Required. Number of invocations per power configuration (min: 5, recommended: 10–100).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;powerValues&lt;/td&gt;
&lt;td&gt;Optional. Memory values to test (128MB–10,240MB). Defaults to values set at deployment.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;payload&lt;/td&gt;
&lt;td&gt;Optional. Request payload for the API. Supports either a static payload used for every invocation or weighted payloads drawn from a list&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;payloadS3&lt;/td&gt;
&lt;td&gt;S3 object location for payloads &amp;gt;256KB.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;parallelInvocation&lt;/td&gt;
&lt;td&gt;Runs all invocations in parallel if set to true (default: false).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;strategy&lt;/td&gt;
&lt;td&gt;Optimization strategy: "cost", "speed", or "balanced". With "cost" the tool suggests the cheapest option and with "speed" the fastest; with "balanced" it chooses a compromise between the two according to the "balancedWeight" parameter.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;balancedWeight&lt;/td&gt;
&lt;td&gt;Represents the trade-off between cost and speed. Value is between 0 and 1, where 0.0 is equivalent to the "speed" strategy and 1.0 to the "cost" strategy. Default: 0.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;preProcessorARN&lt;/td&gt;
&lt;td&gt;ARN of a Lambda function to run before each invocation of the target function.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;postProcessorARN&lt;/td&gt;
&lt;td&gt;ARN of a Lambda function to run after each invocation of the target function.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;includeOutputResults&lt;/td&gt;
&lt;td&gt;Includes average cost and duration for each configuration in the final output (default: false).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;onlyColdStarts&lt;/td&gt;
&lt;td&gt;Forces all invocations to be cold starts&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Refer to the &lt;a href="https://github.com/alexcasalboni/aws-lambda-power-tuning/blob/master/README-EXECUTE.md#state-machine-input-at-execution-time" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt; for detailed explanations.&lt;/p&gt;

&lt;h4&gt;
  
  
  Example Input
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "lambdaARN": "&amp;lt;arn of the function being executed&amp;gt;",
  "powerValues": [ 128, 256, 512, 1024, 1536, 2048, 2560, 3072],
  "num": 10,
  "strategy": "speed",
  "payload": {...},
  "parallelInvocation": true,
  "includeOutputResults": true,
  "onlyColdStarts": true
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Input Payloads for Proxy Integration
&lt;/h4&gt;

&lt;p&gt;Inputs to test Lambda functions behind API Gateway can vary by HTTP method. Below are sample payload links for common methods:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;HTTP Method&lt;/th&gt;
&lt;th&gt;GitHub URL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;POST&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ssathasivan/AWSLambaPowerTuningToolPayloads/blob/main/Post-singleInput.json" rel="noopener noreferrer"&gt;Click for sample input&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PUT&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ssathasivan/AWSLambaPowerTuningToolPayloads/blob/main/Put.json" rel="noopener noreferrer"&gt;Click for sample input&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GET&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ssathasivan/AWSLambaPowerTuningToolPayloads/blob/main/Get.json" rel="noopener noreferrer"&gt;Click for sample input&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GET (With Path Parameters)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ssathasivan/AWSLambaPowerTuningToolPayloads/blob/main/GetByPathParameter.json" rel="noopener noreferrer"&gt;Click for sample input&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GET (With QueryString)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ssathasivan/AWSLambaPowerTuningToolPayloads/blob/main/GetByQueryStringParameter.json" rel="noopener noreferrer"&gt;Click for sample input&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DELETE&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ssathasivan/AWSLambaPowerTuningToolPayloads/blob/main/Delete.json" rel="noopener noreferrer"&gt;Click for sample input&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PATCH&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ssathasivan/AWSLambaPowerTuningToolPayloads/blob/main/Patch.json" rel="noopener noreferrer"&gt;Click for sample input&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Weighted Payloads
&lt;/h4&gt;

&lt;p&gt;The tool also offers the option to define multiple payloads for HTTP methods, making it suitable for scenarios where payload structures vary significantly and can impact performance. Refer to &lt;a href="https://github.com/alexcasalboni/aws-lambda-power-tuning/blob/master/README-ADVANCED.md#weighted-payloads" rel="noopener noreferrer"&gt;Weighted Payloads&lt;/a&gt; in the official documentation to understand how weighted payloads work.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;HTTP Method&lt;/th&gt;
&lt;th&gt;GitHub URL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;POST (With Weighted Payloads)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ssathasivan/AWSLambaPowerTuningToolPayloads/blob/main/Post-weightedinput.json" rel="noopener noreferrer"&gt;Click for sample input&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Pre/post-processing functions
&lt;/h4&gt;

&lt;p&gt;The tool also provides the ability to run custom logic before and after each invocation of the target Lambda function. This logic should be implemented as separate Lambda functions. Refer to &lt;a href="https://github.com/alexcasalboni/aws-lambda-power-tuning/blob/master/README-ADVANCED.md#prepost-processing-functions" rel="noopener noreferrer"&gt;Pre/Post-processing functions&lt;/a&gt; in the official documentation to understand how they work.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;HTTP Method&lt;/th&gt;
&lt;th&gt;GitHub URL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Post (With Pre/Post functions)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ssathasivan/AWSLambaPowerTuningToolPayloads/blob/main/Post-post-pre-functions.json" rel="noopener noreferrer"&gt;Click for sample input&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Output
&lt;/h3&gt;

&lt;p&gt;A sample execution output is shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "output": {
    "power": 2048,
    "cost": 0.0000018816000000000001,
    "duration": 54.95933333333334,
    "stateMachine": {
      "executionCost": 0.00075,
      "lambdaCost": 0.0013002423000000002,
      "visualization": "https://lambda-power-tuning.show/#encodeddata"
    },
    "stats": [
      {"value": 128, "averagePrice": 9.345e-7, "averageDuration": 443.8995},
      {"value": 2048, "averagePrice": 0.0000018816, "averageDuration": 54.9593}
    ]
  }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A brief description of the output is given below:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Key&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Description&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;output.power&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The optimal memory configuration (RAM).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;output.cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The corresponding average cost (per invocation).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;output.duration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The corresponding average duration (per invocation).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;output.stateMachine.executionCost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The AWS Step Functions cost corresponding to this state machine execution (fixed value for "worst" case).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;output.stateMachine.lambdaCost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The AWS Lambda cost corresponding to this state machine execution (depending on number of executions and average execution time).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;output.stateMachine.visualization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A URL to visualize and inspect average statistics about cost and performance. Note: Average statistics are NOT shared with the server, as all data is encoded in the URL hash, client-side only.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;output.stats&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The average duration and cost for every tested power value configuration. Only included if &lt;code&gt;includeOutputResults&lt;/code&gt; is set to a truthy value.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Visualizing the output
&lt;/h4&gt;

&lt;p&gt;The element - &lt;strong&gt;output.stateMachine.visualization&lt;/strong&gt; provides a visualization URL - &lt;a href="https://lambda-power-tuning.show/#encodeddata" rel="noopener noreferrer"&gt;https://lambda-power-tuning.show/#encodeddata&lt;/a&gt; that can be used to visualize the result. &lt;/p&gt;

&lt;p&gt;The source code of the UI is also open source - &lt;a href="https://github.com/matteo-ronchetti/aws-lambda-power-tuning-ui" rel="noopener noreferrer"&gt;https://github.com/matteo-ronchetti/aws-lambda-power-tuning-ui&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The visualizations for the POST and GET (With Path Parameters) executions are shown below&lt;/p&gt;

&lt;h5&gt;
  
  
  POST
&lt;/h5&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkdo3oibiti3cjg8n75cr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkdo3oibiti3cjg8n75cr.png" alt="Results for POST" width="800" height="361"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h5&gt;
  
  
  GET (With Path Parameters)
&lt;/h5&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgmg8cnnq2oneexf8860i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgmg8cnnq2oneexf8860i.png" alt="Results for GET (With Path Parameters) " width="800" height="358"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The tool includes a feature to compare the results of function invocations. To see how this functionality is applied in practice, check out the article &lt;a href="https://dev.to/techtrailwithsab/secrets-management-in-net-lambda-secretsmanager-sdk-vs-aws-parameters-and-secrets-lambda-extension-40pk"&gt;Secrets Management in .NET Lambda&lt;/a&gt;, where we demonstrate its use to compare the performance of reading secrets in a .NET Lambda.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/about-aws/whats-new/2024/06/amazon-api-gateway-integration-timeout-limit-29-seconds/" rel="noopener noreferrer"&gt;Amazon API Gateway integration timeout limit increase beyond 29 seconds &lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/alexcasalboni/aws-lambda-power-tuning" rel="noopener noreferrer"&gt;AWS Lambda Power Tuning&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/alexcasalboni/aws-lambda-power-tuning/blob/master/README-DEPLOY.md" rel="noopener noreferrer"&gt;AWS Lambda Power Tuning - Deployment Guide&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/alexcasalboni/aws-lambda-power-tuning/blob/master/README-EXECUTE.md#state-machine-input-at-execution-time" rel="noopener noreferrer"&gt;AWS Lambda Power Tuning - Execution Parameters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/apigateway/latest/developerguide/set-up-lambda-proxy-integrations.html" rel="noopener noreferrer"&gt;Input format for API Gateway Proxy Integrations
&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>lambda</category>
      <category>aws</category>
      <category>apigateway</category>
      <category>performance</category>
    </item>
    <item>
      <title>Secrets Management in .NET Lambda</title>
      <dc:creator>Sabarish Sathasivan</dc:creator>
      <pubDate>Tue, 12 Nov 2024 22:06:55 +0000</pubDate>
      <link>https://forem.com/thedeveloperjournal/secrets-management-in-net-lambda-secretsmanager-sdk-vs-aws-parameters-and-secrets-lambda-extension-40pk</link>
      <guid>https://forem.com/thedeveloperjournal/secrets-management-in-net-lambda-secretsmanager-sdk-vs-aws-parameters-and-secrets-lambda-extension-40pk</guid>
      <description>&lt;p&gt;Typically, an application must deal with sensitive information like API keys and database credentials. A recommended approach is to store this sensitive information as secrets using the Secrets Manager service in AWS. This article explores multiple approaches to retrieving and caching secrets in a .NET-based Lambda.&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS SDK Based Approach
&lt;/h2&gt;

&lt;p&gt;AWS provides the &lt;strong&gt;AWSSDK.SecretsManager.Caching&lt;/strong&gt; package, which can be used to retrieve a secret and cache it for future use.&lt;/p&gt;

&lt;p&gt;To use the caching library, add the &lt;strong&gt;AWSSDK.SecretsManager.Caching&lt;/strong&gt; package to your .NET Lambda:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;dotnet add package AWSSDK.SecretsManager.Caching&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The following code snippet shows a method that retrieves a secret and caches it for 15 minutes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;using Amazon.SecretsManager.Extensions.Caching;

namespace Lambda.Secrets.AWSSDK
{
    public class SecretsProvider : ISecretsProvider
    {

        // Set the cache item TTL to 15 minutes (900,000 ms)
        private static readonly SecretCacheConfiguration _cacheConfiguration = new SecretCacheConfiguration
        {
            CacheItemTTL = 900000
        };

        private readonly SecretsManagerCache _cache = new SecretsManagerCache(_cacheConfiguration);


        public async Task&amp;lt;string&amp;gt; GetSecretAsync(string secretName)
        {
            return await _cache.GetSecretString(secretName);
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
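
&lt;p&gt;The snippets in this article implement an &lt;code&gt;ISecretsProvider&lt;/code&gt; interface that the article itself does not define. A minimal definition might look like the following (the &lt;code&gt;Lambda.Secrets&lt;/code&gt; namespace is an assumption; the actual interface may differ in the linked repository):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;namespace Lambda.Secrets
{
    // Hypothetical interface; the article's classes implement it but do not show it.
    public interface ISecretsProvider
    {
        Task&amp;lt;string&amp;gt; GetSecretAsync(string secretName);
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;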



&lt;p&gt;To reuse a secret across the lifetime of a Lambda execution environment (i.e., while the container is warm), inject the secret-retrieving class as a singleton. &lt;/p&gt;

&lt;p&gt;Using AddSingleton ensures the class is instantiated only once per warm environment, allowing the secret to persist across multiple invocations without re-fetching. This approach reduces overhead, improves performance, and minimizes calls to AWS Secrets Manager.&lt;/p&gt;
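
&lt;p&gt;A minimal sketch of that registration, assuming a typical minimal-API style &lt;code&gt;builder&lt;/code&gt; and the &lt;code&gt;ISecretsProvider&lt;/code&gt;/&lt;code&gt;SecretsProvider&lt;/code&gt; pair from the snippet above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Registered as a singleton so the SecretsManagerCache instance
// (and the secrets it caches) survives across warm invocations.
builder.Services.AddSingleton&amp;lt;ISecretsProvider, SecretsProvider&amp;gt;();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;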

&lt;h2&gt;
  
  
  AWS Parameters and Secrets Lambda Extension
&lt;/h2&gt;

&lt;p&gt;AWS provides an extension - &lt;a href="https://aws.amazon.com/about-aws/whats-new/2022/10/aws-parameters-secrets-lambda-extension/#:~:text=Today%2C%20AWS%20launched%20the%20AWS,secrets%20from%20AWS%20Secrets%20Manager." rel="noopener noreferrer"&gt;AWS Parameters and Secrets Lambda Extension&lt;/a&gt;. This extension can be installed as a Lambda layer and acts as an in-memory cache for parameters and secrets.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lambda Layers
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/chapter-layers.html" rel="noopener noreferrer"&gt;Lambda layers&lt;/a&gt; provide a convenient way to manage reusable code and dependencies across multiple Lambda functions. A layer is essentially a ZIP archive that can include libraries, custom runtimes, or other necessary files. By using layers, you can avoid bundling these resources directly in your function's deployment package, reducing its size and improving maintainability. &lt;/p&gt;

&lt;h3&gt;
  
  
  How the Extension Works
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The extension exposes an HTTP endpoint (&lt;a href="http://localhost:2773" rel="noopener noreferrer"&gt;http://localhost:2773&lt;/a&gt;) to the Lambda function&lt;/li&gt;
&lt;li&gt;When a secret is requested

&lt;ul&gt;
&lt;li&gt;The extension first checks the cache for an existing entry.&lt;/li&gt;
&lt;li&gt;If no entry is found, the value is retrieved from Secrets Manager, cached with a TTL (default 300 seconds), and returned.&lt;/li&gt;
&lt;li&gt;If an entry is found, the extension checks how long ago it was cached. If the elapsed time is within the configured TTL (time-to-live), the cached entry is returned; otherwise, fresh data is fetched from Secrets Manager, cached, and returned.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Refer to the article &lt;a href="https://aws.amazon.com/blogs/compute/using-the-aws-parameter-and-secrets-lambda-extension-to-cache-parameters-and-secrets/" rel="noopener noreferrer"&gt;Using the AWS Parameter and Secrets Lambda extension to cache parameters and secrets&lt;/a&gt; to understand the architecture of the extension.&lt;/p&gt;

&lt;h3&gt;
  
  
  Add the Layer to Your Lambda Function
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt; Open the AWS Lambda Console and navigate to your function.&lt;/li&gt;
&lt;li&gt;Under the Layers section, click Add a layer.

&lt;ul&gt;
&lt;li&gt;Select the option AWS layers &lt;/li&gt;
&lt;li&gt;Select "AWS-Parameters-and-Secrets-Lambda-Extension" and the latest version&lt;/li&gt;
&lt;li&gt;Click Add. 
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj74f4syvmxz1e4m862xo.png" alt="Select the latest version of the layer" width="800" height="648"&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;The layer will be added to your Lambda&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff4vh52bi6gm1wv67asdw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff4vh52bi6gm1wv67asdw.png" alt="Layer is added to the lambda" width="800" height="72"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuring the Extension
&lt;/h3&gt;

&lt;p&gt;The extension can be configured using environment variables. Some important configurations are listed below:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;SECRETS_MANAGER_TTL: TTL of a secret in the cache, in seconds. Must be a value between 0 and 300. The default is 300.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;PARAMETERS_SECRETS_EXTENSION_CACHE_SIZE: The maximum number of secrets and parameters to cache. Must be a value from 0 to 1000. The default is 1000.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Refer to the section &lt;strong&gt;AWS Parameters and Secrets Lambda Extension environment variables&lt;/strong&gt; in the following &lt;a href="https://docs.aws.amazon.com/secretsmanager/latest/userguide/retrieving-secrets_lambda.html#:~:text=SECRETS_MANAGER_TTL%20TTL%20of%20a%20secret%20in%20the%20cache,if%20PARAMETERS_SECRETS_EXTENSION_CACHE_SIZE%20is%200.%20Default%20is%20300%20seconds." rel="noopener noreferrer"&gt;article&lt;/a&gt; for the complete list of environment variables.&lt;/p&gt;

&lt;p&gt;The following code snippet shows a method that retrieves a secret using the extension:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;namespace Lambda.Secrets.Extension
{
    public class SecretsProvider : ISecretsProvider
    {
        private readonly HttpClient _httpClient;

        private const string GetSecretsEndpoint = "/secretsmanager/get?secretId=";

        public SecretsProvider(HttpClient httpClient)
        {
            _httpClient = httpClient;
        }

        public async Task&amp;lt;string&amp;gt; GetSecretAsync(string secretName)
        {
            var httpRequest = new HttpRequestMessage(
                HttpMethod.Get,
                new Uri($"{GetSecretsEndpoint}{HttpUtility.UrlEncode(secretName)}", UriKind.Relative));

            //Pass X-Aws-Parameters-Secrets-Token as a header. This is a required header that uses the AWS_SESSION_TOKEN value,
            //which is present in the Lambda execution environment by default. 
            httpRequest.Headers.Add("X-Aws-Parameters-Secrets-Token",
                Environment.GetEnvironmentVariable("AWS_SESSION_TOKEN")
            );
            var response = await _httpClient
                .SendAsync(httpRequest)
                .ConfigureAwait(false);

            response.EnsureSuccessStatusCode();
            var responseAsJson = await response.Content.ReadFromJsonAsync&amp;lt;GetSecretValueResponse&amp;gt;();
            return responseAsJson!.SecretString;
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The class can be injected into the Lambda as shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;builder.Services.AddAWSLambdaHosting(LambdaEventSource.RestApi)
           .AddHttpClient&amp;lt;ISecretsProvider, SecretsProvider&amp;gt;(c =&amp;gt;
        {
            c.BaseAddress = new Uri("http://localhost:2773");
        });
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


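
&lt;p&gt;Once a provider is registered with the DI container, it can be consumed in the same way regardless of which approach backs it. As a sketch (the class shape and the secret name &lt;code&gt;"my-app/db-credentials"&lt;/code&gt; are placeholders, not from the article):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;public class MyFunction
{
    private readonly ISecretsProvider _secretsProvider;

    // ISecretsProvider is resolved from the DI container.
    public MyFunction(ISecretsProvider secretsProvider)
    {
        _secretsProvider = secretsProvider;
    }

    public async Task HandleAsync()
    {
        // "my-app/db-credentials" is a placeholder secret name.
        var secret = await _secretsProvider.GetSecretAsync("my-app/db-credentials");
        // Use the secret (e.g., to build a database connection string).
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;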

&lt;h2&gt;
  
  
  Testing Approach
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/alexcasalboni/aws-lambda-power-tuning" rel="noopener noreferrer"&gt;AWS Lambda Power Tuning&lt;/a&gt; was used to test the performance of the Lambda functions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lambda functions were tested with the following memory configurations: 128 MB, 256 MB, 512 MB, 1024 MB, 1536 MB, 2048 MB, 2560 MB, and 3072 MB.&lt;/li&gt;
&lt;li&gt; Each configuration was invoked 100 times.&lt;/li&gt;
&lt;li&gt; Lambda invocations were done in parallel and had a combination of cold starts and warm starts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS Lambda Power Tuning measured execution time and calculated the associated costs, providing insights into performance improvements.&lt;/p&gt;

&lt;h3&gt;
  
  
  Test Results
&lt;/h3&gt;

&lt;p&gt;The tests indicate that the Lambda that retrieves the secret using the Lambda extension is consistently more performant and cheaper than the Lambda that retrieves the secret using the AWS SDK.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foqxgw184f2vinuj48w4s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foqxgw184f2vinuj48w4s.png" alt="Test Results" width="800" height="410"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Execution Time
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Memory Allocation&lt;/th&gt;
&lt;th&gt;AWS Secrets SDK (ms)&lt;/th&gt;
&lt;th&gt;Secrets Manager Extension (ms)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;128 MB&lt;/td&gt;
&lt;td&gt;10738&lt;/td&gt;
&lt;td&gt;6171&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;256 MB&lt;/td&gt;
&lt;td&gt;5155&lt;/td&gt;
&lt;td&gt;3189&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;512 MB&lt;/td&gt;
&lt;td&gt;1762&lt;/td&gt;
&lt;td&gt;1391&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1024 MB&lt;/td&gt;
&lt;td&gt;934&lt;/td&gt;
&lt;td&gt;716&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1536 MB&lt;/td&gt;
&lt;td&gt;657&lt;/td&gt;
&lt;td&gt;191&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2048 MB&lt;/td&gt;
&lt;td&gt;398&lt;/td&gt;
&lt;td&gt;285&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2560 MB&lt;/td&gt;
&lt;td&gt;370&lt;/td&gt;
&lt;td&gt;310&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3072 MB&lt;/td&gt;
&lt;td&gt;446&lt;/td&gt;
&lt;td&gt;115&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Cost
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Memory Allocation&lt;/th&gt;
&lt;th&gt;AWS Secrets SDK ($)&lt;/th&gt;
&lt;th&gt;Secrets Manager Extension  ($)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;128 MB&lt;/td&gt;
&lt;td&gt;0.00002127&lt;/td&gt;
&lt;td&gt;0.00001172&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;256 MB&lt;/td&gt;
&lt;td&gt;0.00001919&lt;/td&gt;
&lt;td&gt;0.00001089&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;512 MB&lt;/td&gt;
&lt;td&gt;0.00001136&lt;/td&gt;
&lt;td&gt;0.00000797&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1024 MB&lt;/td&gt;
&lt;td&gt;0.00000955&lt;/td&gt;
&lt;td&gt;0.00000609&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1536 MB&lt;/td&gt;
&lt;td&gt;0.00000856&lt;/td&gt;
&lt;td&gt;0.00000204&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2048 MB&lt;/td&gt;
&lt;td&gt;0.00000658&lt;/td&gt;
&lt;td&gt;0.00000369&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2560 MB&lt;/td&gt;
&lt;td&gt;0.00000764&lt;/td&gt;
&lt;td&gt;0.00000512&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3072 MB&lt;/td&gt;
&lt;td&gt;0.00001083&lt;/td&gt;
&lt;td&gt;0.00000246&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The source code for the Lambdas is shared at &lt;a href="https://github.com/ssathasivan/SecretManagementInDotnetLambda" rel="noopener noreferrer"&gt;SecretManagementInDotnetLambda&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>lambda</category>
      <category>dotnet</category>
      <category>serverless</category>
    </item>
  </channel>
</rss>
