<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Nicola Cremaschini</title>
    <description>The latest articles on Forem by Nicola Cremaschini (@niccrema).</description>
    <link>https://forem.com/niccrema</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2053625%2Feae18d90-dfee-4c84-9f5f-09144c1b74a7.jpeg</url>
      <title>Forem: Nicola Cremaschini</title>
      <link>https://forem.com/niccrema</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/niccrema"/>
    <language>en</language>
    <item>
      <title>Spec-Driven Prototyping with Amazon Q and Q-Vibes Memory Banking framework</title>
      <dc:creator>Nicola Cremaschini</dc:creator>
      <pubDate>Mon, 21 Jul 2025 06:00:52 +0000</pubDate>
      <link>https://forem.com/aws-builders/spec-driven-prototyping-with-amazon-q-and-q-vibes-memory-banking-framework-4lc3</link>
      <guid>https://forem.com/aws-builders/spec-driven-prototyping-with-amazon-q-and-q-vibes-memory-banking-framework-4lc3</guid>
      <description>&lt;p&gt;From ideas to prototypes&lt;/p&gt;

&lt;p&gt;We all love the moment an idea hits: sharp, exciting, half-formed. But getting from that spark to something tangible often involves friction: scaffolding, repetition, boilerplate.&lt;/p&gt;

&lt;p&gt;That overhead can kill &lt;strong&gt;momentum&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Prototyping is how we protect the idea. It's the creative phase where we validate assumptions, test viability, and explore possibilities quickly.&lt;/p&gt;

&lt;p&gt;When done well, prototyping accelerates &lt;strong&gt;innovation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Prototyping is a preliminary phase of product development, with objectives and constraints that differ from those of the later phases.&lt;/p&gt;

&lt;p&gt;In prototyping, you don't yet have a clear idea of the end product. You don't even know if the idea is really good enough to become a product.&lt;/p&gt;

&lt;p&gt;For this reason, we can define these attributes/constraints for prototypes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;They should be &lt;strong&gt;cheap&lt;/strong&gt;. Building one should take little money and time, so it should be done by one or two people rather than a whole product team, without a big investment in requirements gathering, design, or development.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You don't have to share them with a global audience: you can present a prototype to investors, but only while you're driving the presentation. A prototype is neither a demo nor a product preview, so you don't have to make it available to others. This means you &lt;strong&gt;don't have to deploy it,&lt;/strong&gt; just run it in a closed environment (perhaps your local machine).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Perhaps neither &lt;strong&gt;real data nor real integrations are needed&lt;/strong&gt; if the idea can be explored without them.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It's a &lt;strong&gt;throwaway&lt;/strong&gt;: you don't need to extend it into production code. Once the idea has been explored and validated, throw away your prototype and start defining and developing your product.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With generative AI tools such as Amazon Q, Claude or Cursor, prototyping is faster than ever before. This is where vibe coding comes into play.&lt;/p&gt;

&lt;p&gt;Some might think that vibe coding just means the AI generates random code you don't understand, and that it's not real engineering. That may be true; a drum kit can also be used just to make noise (as my neighbour says).&lt;/p&gt;

&lt;p&gt;Nowadays, vibe coding is often presented as being in contrast with spec-driven development. In my humble opinion, we can take advantage of vibe coding by directing agents, providing them &lt;strong&gt;just enough specs for the goal.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As an engineer, I don't need perfect code when prototyping, but I don't want random code either.&lt;/p&gt;

&lt;h1&gt;
  
  
  Context matters
&lt;/h1&gt;

&lt;p&gt;Generative AI is deeply contextual: every response depends on what came before. Even small shifts in input can produce wildly different output. That's both a strength and a weakness.&lt;/p&gt;

&lt;p&gt;When you're coding by vibe, the AI doesn't truly "know" your intent; it infers it. Without clear, consistent context, things go off-track fast:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;You change a prompt slightly, and the AI drops half the logic.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A missed constraint (like region or tech stack) leads to subtle regressions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A well-intentioned refactor undoes previous alignment.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Context matters not just in what you say, but in &lt;em&gt;what the AI sees every time&lt;/em&gt;. That's why a reusable, declarative context is so powerful: it creates a shared space where your intent lives.&lt;/p&gt;

&lt;p&gt;Structure it, store it, and tell the AI how to handle it, and you've got &lt;strong&gt;spec-driven development.&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  The memory problem
&lt;/h1&gt;

&lt;p&gt;LLMs are brilliant but forgetful, especially across sessions or as prompts grow. When prototyping, this causes friction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Goals get redefined without you noticing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tech stack or constraints subtly change&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Errors repeat because nothing was "remembered"&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The deeper the session, the worse the drift. At some point, you're no longer prototyping; you're re-aligning.&lt;/p&gt;

&lt;p&gt;This is where &lt;strong&gt;spec-driven prototyping&lt;/strong&gt; enters.&lt;/p&gt;

&lt;p&gt;Writing a simple spec (just a lightweight set of goals, guardrails, and a preferred stack) helps anchor the AI's responses. It makes the collaboration reproducible.&lt;/p&gt;

&lt;p&gt;Combine that with a way to update this spec and feed it consistently to your assistant, and you've got &lt;strong&gt;memory banking&lt;/strong&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Introducing Q-Vibes memory banking framework
&lt;/h1&gt;

&lt;p&gt;In my &lt;a href="https://dev.to/aws-builders/building-think-o-matic-a-vibe-coding-journey-with-amazon-q-280l"&gt;last article&lt;/a&gt;, I reported on my direct experience of building my Think-O-Matic prototype with Amazon Q, gave a definition of vibe coding and briefly introduced the concept of memory banking.&lt;/p&gt;

&lt;p&gt;After this experience, I developed a memory banking framework specifically for rapid prototyping with Amazon Q. It's open source (contributions are welcome!) and &lt;a href="https://github.com/ncremaschini/amazon-q-vibes-memory-banking" rel="noopener noreferrer"&gt;you can find it on GitHub.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To use the framework you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;an idea to explore&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Amazon Q (either the CLI or an IDE plugin)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The framework consists of specifications (provided to the agent via .md files) and prompts (provided to the agent by you via chat).&lt;/p&gt;

&lt;h2&gt;
  
  
  Specifications
&lt;/h2&gt;

&lt;p&gt;The specs consist of five Markdown files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/ncremaschini/amazon-q-vibes-memory-banking/blob/main/q-vibes-memory-banking.md" rel="noopener noreferrer"&gt;q-vibes-memory-banking.md&lt;/a&gt; - the AI Contract. This contains the complete framework instructions that tell the AI &lt;strong&gt;how&lt;/strong&gt; to work with memory banking when initiating a new session, resuming a session and updating docs at the end of an iteration. This is provided, no need to edit it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/ncremaschini/amazon-q-vibes-memory-banking/blob/main/templates/idea.md" rel="noopener noreferrer"&gt;idea.md&lt;/a&gt;: Captures the core concept and success criteria for your prototype. This is your north star - created once and rarely changes. The AI creates this from your initial description, but may ask clarifying questions to complete all sections using the template structure.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/ncremaschini/amazon-q-vibes-memory-banking/blob/main/templates/vibe.md" rel="noopener noreferrer"&gt;vibe.md&lt;/a&gt;: Defines how you want to collaborate with the AI assistant. Specifies your interaction style, tech stack preferences, decision-making approach, git workflow, security practices, documentation requirements, and speed vs quality trade-offs. You have to create and mantain it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/ncremaschini/amazon-q-vibes-memory-banking/blob/main/templates/state.md" rel="noopener noreferrer"&gt;state.md&lt;/a&gt;: The living technical snapshot of your prototype. Updated frequently by the AI as you build. Contains current stack, architecture overview, file structure, what's working/broken, immediate next steps, and current focus.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/ncremaschini/amazon-q-vibes-memory-banking/blob/main/templates/decisions.md" rel="noopener noreferrer"&gt;decisions.md&lt;/a&gt;: Log of key choices made during development. Prevents re-discussing the same decisions. The AI creates and maintains this file as architectural and technical decisions are made, following the template structure.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
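
&lt;p&gt;To make this concrete, here is a hypothetical sketch of what a &lt;code&gt;state.md&lt;/code&gt; snapshot might contain mid-session (the section names follow the description above; see the linked template for the actual structure):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# State

## Current stack
Node.js + Express, runs locally only

## What's working
- Agenda generation (mocked responses)

## What's broken
- Board integration still returns errors

## Immediate next steps
- Wire the real API token from .env

## Current focus
- First end-to-end happy path
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;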

&lt;p&gt;This makes the framework complete and self-contained. The AI gets both:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;How to work&lt;/strong&gt; (from the framework instructions in &lt;code&gt;q-vibes-memory-banking.md&lt;/code&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;What to work on&lt;/strong&gt; (from the 4 context files: &lt;code&gt;idea.md&lt;/code&gt;, &lt;code&gt;vibe.md&lt;/code&gt;, &lt;code&gt;state.md&lt;/code&gt;, &lt;code&gt;decisions.md&lt;/code&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Usage
&lt;/h2&gt;

&lt;p&gt;Let's say you have a brilliant idea and you want to explore it.&lt;/p&gt;

&lt;p&gt;All you need is to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;create a project folder.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;create a &lt;em&gt;.amazonq/vibes&lt;/em&gt; sub-folder and copy the templates into it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;create your &lt;em&gt;vibe.md&lt;/em&gt;; you can start from the template or from the provided example.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;prompt your idea and clarify it with the agent.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
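
&lt;p&gt;The resulting layout should look roughly like this (the file names come from the framework; the project folder name is just an example):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my-prototype/
└── .amazonq/
    └── vibes/
        ├── q-vibes-memory-banking.md   # provided, no need to edit
        ├── idea.md                     # created by the AI from your description
        ├── vibe.md                     # created and maintained by you
        ├── state.md                    # updated by the AI as you build
        └── decisions.md                # maintained by the AI
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;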

&lt;p&gt;Your prompt should be something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Hi! I want to start a new prototype using Q-Vibes Memory Banking.
Please read the framework instructions in .amazonq/vibes/q-vibes-memory-banking.md first to understand how to work with this system.
My prototype idea: [Describe your idea here - can be brief, just the core concept]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent will ask you for further clarification and for confirmation of its assumptions, with the aim of narrowing and clarifying the scope.&lt;/p&gt;

&lt;p&gt;Resuming a session is even easier. Just prompt the agent with a simple request to pick up where you left off. Something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Hi! I'm resuming work on my prototype using Q-Vibes Memory Banking.
Please read the framework instructions in .amazonq/vibes/q-vibes-memory-banking.md first, then read all the context files in .amazonq/vibes/ folder to understand the current state.
Once you've reviewed everything, please confirm what we're building, where we left off, and what the next steps should be.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Please read the &lt;a href="https://github.com/ncremaschini/amazon-q-vibes-memory-banking/blob/main/README.md" rel="noopener noreferrer"&gt;README.md&lt;/a&gt; of the project for a quick setup, complete instructions and a running example.&lt;/p&gt;

&lt;p&gt;Note that you and the agent are &lt;strong&gt;jointly responsible&lt;/strong&gt; for ensuring that the specifications are clear and consistent. There is no magic here: the better the input, the better the output.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benefits
&lt;/h2&gt;

&lt;p&gt;The key benefits are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;It's very fast: I created the &lt;a href="https://github.com/ncremaschini/amazon-q-vibes-memory-banking/tree/main/examples/builder-tracker" rel="noopener noreferrer"&gt;provided example&lt;/a&gt; in less than an hour, while also testing session resuming.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You provide guard rails: not arbitrary code, but code that suits your needs and style.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The AI helps you explore your idea: in my experience, the agent's questions helped me narrow down my ideas.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No loss of context: you don't have to re-explain the context to the agent at every session.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Prototype Memory vs. Product Memory
&lt;/h1&gt;

&lt;p&gt;You might be asking yourself:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why do we need a specific framework? Why not just use a full spec-driven development framework?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Because &lt;strong&gt;prototyping has different goals&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It's not just an early phase of development; it's a different mode entirely.&lt;/p&gt;

&lt;p&gt;In the prototyping phase, you're optimizing for &lt;strong&gt;speed, creativity, and cost-efficiency&lt;/strong&gt;, not for durability, scalability, or perfect accuracy. You want to explore, validate, and iterate quickly. That means you can (and should) tolerate some messiness and manual steps, as long as they accelerate learning.&lt;/p&gt;

&lt;p&gt;That's why the memory needs during prototyping are also different.&lt;/p&gt;

&lt;p&gt;You don't need a persistent, multi-session memory graph. You need just enough structure to help your AI collaborator stay aligned through a rapid, idea-driven loop.&lt;/p&gt;

&lt;p&gt;This framework isn't built for production agents or end-user memory systems. It's not meant to manage complexity across months or teams.&lt;/p&gt;

&lt;p&gt;Instead, it's designed for that &lt;strong&gt;middle space between a blank prompt and full-stack development&lt;/strong&gt;, where ideas are still forming and flow matters more than polish.&lt;/p&gt;

&lt;p&gt;If you're in that zone, a lightweight memory bank gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Direction without rigidity&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Consistency without ceremony&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Momentum without drift&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Conclusions
&lt;/h1&gt;

&lt;p&gt;Vibe-coding is here to stay, and it's magical when it works. But even vibes need a spine.&lt;/p&gt;

&lt;p&gt;This lightweight memory banking approach gives you &lt;em&gt;just enough&lt;/em&gt; structure to stay aligned, while keeping the creative momentum alive.&lt;/p&gt;

&lt;p&gt;If you're prototyping with Amazon Q or any other agent/LLM, give it a try.&lt;/p&gt;

&lt;p&gt;The framework is tailored to Amazon Q, but not bound to it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://hashnode.com/@darshitpandya" rel="noopener noreferrer"&gt;Darshit Pandya&lt;/a&gt; (you can find him also on &lt;a href="https://www.linkedin.com/in/darshitpandya/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;) created a version tailored on &lt;a href="https://github.com/Darshitpandya/github-copilot-context-keeper" rel="noopener noreferrer"&gt;Github Copilot&lt;/a&gt;, and we are going to benchmark the framework against the two agents to measure its performance and collaborate to improve it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ncremaschini/amazon-q-vibes-memory-banking" rel="noopener noreferrer"&gt;Check out the framework, fork it&lt;/a&gt;, remix it, build something weird, share it.&lt;/p&gt;

&lt;p&gt;Because vibes are better when they remember what they're building.&lt;/p&gt;

&lt;p&gt;Specs, just enough.&lt;/p&gt;

</description>
      <category>amazonqdevelopercli</category>
      <category>amazonwebservices</category>
      <category>generativeai</category>
      <category>coding</category>
    </item>
    <item>
      <title>Building Think-o-matic: A Vibe-Coding Journey with Amazon Q</title>
      <dc:creator>Nicola Cremaschini</dc:creator>
      <pubDate>Sun, 22 Jun 2025 22:39:16 +0000</pubDate>
      <link>https://forem.com/aws-builders/building-think-o-matic-a-vibe-coding-journey-with-amazon-q-280l</link>
      <guid>https://forem.com/aws-builders/building-think-o-matic-a-vibe-coding-journey-with-amazon-q-280l</guid>
      <description>&lt;p&gt;Weve all had those moments where inspiration strikes, but the traditional coding workflow planning, scaffolding, testing, debugging feels like too much friction.&lt;/p&gt;

&lt;p&gt;What if instead, you could &lt;strong&gt;prototype&lt;/strong&gt; with a different mindset? One that prioritizes momentum, creativity, and &lt;em&gt;just enough&lt;/em&gt; structure to explore an idea?&lt;/p&gt;

&lt;p&gt;That's where &lt;strong&gt;vibe-coding&lt;/strong&gt; comes in.&lt;/p&gt;

&lt;p&gt;This unconventional approach flips the traditional dev cycle on its head. You don't manually write every line or carefully craft a layered architecture; instead, you &lt;strong&gt;describe what you want&lt;/strong&gt; and let AI tools do the heavy lifting.&lt;/p&gt;

&lt;p&gt;Vibe-coding isn't about writing perfect code. It's about describing intent, trusting the process, and shipping fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vibe-coding: what it is and why it matters
&lt;/h2&gt;

&lt;p&gt;The term comes from &lt;a href="https://x.com/karpathy/status/1886192184808149383" rel="noopener noreferrer"&gt;this X post by Andrej Karpathy&lt;/a&gt;, published just a few weeks ago, and the industry is still trying to converge on a definition and standardize it.&lt;/p&gt;

&lt;p&gt;A few weeks later, distinguished authors were already writing and publishing books about this technique; to mention just a few:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://a.co/d/fvC54LH" rel="noopener noreferrer"&gt;Vibe-Coding by Gene Kim and Steve Yegge&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://a.co/d/iqwhB5u" rel="noopener noreferrer"&gt;Beyond Vibe-Coding by Addy Osmani&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I have tried to summarize the key points of vibe-coding from Karpathy's post.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vibe-coding&lt;/strong&gt; is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Letting AI tools write, fix, and modify the code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Embracing &lt;em&gt;feel&lt;/em&gt; and &lt;em&gt;flow&lt;/em&gt; over full code comprehension.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Typing as little as possible: mostly just &lt;em&gt;describe, accept, and run&lt;/em&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Skipping diffs, skimming errors, and trusting AI suggestions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Supervising the AI rather than driving every keystroke.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not traditional coding. It's prototyping for the AI-native era: perfect for weekend projects, experiments, or validating ideas before investing in full-scale development.&lt;/p&gt;

&lt;p&gt;If you've ever had an idea for building something on your own, it's very clear why this matters: it makes prototyping fast, cheap, and easy.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;A Quick Reminder: Whats Prototyping?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In software engineering, prototyping is about:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Creating a preliminary version of a system to explore ideas, validate functionality, and gather user feedback before full-scale development.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It's low-commitment, fast-paced, and feedback-driven, which makes it the perfect playground for AI-powered workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fast-prototyping steps
&lt;/h2&gt;

&lt;p&gt;I have tried to define some steps to structure my vibe coding sessions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Have an idea: this seems obvious, but it's not. You need an idea that is clear enough to be built and executed, but that also leaves some room for exploration.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Set up tools: You need a toolbox that is easy to set up, quick, cheap and that you trust. In these times, I don't think it's worth spending too much time finding the perfect tools or optimizing them: Your perfect tool could be obsolete tomorrow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Describe your idea: This means you tell your tools what you want to build together.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Follow the vibes: this step is actually an inner loop consisting of three sub-steps.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enough is enough: you are building a prototype, not a product, which means you are looking for neither perfection nor a complete system.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I have schematized these steps like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchnc3ulqixsmq7mcz49k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchnc3ulqixsmq7mcz49k.png" width="800" height="198"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;A Real Example: Building Think-o-matic&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To make this more tangible, here's how I prototyped a tool called &lt;strong&gt;Think-o-matic&lt;/strong&gt;: an AI-powered copilot to help structure workshops, generate agendas, create Miro boards, and summarize outcomes into actionable Trello tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Having an idea
&lt;/h3&gt;

&lt;p&gt;My ideas mostly come from real-life problems that I can't fix. I often wonder whether I could build something to help me, and this helps me in three ways:&lt;/p&gt;

&lt;p&gt;First, I love building; I find it fun.&lt;/p&gt;

&lt;p&gt;Second, I deepen my understanding of the problem: if you want a solution, you have to target your problem.&lt;/p&gt;

&lt;p&gt;Third, I get the problem solved! One less.&lt;/p&gt;

&lt;p&gt;This is exactly what happened with Think-o-matic. In my current job role I need to put stakeholders around a table, often a virtual one, and get them working together to target problems, find solutions, and explore ideas. In other words, I need to extract information from them, and one effective way to do this is to run workshops.&lt;/p&gt;

&lt;p&gt;Running a workshop involves the following steps, and this is exactly what I need help with:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F06wq5aowdkc72uxmh7xm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F06wq5aowdkc72uxmh7xm.png" width="800" height="304"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Between steps 3 and 4 there is the workshop itself.&lt;/p&gt;

&lt;p&gt;I also drafted a high-level architecture of the prototype:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fce54j455oj2dgwidv2b6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fce54j455oj2dgwidv2b6.png" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A front-end web app backed by an Express.js server running in a Node.js environment, which provides the integrations with Miro and Trello, and with Amazon Bedrock to supply the system's intelligence: Amazon Nova would generate the workshop agenda and summarize the Miro board.&lt;/p&gt;

&lt;h3&gt;
  
  
  Set up tools
&lt;/h3&gt;

&lt;p&gt;My toolbox is very simple: a terminal and the Amazon Q CLI (agentic) backed by Claude Sonnet 4.0. That's it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnzxak7rhtm1amqzlnuif.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnzxak7rhtm1amqzlnuif.png" width="800" height="176"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Describe your idea
&lt;/h3&gt;

&lt;p&gt;I used a greatly simplified version of a technique called memory banking that I learned from &lt;a href="https://cline.bot/blog/memory-bank-how-to-make-cline-an-ai-agent-that-never-forgets" rel="noopener noreferrer"&gt;this blog post from Cline.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In a few words, it's a way to record what's going on in the project and between you and the agent; otherwise the agent easily forgets, and you have to recreate its context from scratch, losing the vibes.&lt;/p&gt;

&lt;p&gt;Creating a memory bank helps your AI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Stay aligned with goals across sessions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Remember decisions already made.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reduce repetition and confusion.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In vibe-coding, memory banking becomes your anchor, keeping prototypes from drifting too far off course.&lt;/p&gt;

&lt;p&gt;I have provided two files:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ncremaschini/think-o-matic-q/blob/main/.amazonq/specs/prototypes_general_guidelines.md" rel="noopener noreferrer"&gt;Prototype Guidelines&lt;/a&gt;: Instructions on what is a prototype, what is not, and how I want to build prototypes. This prompt does not refer to a specific prototype and is reusable&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ncremaschini/think-o-matic-q/blob/main/.amazonq/specs/thinkomatic_specific_guidelines.md" rel="noopener noreferrer"&gt;Think-o-matic specific guidelines&lt;/a&gt;: Instructions about this specific idea.&lt;/p&gt;

&lt;p&gt;The prompt style is a mix of the RISEN framework (role, input, steps, expectation, narrowing) and the RODES framework (role, objectives, details, examples, sense check).&lt;/p&gt;

&lt;p&gt;Again, I don't want to spend too much time on prompting, and of course I used LLMs to write my prompts.&lt;/p&gt;

&lt;p&gt;My session started with these two files in a folder and a little prompt that went something like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;before doing anything, read these two files and tell me what you think about them. Please use the same folder to create your checkpoint files.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In this way I also instructed the agent to &lt;em&gt;update&lt;/em&gt; the memory bank as the work went on.&lt;/p&gt;

&lt;h3&gt;
  
  
  Follow the vibes + Check Results + Make friends
&lt;/h3&gt;

&lt;p&gt;After this little prompt, the agent read the specs I provided and proposed an action plan. We agreed on the steps, and that it would check in with me after every step.&lt;/p&gt;

&lt;p&gt;The first iteration result was this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm5xfhzeu42lzh0q1uoiq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm5xfhzeu42lzh0q1uoiq.png" width="800" height="1015"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Basically, the first iteration was a working app with all integrations mocked, in about 15 minutes.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What Went Well vs. What Went Weird&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Good:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;AI nailed the folder structure and scaffolding.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The prototype ran with minimal setup.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I stayed in the creative zone.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Bad:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Wrong AWS region.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No documentation, even though I asked for it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Laughably bad UX.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A few silly bugs.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But that's okay; vibe-coding isn't about perfection. It's about fast feedback and learning by doing.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Making Friends: Tune Your Cooperation Style&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Vibe-coding isn't autopilot. You're not giving up control; you're adjusting how you cooperate.&lt;/p&gt;

&lt;p&gt;I think of this part as &lt;strong&gt;making friends&lt;/strong&gt; with the AI. Like any relationship, it needs clear communication and trust, but also healthy boundaries.&lt;/p&gt;

&lt;p&gt;Here's a real example: I forgot to specify the AWS Region in a prompt. The AI defaulted to us-east-1 (no idea why). I needed a &lt;strong&gt;Bedrock model&lt;/strong&gt; that was only enabled in eu-west-1. Instead of asking me, the agent silently changed the model to something available in us-east-1.&lt;/p&gt;

&lt;p&gt;That's when I stepped in. I told the agent:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For small things, go ahead. But for big architectural decisions, &lt;strong&gt;ask me.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That balance is key. You want the AI to be proactive, but aligned. Let it move fast, just not in the wrong direction.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb4xry9tndvhqghef2nb9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb4xry9tndvhqghef2nb9.png" width="800" height="215"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The final result: think-o-matic
&lt;/h2&gt;

&lt;p&gt;I worked with the agent for a few hours, adding one feature after another: agenda creation, Miro board creation, Miro board summary, and Trello integration.&lt;/p&gt;

&lt;p&gt;For each feature implemented, Q updated the memory bank.&lt;/p&gt;

&lt;p&gt;The result? You can try it out for yourself by running it locally.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ncremaschini/think-o-matic-q" rel="noopener noreferrer"&gt;Here is the github repo with the code and instructions to run it.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you look at the repo, you might wonder why there is only one branch and one commit: I created it as a private repo, and before making it public I scanned it for secrets and found that Q had written my secrets to the memory bank. Trust, but verify.&lt;/p&gt;
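&lt;p&gt;A quick scan along these lines before flipping the repo to public would have caught it (a hypothetical sketch: the pattern list is illustrative, and dedicated scanners such as gitleaks are far more thorough):&lt;/p&gt;

```python
import re
from pathlib import Path

# Hypothetical pre-publication check: scan a directory tree for
# secret-looking patterns. The regex list is illustrative only;
# real scanners (gitleaks, trufflehog) cover many more cases.
SECRET_RE = re.compile(r'(api[_-]?key|secret|password|token)\s*[:=]', re.IGNORECASE)

def find_suspect_lines(root):
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file() and ".git" not in path.parts:
            try:
                for lineno, line in enumerate(path.read_text().splitlines(), 1):
                    if SECRET_RE.search(line):
                        hits.append((str(path), lineno, line.strip()))
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
    return hits
```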

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;These are just the notes I took after those two hours:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;do&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;don't&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;State clear goals&lt;/td&gt;
&lt;td&gt;Over-engineer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Define what you won't do&lt;/td&gt;
&lt;td&gt;Forget the code exists&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Use memory banks&lt;/td&gt;
&lt;td&gt;Ask for endless validation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Work in small chunks&lt;/td&gt;
&lt;td&gt;Force AI to stick to one approach&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Create checkpoints&lt;/td&gt;
&lt;td&gt;Ignore drift, it happens!&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tune your cooperation style&lt;/td&gt;
&lt;td&gt;Expect the AI to guess your intent&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Pro tip: Let the AI drift &lt;em&gt;a bit&lt;/em&gt;. Sometimes, the best ideas emerge sideways.&lt;/p&gt;

&lt;h2&gt;
  
  
  Waiting for the Doom Moment
&lt;/h2&gt;

&lt;p&gt;A few weeks ago, I had the pleasure of leading a &lt;a href="https://www.meetup.com/the-cloud-house/events/306876067/" rel="noopener noreferrer"&gt;roundtable discussion with Jeff Barr&lt;/a&gt;, Chief Evangelist for AWS and one of the most influential engineers in software engineering and cloud computing, and I asked him about the future of generative AI: what's next?&lt;/p&gt;

&lt;p&gt;He responded with a story from 1992 and 1993.&lt;/p&gt;

&lt;p&gt;In 1992, we all loved &lt;a href="https://en.wikipedia.org/wiki/Wolfenstein_3D" rel="noopener noreferrer"&gt;Wolfenstein 3D&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5xw7s93gsfkg8w2a470s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5xw7s93gsfkg8w2a470s.png" width="800" height="458"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Despite the name, it was not real 3D, yet it was one of the first first-person shooter games.&lt;/p&gt;

&lt;p&gt;A year later, John Carmack developed the &lt;a href="https://en.wikipedia.org/wiki/Doom_engine" rel="noopener noreferrer"&gt;Doom Engine&lt;/a&gt; using the same technology, and we were all shocked by Doom.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzgc037g2q99sh1ns2nbj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzgc037g2q99sh1ns2nbj.png" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's where we are with AI and prototyping right now.&lt;/p&gt;

&lt;p&gt;We're still building Wolfensteins.&lt;/p&gt;

&lt;p&gt;But Doom is coming.&lt;/p&gt;

</description>
      <category>vibecoding</category>
      <category>amazonq</category>
      <category>prototyping</category>
      <category>ai</category>
    </item>
    <item>
      <title>Building Atomic Counters with Amazon DocumentDB</title>
      <dc:creator>Nicola Cremaschini</dc:creator>
      <pubDate>Mon, 17 Mar 2025 10:34:57 +0000</pubDate>
      <link>https://forem.com/aws-builders/building-atomic-counters-with-amazon-documentdb-o8a</link>
      <guid>https://forem.com/aws-builders/building-atomic-counters-with-amazon-documentdb-o8a</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;Introduction&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This is the final installment in my &lt;a href="https://haveyoutriedrestarting.com/series/atomic-counter" rel="noopener noreferrer"&gt;atomic counters series&lt;/a&gt; where I explore different distributed databases and how they implement atomic counters.&lt;/p&gt;

&lt;p&gt;This time, we're looking at &lt;a href="https://aws.amazon.com/documentdb/" rel="noopener noreferrer"&gt;Amazon DocumentDB&lt;/a&gt;, a managed, &lt;a href="https://www.mongodb.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;MongoDB-compatible&lt;/strong&gt;&lt;/a&gt; NoSQL document database optimized for AWS.&lt;/p&gt;

&lt;p&gt;Atomic counters are a common requirement in distributed applications, whether for tracking views, managing inventory, or implementing rate limiting.&lt;/p&gt;

&lt;p&gt;In this article, we'll discuss how &lt;strong&gt;DocumentDB handles atomic updates&lt;/strong&gt; and explore a &lt;strong&gt;working example&lt;/strong&gt; from my &lt;a href="https://github.com/ncremaschini/atomic-counter" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Serializability and Linearizability in DocumentDB&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Before we dive into the code, we need to recall a few concepts (please refer to &lt;a href="https://dev.to/niccrema/atomic-counter-framing-the-problem-space-324b-temp-slug-9085526"&gt;&lt;strong&gt;the first article of this series for a detailed explanation&lt;/strong&gt;&lt;/a&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Serializability&lt;/strong&gt; : Operations appear in a consistent sequential order, ensuring correctness.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Linearizability&lt;/strong&gt; : Writes are immediately visible for subsequent reads, ensuring real-time consistency.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;DocumentDB achieves linearizable writes through its &lt;strong&gt;single-primary, multi-replica architecture&lt;/strong&gt; :&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Write operations are directed to the primary instance&lt;/strong&gt; , and changes are asynchronously replicated to secondaries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Read operations from the primary always return the latest committed value&lt;/strong&gt; , ensuring linearizability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Replica reads may return stale data&lt;/strong&gt; due to replication lag, meaning they are eventually consistent.&lt;/p&gt;

&lt;p&gt;This guarantees that &lt;strong&gt;atomic updates within a single document, like counters using the $inc operator, remain correct and isolated&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;While not required in this specific scenario, it is worth mentioning that DocumentDB supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/documentdb/latest/developerguide/how-it-works.html#durability-consistency-isolation" rel="noopener noreferrer"&gt;read isolation level configuration&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/documentdb/latest/developerguide/transactions.html" rel="noopener noreferrer"&gt;transactions and their isolation level, read and write concerns configuration.&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Replication and Leader Election in DocumentDB
&lt;/h2&gt;

&lt;p&gt;DocumentDB automatically replicates data across multiple availability zones to ensure durability and availability.&lt;/p&gt;

&lt;p&gt;Key mechanisms include:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Single-primary replication&lt;/strong&gt; : A single primary instance handles writes, while replicas asynchronously replicate data and serve read requests.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Leader election&lt;/strong&gt; : If the primary instance fails, DocumentDB automatically promotes a replica to primary, minimizing downtime and maintaining availability.&lt;/p&gt;

&lt;p&gt;This replication strategy allows DocumentDB to scale reads across replicas while ensuring that writes remain &lt;strong&gt;strongly consistent&lt;/strong&gt; on the primary.&lt;/p&gt;

&lt;p&gt;However, applications must account for &lt;strong&gt;eventual consistency&lt;/strong&gt; when reading from replicas due to asynchronous replication.&lt;/p&gt;

&lt;h1&gt;
  
  
  The Atomic Counter Pattern
&lt;/h1&gt;

&lt;p&gt;The atomic counter pattern enables precise increment operations, even in distributed environments.&lt;/p&gt;

&lt;p&gt;With DocumentDB, you use the &lt;strong&gt;$inc operator&lt;/strong&gt; , which atomically increments a numeric field within a document.&lt;/p&gt;

&lt;p&gt;This ensures that &lt;strong&gt;concurrent increments are safely serialized&lt;/strong&gt; without race conditions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DocumentDB supports conditional increments natively&lt;/strong&gt; : you can use &lt;strong&gt;$inc with $cond in an update operation&lt;/strong&gt; to increment the counter only when certain conditions are met &lt;strong&gt;all in a single atomic operation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This makes DocumentDB a good choice when you need &lt;strong&gt;both unconditional and conditional increments&lt;/strong&gt; , ensuring correctness without requiring complex client-side logic.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Hands-on! Walkthrough of the Deployable Example&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Let's examine the deployable example in &lt;a href="https://github.com/ncremaschini/atomic-counter" rel="noopener noreferrer"&gt;&lt;strong&gt;this GitHub repository&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This example demonstrates how to implement an atomic counter using &lt;strong&gt;AWS Lambda&lt;/strong&gt; , &lt;strong&gt;API Gateway&lt;/strong&gt; , and &lt;strong&gt;DocumentDB&lt;/strong&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;API Gateway&lt;/strong&gt; : Provides HTTP endpoints for interacting with the counter.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Lambda Functions&lt;/strong&gt; : Implements the business logic for incrementing the counter.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;DocumentDB&lt;/strong&gt; : Stores the counters.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fedhvkv5i2fmtiapbfneu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fedhvkv5i2fmtiapbfneu.png" width="800" height="109"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In my example project you can decide whether to use a maximum value for the counter or not: this determines whether conditional writes are used.&lt;/p&gt;

&lt;p&gt;Let's focus on the Lambda business logic, &lt;strong&gt;from the&lt;/strong&gt; &lt;a href="https://github.com/ncremaschini/atomic-counter/blob/main/lib/lambda/documentDB/counterLambda/index.ts" rel="noopener noreferrer"&gt;docDbCounterLambda&lt;/a&gt; code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const documentDBClient = await buildDocumentDbClient();
await documentDBClient.connect();
const countersCollection = documentDBClient.db("atomic_counter").collection('counters');
const updateFilter = getUpdateFilter(useConditionalWrites, id, maxCounterValue);
const updateResult = await countersCollection.updateOne( updateFilter, { $inc: { atomic_counter: 1 } }, { upsert: true, });
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here I use the &lt;strong&gt;$inc&lt;/strong&gt; operator with the &lt;em&gt;upsert&lt;/em&gt; flag set to true: this makes the call work both the first time, when the counter does not exist yet and is therefore initialized (missing fields count as zero before the increment), and for all further increment operations.&lt;/p&gt;

&lt;p&gt;What changes between conditional and unconditional write operations is the &lt;em&gt;updateFilter&lt;/em&gt; returned by the &lt;em&gt;getUpdateFilter&lt;/em&gt; method.&lt;/p&gt;

&lt;p&gt;Let's have a look at it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const getUpdateFilter = (useConditionalWrites: boolean, id: number, maxCounterValue: string) =&amp;gt; { 
const unconditionalWriteParams = { counter_id: id } 
const conditionalWriteParams = { counter_id: id, $and: [{ atomic_counter: { $lt: Number(maxCounterValue) } }], } 
return useConditionalWrites ? conditionalWriteParams : unconditionalWriteParams;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For unconditional writes, the only filter is the &lt;em&gt;counter_id&lt;/em&gt; attribute.&lt;/p&gt;

&lt;p&gt;For conditional writes, the construct &lt;strong&gt;$lt&lt;/strong&gt; (lower than) is added as an additional condition to check whether the value is below the maximum value.&lt;/p&gt;

&lt;p&gt;Since the update is performed on a single document and the increment is executed on the server side, atomicity is guaranteed and the counter value cannot exceed the maximum value.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhhmycrrr8ryzarg8r44n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhhmycrrr8ryzarg8r44n.png" alt="if two concurrrent increments are requested and the second one would exceed the maximum value, the first is accepted while the second is rejected" width="800" height="455"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Trade-Offs and Conclusion&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Like other databases in this series, DocumentDB comes with &lt;strong&gt;trade-offs&lt;/strong&gt; when used for atomic counters:&lt;/p&gt;

&lt;h2&gt;
  
  
Strengths:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MongoDB Compatibility&lt;/strong&gt; : Developers familiar with MongoDB can reuse existing knowledge.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Managed Scaling&lt;/strong&gt; : AWS handles &lt;strong&gt;replication, backups, and failover&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Atomic Updates on a Single Document&lt;/strong&gt; : &lt;strong&gt;$inc&lt;/strong&gt; ensures updates are atomic.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Limitations:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Eventual Consistency for Replicas&lt;/strong&gt; : Secondary reads may return stale data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Higher Latency for Stronger Consistency&lt;/strong&gt; : To ensure &lt;strong&gt;fresh data&lt;/strong&gt; , queries must be sent to the primary instance.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Key Takeaways:&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Atomic counters in DocumentDB&lt;/strong&gt; can be implemented using the &lt;strong&gt;$inc&lt;/strong&gt; operator, ensuring &lt;strong&gt;atomic updates at the document level&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Conditional increments are fully supported&lt;/strong&gt; using &lt;strong&gt;$inc&lt;/strong&gt; combined with &lt;strong&gt;$cond&lt;/strong&gt; , allowing for server-side enforcement of constraints.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;DocumentDB follows a single-primary, multi-replica model&lt;/strong&gt; , meaning writes are &lt;strong&gt;strongly consistent&lt;/strong&gt; , but replica reads may be &lt;strong&gt;eventually consistent&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Automatic leader election&lt;/strong&gt; ensures high availability by promoting a replica to primary in case of failure.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can find the &lt;strong&gt;full runnable example&lt;/strong&gt; in my GitHub repository: &lt;a href="https://github.com/ncremaschini/atomic-counter" rel="noopener noreferrer"&gt;atomic-counter&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This marks the end of the &lt;strong&gt;atomic counter series&lt;/strong&gt;! 🚀&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>mongodb</category>
      <category>awsdocumentdb</category>
    </item>
    <item>
      <title>Building Atomic Counters with TiDB</title>
      <dc:creator>Nicola Cremaschini</dc:creator>
      <pubDate>Sun, 16 Feb 2025 18:00:04 +0000</pubDate>
      <link>https://forem.com/aws-builders/building-atomic-counters-with-tidb-2lji</link>
      <guid>https://forem.com/aws-builders/building-atomic-counters-with-tidb-2lji</guid>
      <description>&lt;p&gt;Distributed SQL databases have become a cornerstone for applications that require &lt;strong&gt;global scalability and strong consistency&lt;/strong&gt; , and this problem has existed since the very first deployment of a database on two distinct servers: how to achieve strong consistency and scalability without compromising availability?&lt;/p&gt;

&lt;p&gt;Is it possible?&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://en.wikipedia.org/wiki/CAP_theorem" rel="noopener noreferrer"&gt;CAP theorem&lt;/a&gt; states that this isn't possible: if you consider consistency, availability and partitioning as fundamental properties of data storage, you can only choose two out of three properties.&lt;/p&gt;

&lt;p&gt;In this fourth part of the series on atomic counters, we'll explore how the pattern can be implemented using &lt;a href="https://www.pingcap.com/tidb-cloud-serverless/" rel="noopener noreferrer"&gt;TiDB&lt;/a&gt;, a database of the &lt;a href="https://en.wikipedia.org/wiki/NewSQL" rel="noopener noreferrer"&gt;NewSQL&lt;/a&gt; class, as an example, focusing on global partitioning, strong consistency and high availability.&lt;/p&gt;

&lt;p&gt;This article will provide a closer look at TiDB's unique architecture, discuss trade-offs, and refer to a practical implementation found in &lt;a href="https://github.com/ncremaschini/atomic-counter" rel="noopener noreferrer"&gt;this GitHub repository&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Serializability, Linearizability, and TiDB&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;TiDB guarantees &lt;strong&gt;strong consistency&lt;/strong&gt; across its distributed nodes by adopting a &lt;strong&gt;two-phase commit (2PC)&lt;/strong&gt; protocol. This ensures that all transactions, including atomic increments, are serialized and linearizable.&lt;/p&gt;

&lt;p&gt;To provide high availability, TiDB replicates data using &lt;a href="https://en.wikipedia.org/wiki/Raft_(algorithm)" rel="noopener noreferrer"&gt;&lt;strong&gt;Raft&lt;/strong&gt;&lt;/a&gt;, a consensus algorithm that ensures data consistency across regions. This makes TiDB well-suited for use cases requiring globally consistent counters.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Replication and Leader Election in TiDB&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;TiDB's replication model is built on &lt;strong&gt;Raft&lt;/strong&gt;, where each region has a leader and multiple followers.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Raft leader&lt;/strong&gt; handles writes and ensures consistency through consensus.&lt;/p&gt;

&lt;p&gt;Followers replicate data for high availability and enable failover in case of leader failure.&lt;/p&gt;

&lt;p&gt;This replication mechanism ensures that even in multi-region deployments, TiDB maintains consistency and availability.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;The Atomic Counter Pattern&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;The &lt;strong&gt;atomic counter pattern&lt;/strong&gt; ensures precise, consistent counter increments even in distributed environments.&lt;/p&gt;

&lt;p&gt;With TiDB, you can achieve this using &lt;strong&gt;SQL transactions&lt;/strong&gt; and &lt;strong&gt;atomic operations&lt;/strong&gt; like UPDATE ... SET.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Hands-On! Walkthrough of the Deployable Example&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Let's examine how the deployable example in &lt;a href="https://github.com/ncremaschini/atomic-counter" rel="noopener noreferrer"&gt;this GitHub repository&lt;/a&gt; implements an atomic counter using &lt;strong&gt;AWS Lambda&lt;/strong&gt;, &lt;strong&gt;API Gateway&lt;/strong&gt;, and &lt;strong&gt;TiDB&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;API Gateway:&lt;/strong&gt; Provides HTTP endpoints for interacting with the counter.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Lambda function:&lt;/strong&gt; Implements the business logic for incrementing the counter.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;TiDB:&lt;/strong&gt; Stores the counters.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fneonhkyiowbfhwasjye4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fneonhkyiowbfhwasjye4.png" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In my example project you can decide whether to use a maximum value for the counter or not: this determines whether conditional writes are used.&lt;/p&gt;

&lt;p&gt;Let's focus on the Lambda business logic, &lt;strong&gt;from the&lt;/strong&gt; &lt;a href="https://github.com/ncremaschini/atomic-counter/blob/main/lib/lambda/tiDB/counterLambda/index.ts" rel="noopener noreferrer"&gt;tiDBAtomicCounter Lambda code:&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;connection = await createDbConnection(DB); 
const updateFilter = getUpdateFilter(useConditionalWrites); 
const params = { id: id, max_value: maxCounterValue } 
const [rows] = await connection.query&amp;lt;RowDataPacket[]&amp;gt;(updateFilter, params);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;em&gt;getUpdateFilter(useConditionalWrites)&lt;/em&gt; method provides a specific filter based on the &lt;em&gt;useConditionalWrites&lt;/em&gt; boolean variable.&lt;/p&gt;

&lt;p&gt;The method is very simple, and I kept it a little more verbose than required, just for better comprehension.&lt;/p&gt;

&lt;p&gt;It basically returns one of two static SQL statements, very similar to each other, with a little but really important difference:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const getUpdateFilter = (useConditionalWrites: boolean): string =&amp;gt; { 
const unconditionalWriteParams = 'SELECT counter_value FROM counters WHERE counter_id = :id FOR UPDATE; 
\ INSERT INTO counters (counter_id, counter_value) VALUES (:id, 1) \ ON DUPLICATE KEY UPDATE counter_value = counter_value + 1; \ SELECT counter_value FROM counters WHERE counter_id = :id; \ COMMIT;'; 
const conditionalWriteParams = 'SELECT counter_value FROM counters WHERE counter_id = :id FOR UPDATE; \ 
INSERT INTO counters (counter_id, counter_value) VALUES (:id, 1) \ ON DUPLICATE KEY UPDATE counter_value = IF(counter_value &amp;lt; :max_value, counter_value + 1, counter_value);\ 
SELECT counter_value FROM counters WHERE counter_id = :id; \ COMMIT;'; 
return useConditionalWrites ? conditionalWriteParams : unconditionalWriteParams;}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's break down each SQL statement:&lt;/p&gt;

&lt;h2&gt;
  
  
  First statement
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT counter_value FROM counters WHERE counter_id = :id FOR UPDATE;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This statement tells the DB engine that you are selecting the specific table row for update.&lt;/p&gt;

&lt;p&gt;Depending on the DB engine's locking mechanism, it locks the row for the duration of the transaction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Pessimistic locking: the lock is acquired immediately when the statement executes, preventing other concurrent transactions from modifying the row.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Optimistic locking: the lock is not acquired immediately; the engine checks for conflicts at commit time and retries if conflicts occur.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;TiDB's default has changed over time, and it is &lt;a href="https://docs.pingcap.com/tidb/stable/pessimistic-transaction" rel="noopener noreferrer"&gt;configurable&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I suggest carefully considering the trade-offs between the two modes: the right one really depends on your specific use case.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.pingcap.com/tidb/stable/transaction-overview" rel="noopener noreferrer"&gt;Knowledge is free at the library. Just bring your own container.&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Second statement
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO counters (counter_id, counter_value) VALUES (:id, 1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nothing special: it just tells the DB to insert a new row.&lt;/p&gt;

&lt;p&gt;But, wait: we were supposed to talk about incrementing counters, not inserting new rows!&lt;/p&gt;

&lt;p&gt;The third statement is where the magic happens.&lt;/p&gt;

&lt;h2&gt;
  
  
  Third statement (unconditional writes):
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ON DUPLICATE KEY UPDATE counter_value = counter_value + 1;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This statement tells the DB engine what to do if the previous statement fails with a duplicate key error, because we are trying to insert two rows with the same &lt;em&gt;counter_id&lt;/em&gt;, which is the table's primary key.&lt;/p&gt;

&lt;p&gt;We are basically asking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;please increment the counter_value field of the row by one&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;With the second and third statements together, we are telling the DB engine:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;please insert this new counter, but if the counter is already present, don't panic and just increment it&lt;/p&gt;
&lt;/blockquote&gt;
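&lt;p&gt;You can try the insert-or-increment behavior locally with SQLite, whose ON CONFLICT ... DO UPDATE clause is the analogue of MySQL/TiDB's ON DUPLICATE KEY UPDATE (a sketch, not the article's TiDB code; the syntax differs slightly between engines):&lt;/p&gt;

```python
import sqlite3

# Insert-or-increment with SQLite's upsert clause, mirroring the
# INSERT ... ON DUPLICATE KEY UPDATE pattern discussed above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (counter_id TEXT PRIMARY KEY, counter_value INTEGER)")

def increment(counter_id):
    conn.execute(
        "INSERT INTO counters (counter_id, counter_value) VALUES (?, 1) "
        "ON CONFLICT(counter_id) DO UPDATE SET counter_value = counter_value + 1",
        (counter_id,),
    )
    row = conn.execute(
        "SELECT counter_value FROM counters WHERE counter_id = ?", (counter_id,)
    ).fetchone()
    return row[0]

increment("c1")  # first call creates the row with value 1
increment("c1")  # second call increments it to 2
```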

&lt;h2&gt;
  
  
  Third statement, with conditional writes:
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ON DUPLICATE KEY UPDATE counter_value = IF(counter_value &amp;lt; :max_value, counter_value + 1, counter_value);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Just as in the unconditional version, we are telling the DB engine to increment the existing row, but only if the current counter value is below the &lt;em&gt;max_value&lt;/em&gt; parameter:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;please insert this new counter, but if the counter is already present, don't panic and just increment it if it is below the max value&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Fourth statement
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT counter_value FROM counters WHERE counter_id = :id
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This just retrieves the value after the insert/update.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final statement
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;COMMIT;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This seems like the simplest statement, but this is where all the magic happens: depending on your DB engine configuration, this is where our five-statement transaction is executed atomically on the server, conflicts are resolved, and data is replicated if the commit succeeds.&lt;/p&gt;

&lt;p&gt;Since transactions are executed on the server and are all-or-nothing statements, they fit the atomic counter pattern perfectly.&lt;/p&gt;
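&lt;p&gt;The all-or-nothing property is easy to observe locally (SQLite here rather than TiDB, but the transactional guarantee being illustrated is the same): if any statement in the transaction fails, every statement in it is rolled back.&lt;/p&gt;

```python
import sqlite3

# Demonstrate all-or-nothing: an increment followed by a failing insert
# inside one transaction leaves the counter untouched after rollback.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (counter_id TEXT PRIMARY KEY, counter_value INTEGER)")
conn.execute("INSERT INTO counters VALUES ('c1', 0)")
conn.commit()

try:
    with conn:  # transaction: commits on success, rolls back on exception
        conn.execute("UPDATE counters SET counter_value = counter_value + 1")
        conn.execute("INSERT INTO counters VALUES ('c1', 99)")  # duplicate key
except sqlite3.IntegrityError:
    pass  # the whole transaction, including the increment, was rolled back

value = conn.execute(
    "SELECT counter_value FROM counters WHERE counter_id = 'c1'"
).fetchone()[0]
# value == 0: the increment did not survive the failed transaction
```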

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsj6vbk0jzzzh3gq8wefr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsj6vbk0jzzzh3gq8wefr.png" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Trade-Offs and Conclusion&lt;/strong&gt;
&lt;/h1&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Strengths&lt;/strong&gt; :
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Strong Consistency&lt;/strong&gt;: TiDB's Raft-based replication and 2PC protocol ensure consistent increments even in globally distributed environments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SQL Familiarity&lt;/strong&gt; : Developers can use familiar SQL syntax, reducing the learning curve.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Limitations&lt;/strong&gt; :
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Latency&lt;/strong&gt; : Cross-region communication for strong consistency may increase latency, especially with pessimistic locking configuration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Operational Complexity&lt;/strong&gt; : While TiDB Cloud simplifies management, understanding distributed SQL concepts is necessary for effective use.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Key Takeaway&lt;/strong&gt; :
&lt;/h2&gt;

&lt;p&gt;TiDB is an excellent choice for globally distributed applications requiring strong consistency. Its support for SQL transactions and automatic scaling makes it a powerful tool for implementing atomic counters in multi-region setups.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>distributedsystem</category>
      <category>sql</category>
    </item>
    <item>
      <title>Building Atomic Counters with Momento</title>
      <dc:creator>Nicola Cremaschini</dc:creator>
      <pubDate>Thu, 02 Jan 2025 16:20:28 +0000</pubDate>
      <link>https://forem.com/aws-builders/building-atomic-counters-with-momento-1ekh</link>
      <guid>https://forem.com/aws-builders/building-atomic-counters-with-momento-1ekh</guid>
      <description>&lt;p&gt;In the world of distributed systems, &lt;strong&gt;serverless caching&lt;/strong&gt; is gaining traction for its simplicity and scalability. &lt;a href="https://www.gomomento.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Momento&lt;/strong&gt;&lt;/a&gt;, a fully managed serverless cache, builds on the core concepts of caching while eliminating infrastructure management.&lt;/p&gt;

&lt;p&gt;In this third installment of the &lt;a href="https://haveyoutriedrestarting.com/series/atomic-counter" rel="noopener noreferrer"&gt;atomic counter series&lt;/a&gt;, we'll explore how to implement the pattern using &lt;a href="https://www.gomomento.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Momento&lt;/strong&gt;&lt;/a&gt;. By comparing it to &lt;a href="https://redis.io/" rel="noopener noreferrer"&gt;&lt;strong&gt;Redis&lt;/strong&gt;&lt;/a&gt;, we'll highlight how Momento simplifies caching for developers, discuss its trade-offs, and guide you through a practical implementation using the code in &lt;a href="https://github.com/ncremaschini/atomic-counter" rel="noopener noreferrer"&gt;this GitHub repository&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Serializability, Linearizability, and Momento&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Unlike traditional caching systems, Momento operates as a &lt;strong&gt;serverless service&lt;/strong&gt;, meaning you don't manage nodes, replicas, or clusters.&lt;/p&gt;

&lt;p&gt;However, like Redis, it provides atomic operations such as increment.&lt;/p&gt;

&lt;p&gt;Momento's atomicity ensures that counter updates are serialized within its storage layer. However, consistency across distributed systems can vary based on use cases, which aligns with the &lt;strong&gt;eventual consistency&lt;/strong&gt; model in serverless architectures.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Replication and Leader Election in Momento&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;As a managed service, &lt;strong&gt;Momento abstracts replication and failover&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You don't have visibility into specific replicas or leaders, but the platform ensures high availability by handling replication and redundancy under the hood.&lt;/p&gt;

&lt;p&gt;This is a notable difference from Redis, where you control and configure replication explicitly.&lt;/p&gt;

&lt;p&gt;Momento offers simplicity at the cost of operational transparency and fine-grained control.&lt;/p&gt;

&lt;h1&gt;
  
  
  The Atomic Counter pattern
&lt;/h1&gt;

&lt;p&gt;The &lt;strong&gt;atomic counter pattern&lt;/strong&gt; enables precise increment operations, even in distributed environments.&lt;/p&gt;

&lt;p&gt;With Momento, you use its increment operation, which automatically initializes the counter if it doesn't exist, similar to Redis.&lt;/p&gt;

&lt;p&gt;This approach works very well if you need to increment your counter unconditionally, regardless of its current value.&lt;/p&gt;

&lt;p&gt;If you need a conditional increment, Momento doesn't provide a server-side method for it: you have to handle the condition on the client side, and this can lead to race conditions.&lt;/p&gt;
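
&lt;p&gt;To make that race concrete, here is a minimal, self-contained sketch. It uses a plain in-memory object as a stand-in for the cache (not the Momento SDK) and interleaves two clients so that both read before either writes:&lt;/p&gt;

```typescript
// In-memory stand-in for a remote cache (not the Momento SDK):
// get and increment are each atomic, but nothing links a read to a write.
const cache: { [key: string]: number } = { hits: 9 };
const get = (key: string): number => cache[key] ?? 0;
const increment = (key: string): number => {
  cache[key] = get(key) + 1;
  return cache[key];
};

const MAX = 10;

// Interleaved schedule: both clients read before either one writes.
const readByAlice = get("hits"); // 9
const readByBob = get("hits");   // 9
if (MAX > readByAlice) { increment("hits"); } // counter becomes 10
if (MAX > readByBob) { increment("hits"); }   // counter becomes 11: limit broken
console.log(cache["hits"]); // 11
```

&lt;p&gt;Each individual read and write is atomic, yet the check-then-act sequence is not, which is exactly the gap a server-side conditional operation closes.&lt;/p&gt;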

&lt;h1&gt;
  
  
  &lt;strong&gt;Hands-on! Walkthrough of the Deployable Example&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Let's examine the deployable example in &lt;a href="https://github.com/ncremaschini/atomic-counter" rel="noopener noreferrer"&gt;this GitHub repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This example demonstrates how to implement an atomic counter using &lt;strong&gt;AWS Lambda&lt;/strong&gt; , &lt;strong&gt;API Gateway&lt;/strong&gt; , and &lt;strong&gt;Momento&lt;/strong&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;API Gateway&lt;/strong&gt; : Provides HTTP endpoints for interacting with the counter.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Lambda Functions&lt;/strong&gt; : Implements the business logic for incrementing the counter.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Momento&lt;/strong&gt; : Stores the counters.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqwywu1cn9kmg1ft7udr8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqwywu1cn9kmg1ft7udr8.png" width="800" height="110"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In my example project you can decide whether or not to use a maximum value for the counter: this determines whether conditional writes are used.&lt;/p&gt;

&lt;p&gt;Let's focus on the Lambda business logic, &lt;strong&gt;from the&lt;/strong&gt; &lt;a href="https://github.com/ncremaschini/atomic-counter/blob/main/lib/lambda/momento/index.ts" rel="noopener noreferrer"&gt;&lt;strong&gt;momentoAtomicCounter&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;Lambda&lt;/strong&gt; code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const momentoCacheClient = await buildMomentoClient();let counter = 0;if (useConditionalWrites) { counter = await handleConditionalWrites(momentoCacheClient,cacheName, id, maxCounterValue); } else { counter = await handleUnconditionalWrites(momentoCacheClient,cacheName, id); }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, I wrote two different methods to handle conditional and unconditional writes.&lt;/p&gt;

&lt;p&gt;Let's dive into the simpler one, &lt;em&gt;handleUnconditionalWrites&lt;/em&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;async function handleUnconditionalWrites(momentoClient: CacheClient,cacheName: string, id: string) { let counter = 0; const cacheIncrementResponse = await momentoClient.increment(cacheName, id, 1); switch (cacheIncrementResponse.type) { case CacheIncrementResponse.Success: counter = cacheIncrementResponse.value(); break; case CacheIncrementResponse.Error: throw new Error(cacheIncrementResponse.message()); } return counter}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The method simply leverages the &lt;em&gt;increment&lt;/em&gt; method from the Momento SDK.&lt;/p&gt;

&lt;p&gt;It increments a key's value by an integer (one, in this example), regardless of whether the key exists or what its current value is.&lt;/p&gt;

&lt;p&gt;Things get more interesting when it comes to handling conditional writes, to increment the counter only if it is below a specified threshold.&lt;/p&gt;

&lt;p&gt;Momento does not provide any conditional increment method, but it provides a few useful conditional write methods, such as &lt;em&gt;setIfPresentAndNotEqual&lt;/em&gt; and &lt;em&gt;setIfAbsent&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Let's dive into the &lt;em&gt;handleConditionalWrites&lt;/em&gt; implementation (response-handling logic is removed for better readability):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;async function handleConditionalWrites(momentoClient: CacheClient, cacheName: string, id: string, maxCounterValue: string){ let counter = 0; const cacheGetResponse = await momentoClient.get(cacheName, id); switch (cacheGetResponse.type) { case CacheGetResponse.Hit: const currentCounter = Number(cacheGetResponse.value()); const nextCounter = currentCounter + 1; const strNextCounter = nextCounter.toString(); counter = await handleSetIfPresentAndNotEqual(momentoClient,cacheName, id, strNextCounter, maxCounterValue); break; case CacheGetResponse.Miss: counter = await hanldeSetIfAbsent(momentoClient, cacheName, id, '1'); break; case CacheGetResponse.Error: throw new Error(cacheGetResponse.toString()); } return counter}async function handleSetIfPresentAndNotEqual(momentoClient: CacheClient,cacheName: string, id: string, nextCounter: string, maxCounterValue: string) { const cacheSetIfPresentAndNotEqualResponse = await momentoClient.setIfPresentAndNotEqual(cacheName, id, nextCounter, maxCounterValue); ...}async function hanldeSetIfAbsent(momentoClient: CacheClient,cacheName: string, id: string, value: string) { let counter = 0; const setIfAbsentResponse = await momentoClient.setIfAbsent(cacheName, id, value); ...}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These methods perform the following logic:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F58j9c1881o4u5a1v8xji.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F58j9c1881o4u5a1v8xji.png" width="332" height="542"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;and this ensures consistency, since race conditions are prevented server-side by the two check-and-set methods.&lt;/p&gt;
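
&lt;p&gt;One detail worth noting: when a concurrent writer wins between the &lt;em&gt;get&lt;/em&gt; and the &lt;em&gt;set&lt;/em&gt;, the check-and-set fails and the client typically re-reads and retries. Here is a minimal sketch of that retry loop against an in-memory compare-and-set stand-in (the real Momento calls and response types differ):&lt;/p&gt;

```typescript
// In-memory stand-in for a server-side check-and-set (not the Momento SDK).
const store: { [key: string]: string } = {};

// Succeeds only if the key's current value still equals what the caller read.
const compareAndSet = (key: string, expected: string | undefined, next: string): boolean => {
  if (store[key] !== expected) { return false; } // a concurrent writer won
  store[key] = next;
  return true;
};

const MAX = 10;

// Re-read and retry whenever the check-and-set loses to a concurrent writer.
const conditionalIncrement = (key: string): number => {
  for (let attempt = 0; 10 > attempt; attempt += 1) {
    const current = store[key];
    const value = current === undefined ? 0 : Number(current);
    if (value >= MAX) { return value; } // limit reached: stop incrementing
    if (compareAndSet(key, current, String(value + 1))) {
      return value + 1;
    }
  }
  throw new Error("gave up after too many collisions");
};

console.log(conditionalIncrement("hits")); // 1
console.log(conditionalIncrement("hits")); // 2
```

&lt;p&gt;Bounding the retries keeps a hot key from spinning forever under heavy contention.&lt;/p&gt;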

&lt;p&gt;This is how key initialization works (&lt;em&gt;set if not present&lt;/em&gt; branch):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.hashnode.com%2Fres%2Fhashnode%2Fimage%2Fupload%2Fv1736245870347%2Ff55ef434-e011-4de1-85fd-e9022fa5bb3e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.hashnode.com%2Fres%2Fhashnode%2Fimage%2Fupload%2Fv1736245870347%2Ff55ef434-e011-4de1-85fd-e9022fa5bb3e.png" width="800" height="302"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;and this is how an existing key increment works (&lt;em&gt;set if present and not equals&lt;/em&gt; branch):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.hashnode.com%2Fres%2Fhashnode%2Fimage%2Fupload%2Fv1736245944886%2Fddf6c0c1-fd30-4134-aa68-36600c849d0a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.hashnode.com%2Fres%2Fhashnode%2Fimage%2Fupload%2Fv1736245944886%2Fddf6c0c1-fd30-4134-aa68-36600c849d0a.png" width="800" height="317"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Trade-Offs and conclusion
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Strengths:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Serverless Simplicity&lt;/strong&gt; : No infrastructure to manage, reducing operational overhead.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Built-In Scalability&lt;/strong&gt; : Automatically scales to meet demand without manual intervention.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Limitations:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reduced Control&lt;/strong&gt; : Lack of visibility into replication and cluster configurations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Eventual Consistency&lt;/strong&gt; : While atomic operations are supported, consistency guarantees may differ in highly distributed setups.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Performance&lt;/strong&gt; : Since an initial GET is required, more network round trips are needed than in solutions that implement conditional increments server-side.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;Momento showcases how a &lt;strong&gt;serverless-first approach&lt;/strong&gt; simplifies distributed caching.&lt;/p&gt;

&lt;p&gt;By eliminating the need to manage infrastructure, it allows developers to focus on building applications rather than worrying about operational overhead.&lt;/p&gt;

&lt;p&gt;For atomic counters, Momento's increment operation makes implementation straightforward and reliable. However, this convenience comes with trade-offs: you lose the granular control over replication and failover configurations that traditional systems like Redis offer.&lt;/p&gt;

&lt;p&gt;If you're exploring distributed counters for your application, I highly recommend trying out the example provided in the &lt;a href="https://github.com/ncremaschini/atomic-counter" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Stay tuned for the next installment in the series, where we'll delve into &lt;strong&gt;DocumentDB&lt;/strong&gt;.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>database</category>
      <category>distributedsystem</category>
      <category>caching</category>
    </item>
    <item>
      <title>Building Atomic Counters with Elasticache Redis</title>
      <dc:creator>Nicola Cremaschini</dc:creator>
      <pubDate>Sun, 22 Dec 2024 16:51:45 +0000</pubDate>
      <link>https://forem.com/aws-builders/building-atomic-counters-with-elasticache-redis-c9c</link>
      <guid>https://forem.com/aws-builders/building-atomic-counters-with-elasticache-redis-c9c</guid>
      <description>&lt;p&gt;When working with high-throughput, low-latency applications, &lt;strong&gt;Redis&lt;/strong&gt; an in-memory data storestands out as an excellent choice for implementing the &lt;strong&gt;atomic counter pattern&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;With its atomic operations and simple APIs, Redis offers a straightforward approach to incrementing counters while ensuring high performance.&lt;/p&gt;

&lt;p&gt;In this article, we'll explore how to build an atomic counter using &lt;strong&gt;AWS ElastiCache Redis&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You'll gain a practical understanding of Redis concepts like &lt;strong&gt;atomic operations&lt;/strong&gt;, its &lt;strong&gt;replication model&lt;/strong&gt;, and how to implement counters with the code from &lt;a href="https://github.com/ncremaschini/atomic-counter" rel="noopener noreferrer"&gt;this GitHub repository&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Serializability, Linearizability, and Redis&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Before we can dive deep into code, we need to recall a few concepts (please refer to &lt;a href="https://hashnode.com/post/cm3syajxr000009mk7pwz56if" rel="noopener noreferrer"&gt;&lt;strong&gt;the first article of this series for a detailed explanation&lt;/strong&gt;&lt;/a&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Serializability&lt;/strong&gt; : Operations appear in a consistent sequential order, ensuring correctness.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Linearizability&lt;/strong&gt; : Writes are immediately visible for subsequent reads, ensuring real-time consistency.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Redis processes commands in a &lt;strong&gt;single-threaded event loop&lt;/strong&gt;, ensuring that each command is executed in the order it's received. This guarantees atomicity at the command level for operations like &lt;em&gt;INCR&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;While Redis operations on a single node can be considered &lt;strong&gt;linearizable&lt;/strong&gt; , in a distributed Redis setup (e.g., with clustering or replicas), this strict ordering can break. Writes to replicas are propagated asynchronously, so they may lag behind the primary node.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Replication and Leader election in Redis&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Redis employs a &lt;strong&gt;primary-replica architecture&lt;/strong&gt; , where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The &lt;strong&gt;primary node&lt;/strong&gt; handles all writes and propagates updates to replicas asynchronously.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Redis Sentinel&lt;/strong&gt; handles failover, promoting a replica to primary in case of failure.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For atomic counters, a single primary node is typically sufficient. If clustering is used, counter keys should be kept on a single shard to maintain atomicity.&lt;/p&gt;
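
&lt;p&gt;In Redis Cluster, the hash slot is computed from the substring between { and } when one is present (a &lt;em&gt;hash tag&lt;/em&gt;), so related counter keys can be pinned to the same slot by sharing a tag. A small naming sketch (the key names are illustrative):&lt;/p&gt;

```typescript
// Redis Cluster hashes only the substring inside "{...}" (the hash tag),
// so keys sharing a tag land in the same slot, and thus on the same shard.
const counterKey = (tenant: string, name: string): string =>
  "{counters:" + tenant + "}:" + name;

const a = counterKey("acme", "page-views");
const b = counterKey("acme", "api-calls");
console.log(a); // "{counters:acme}:page-views"
console.log(b); // "{counters:acme}:api-calls"
// Both keys share the tag "counters:acme", so CLUSTER KEYSLOT returns the
// same slot for both, and multi-key operations or Lua scripts touching
// both keys can run on a single shard.
```
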

&lt;h2&gt;
  
  
  The Atomic Counter Pattern
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;atomic counter pattern&lt;/strong&gt; allows you to increment a value reliably, even in distributed systems, by ensuring operations are conflict-free and consistent.&lt;/p&gt;

&lt;p&gt;Redis supports this pattern natively through the &lt;em&gt;INCR&lt;/em&gt; command, which atomically increments a key's value by 1.&lt;/p&gt;

&lt;p&gt;However, if the increment must depend on the counter's current value, race conditions become possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Hands-on! Walkthrough of the Deployable Example&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Let's dive into the example provided in the &lt;a href="https://github.com/ncremaschini/atomic-counter" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This example demonstrates how to implement an atomic counter using &lt;strong&gt;AWS Lambda&lt;/strong&gt; , &lt;strong&gt;API Gateway&lt;/strong&gt; , and &lt;strong&gt;ElastiCache Redis&lt;/strong&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;API Gateway&lt;/strong&gt; : Provides HTTP endpoints for interacting with the counter.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Lambda Functions&lt;/strong&gt; : Implements the business logic for incrementing the counter.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ElastiCache Redis&lt;/strong&gt; Cluster: Stores the counters with atomicity guarantees.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkvygrdwdmawwjc1a9s6v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkvygrdwdmawwjc1a9s6v.png" width="800" height="111"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In my example project you can decide whether or not to use a maximum value for the counter: this determines whether conditional writes are used.&lt;/p&gt;

&lt;p&gt;Let's focus on the Lambda business logic, from the &lt;a href="https://github.com/ncremaschini/atomic-counter/blob/main/lib/lambda/redis/index.ts" rel="noopener noreferrer"&gt;redisAtomicCounter Lambda&lt;/a&gt; code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const redisClient = await buildRedisClient();
const result = await redisClient.eval(getLuaScript(useConditionalWrites), 1, id, maxCounterValue);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code snippet simply sends an &lt;em&gt;eval&lt;/em&gt; command and gets the new updated counter value, using the &lt;a href="https://github.com/redis/ioredis" rel="noopener noreferrer"&gt;ioredis client&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let's see how &lt;em&gt;getLuaScript()&lt;/em&gt; works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const getLuaScript = (useConditionalWrites: boolean) =&amp;gt; { 
const unconditionalIncrementScript = ` redis.call('INCR', KEYS[1]) local counter = redis.call('GET', KEYS[1]) return counter `; 
const conditionalIncrementScript = ` 
local counter = redis.call('GET', KEYS[1]) 
local maxValue = tonumber(ARGV[1]) 
if not counter 
then counter = 0 
end 
counter = tonumber(counter) 
if counter &amp;lt; maxValue then 
redis.call('INCR', KEYS[1]) 
counter = redis.call('GET', KEYS[1]) 
return counter 
else 
return 'Counter has reached its maximum value of: ' .. maxValue 
end `; 
return useConditionalWrites ? conditionalIncrementScript : unconditionalIncrementScript;}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Unconditional writes don't actually need to be executed inside a Lua script: we could simply use the &lt;em&gt;incr()&lt;/em&gt; method provided by the &lt;em&gt;ioredis&lt;/em&gt; client.&lt;/p&gt;

&lt;p&gt;But for conditional writes, it is required to check the counter value before incrementing it to avoid race conditions: if we perform the check on client side, another client might increment the counter between the &lt;em&gt;get()&lt;/em&gt; and the &lt;em&gt;incr()&lt;/em&gt; instructions execution on the first client.&lt;/p&gt;

&lt;p&gt;Let's see an example: assuming the maximum value for the counter is 10, Alice and Bob perform a &lt;em&gt;GET&lt;/em&gt; for the same key when the counter value is 9.&lt;/p&gt;

&lt;p&gt;They check that the current value is below the maximum value, and then they both send an &lt;em&gt;INCR&lt;/em&gt; command to Redis.&lt;/p&gt;

&lt;p&gt;Since the &lt;em&gt;INCR&lt;/em&gt; command is executed unconditionally on the server side, the counter is incremented by two and exceeds the maximum value.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk9ger0jt37480cwqiyiv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk9ger0jt37480cwqiyiv.png" width="800" height="503"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Redis secret sauce: Lua scripts
&lt;/h2&gt;

&lt;p&gt;The solution is to check the counter value on the server side, by executing a Lua script.&lt;/p&gt;

&lt;p&gt;It gets the counter value, checks whether it is below the maximum value, and increments it only if so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;local counter = redis.call('GET', KEYS[1])
local maxValue = tonumber(ARGV[1])
if not counter 
then counter = 0
endcounter = tonumber(counter)
if counter &amp;lt; maxValue 
then 
redis.call('INCR', KEYS[1]) 
counter = redis.call('GET', KEYS[1]) 
return counter
else 
return 'Counter has reached its maximum value of: ' .. maxValue
end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Looking at the code, you might think it is not safe either: another script could increment the counter between the &lt;em&gt;GET&lt;/em&gt; and the &lt;em&gt;INCR&lt;/em&gt; command execution.&lt;/p&gt;

&lt;p&gt;The magic of Lua scripting is that only one script can be executed at a time, &lt;a href="https://redis.io/docs/latest/develop/interact/programmability/eval-intro/" rel="noopener noreferrer"&gt;as reported in the documentation&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Redis guarantees the script's atomic execution. While executing the script, all server activities are blocked during its entire runtime. These semantics mean that all of the script's effects either have yet to happen or had already happened.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;and that is exactly what we need when dealing with atomic counters: since only one script executes at a time, there is no concurrency and no race conditions, as &lt;strong&gt;serializability&lt;/strong&gt; is guaranteed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fugq6g0tdnry5h8y3aprt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fugq6g0tdnry5h8y3aprt.png" width="800" height="502"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Moreover, we have better performance, which is good:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Because scripts execute in the server, reading and writing data from scripts is very efficient.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Trade-Offs and Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Strengths&lt;/strong&gt; :
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Redis provides &lt;strong&gt;low-latency&lt;/strong&gt; atomic operations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;em&gt;INCR&lt;/em&gt; command is inherently atomic, simplifying counter implementation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lua script execution prevents race conditions on conditional writes, achieving &lt;strong&gt;serializability&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Limitations&lt;/strong&gt; :
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Asynchronous Replication&lt;/strong&gt; : Updates may not immediately reflect on replicas.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Durability Risks&lt;/strong&gt; : Without persistence, counters may reset after a failure or restart.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Conclusion:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Redis is ideal for high-performance, in-memory atomic counters where latency is a top priority, and simplifies building atomic counters with its native support for atomic operations and low-latency access.&lt;/p&gt;

&lt;p&gt;However, consider its replication and durability trade-offs for production use.&lt;/p&gt;

&lt;p&gt;Check out the &lt;a href="https://github.com/ncremaschini/atomic-counter" rel="noopener noreferrer"&gt;deployable example here&lt;/a&gt;, and stay tuned for the next article in the series, where we'll explore &lt;strong&gt;Momento&lt;/strong&gt; as an alternative for serverless caching.&lt;/p&gt;

</description>
      <category>serverless</category>
      <category>aws</category>
      <category>database</category>
      <category>redis</category>
    </item>
    <item>
      <title>Building Atomic Counters with DynamoDB</title>
      <dc:creator>Nicola Cremaschini</dc:creator>
      <pubDate>Mon, 09 Dec 2024 06:00:50 +0000</pubDate>
      <link>https://forem.com/aws-builders/building-atomic-counters-with-dynamodb-fae</link>
      <guid>https://forem.com/aws-builders/building-atomic-counters-with-dynamodb-fae</guid>
      <description>&lt;p&gt;DynamoDB, a serverless NoSQL database, is a go-to choice for implementing atomic counters due to its built-in support for atomic operations and managed scalability. This article will guide you through how DynamoDB ensures consistency and replication, a refresher on the atomic counter pattern, and a hands-on walkthrough of a deployable example from &lt;a href="https://github.com/ncremaschini/atomic-counter" rel="noopener noreferrer"&gt;this GitHub repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;By the end, you'll understand how to leverage DynamoDB for atomic counters and know the trade-offs involved.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Serializability and Linearizability in DynamoDB&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Before we can dive deep into code, we need to recall a few concepts (please refer to &lt;a href="https://hashnode.com/post/cm3syajxr000009mk7pwz56if" rel="noopener noreferrer"&gt;the first article of this series for a detailed explanation&lt;/a&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Serializability&lt;/strong&gt; : Operations appear in a consistent sequential order, ensuring correctness.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Linearizability&lt;/strong&gt; : Writes are immediately visible for subsequent reads, ensuring real-time consistency.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;DynamoDB achieves linearizable writes through its &lt;strong&gt;single-leader replication model&lt;/strong&gt; :&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Write operations are directed to the leader node for the partition key, and changes are propagated to replicas.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Strongly consistent reads (optional) ensure the latest value is returned immediately after a write.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This guarantees DynamoDB can safely implement atomic operations like counters.&lt;/p&gt;

&lt;h2&gt;
  
  
  Replication and Leader election in DynamoDB
&lt;/h2&gt;

&lt;p&gt;DynamoDB automatically manages replication across multiple availability zones to ensure durability and availability. Key mechanisms include:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Single-leader replication&lt;/strong&gt; : A leader node handles writes, maintaining consistency while replicas handle reads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Leader election&lt;/strong&gt; : If a leader fails, DynamoDB promotes another replica seamlessly, ensuring high availability without manual intervention.&lt;/p&gt;

&lt;p&gt;This replication strategy enables DynamoDB to handle distributed workloads while maintaining data consistency.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;A Note on Synchronized Timestamps in Distributed Databases&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Synchronized timestamps play a critical role in distributed databases, especially for ensuring consistency across geographically dispersed replicas.&lt;/p&gt;

&lt;p&gt;Without synchronized clocks, it becomes challenging to determine the order of operations accurately, leading to potential consistency issues in global-scale applications.&lt;/p&gt;

&lt;p&gt;In the AWS ecosystem, the &lt;strong&gt;AWS Time Sync Service&lt;/strong&gt; provides a highly accurate and reliable time source synchronized across all AWS Regions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/11/amazon-time-sync-service-microsecond-accurate-time/" rel="noopener noreferrer"&gt;Announced last year&lt;/a&gt;, this service offers nanosecond-level precision and a consistent view of time, serving as a foundational piece for distributed systems.&lt;/p&gt;

&lt;p&gt;Recently, AWS built upon this foundation to announce &lt;a href="https://press.aboutamazon.com/2024/12/aws-announces-new-database-capabilities-including-amazon-aurora-dsql-the-fastest-distributed-sql-database#:~:text=To%20ensure%20each%20Region%20sees,provide%20microseconds%20level%20accurate%20time" rel="noopener noreferrer"&gt;&lt;strong&gt;strong consistency for DynamoDB global tables&lt;/strong&gt;&lt;/a&gt;. This new feature allows applications to perform strongly consistent reads and writes across multiple regions, ensuring the same data is visible no matter where the query originates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why is this important?&lt;/strong&gt; Strong consistency in global tables depends on synchronized timestamps to ensure that write propagation across regions respects causal ordering. This prevents race conditions and ensures data correctness even in high-latency or failure scenarios.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Impact on Atomic Counters&lt;/strong&gt; : If your atomic counter spans multiple regions via global tables, synchronized timestamps enable accurate propagation of updates, preserving the order and integrity of increments.&lt;/p&gt;

&lt;p&gt;This synergy of the AWS Time Sync Service and DynamoDB advancements showcases how synchronized time is more than just an infrastructure detail: it's a cornerstone of achieving robust distributed consistency.&lt;/p&gt;

&lt;h2&gt;
  
  
  DynamoDB Conditional Writes
&lt;/h2&gt;

&lt;p&gt;DynamoDB's &lt;strong&gt;conditional write&lt;/strong&gt; feature allows you to execute write operations (&lt;em&gt;PutItem, UpdateItem, DeleteItem&lt;/em&gt;) only if specific conditions are met.&lt;/p&gt;

&lt;p&gt;This capability is crucial for enforcing business rules, ensuring data integrity, and preventing race conditions in distributed systems.&lt;/p&gt;

&lt;p&gt;When you perform a conditional write, you include a &lt;em&gt;ConditionExpression&lt;/em&gt; in the request.&lt;/p&gt;

&lt;p&gt;DynamoDB evaluates this condition against the item's existing attributes before executing the operation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;If the condition evaluates to &lt;strong&gt;true&lt;/strong&gt; , the write operation proceeds.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If the condition evaluates to &lt;strong&gt;false&lt;/strong&gt; , the operation fails with a &lt;em&gt;ConditionalCheckFailedException&lt;/em&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
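&lt;p&gt;To make the true/false behavior concrete, here is a minimal in-memory sketch (my own illustration, not DynamoDB's actual engine) of how a conditional put either proceeds or is rejected with a &lt;em&gt;ConditionalCheckFailedException&lt;/em&gt;:&lt;/p&gt;

```typescript
// Minimal local simulation of DynamoDB's conditional-write semantics.
// Illustrative only: the real evaluation happens server-side in DynamoDB.
interface Item { [attr: string]: number | string | undefined }

class ConditionalCheckFailedException extends Error {
  constructor() {
    super("The conditional request failed");
    this.name = "ConditionalCheckFailedException";
  }
}

// `condition` is a predicate standing in for a ConditionExpression
function conditionalPut(
  table: { [key: string]: Item },
  key: string,
  item: Item,
  condition: (existing: Item | undefined) => boolean
): void {
  const existing = table[key];
  if (!condition(existing)) {
    throw new ConditionalCheckFailedException(); // condition false: write rejected
  }
  table[key] = item; // condition true: write proceeds
}

const table: { [key: string]: Item } = {};

// emulates ConditionExpression: "attribute_not_exists(partitionKey)"
const notExists = (existing: Item | undefined) => existing === undefined;

conditionalPut(table, "user#1", { name: "alice" }, notExists); // succeeds
let rejected = false;
try {
  conditionalPut(table, "user#1", { name: "bob" }, notExists); // key exists now
} catch (e) {
  rejected = e instanceof ConditionalCheckFailedException;
}
```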

&lt;p&gt;A few use cases for conditional writes include:&lt;/p&gt;

&lt;h3&gt;
  
  
  Enforcing constraints
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Ensure unique records in a table by verifying an attribute does not exist:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ConditionExpression: "attribute_not_exists(partitionKey)"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Prevent counter increments beyond a maximum value:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ConditionExpression: "counterValue &amp;lt; :maxValue"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Concurrent Updates without conflicts&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Safely update an item only if its version matches a known value (optimistic locking):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ConditionExpression: "version = :expectedVersion"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
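&lt;p&gt;As a sketch of what such an optimistic-locking request could look like, here is a hypothetical parameter builder; the table, key, and attribute names are my own assumptions, not taken from the article's repository:&lt;/p&gt;

```typescript
// Hypothetical builder for a version-based optimistic-locking update.
// Names (id, version) are illustrative assumptions.
function buildVersionedUpdate(tableName: string, id: string, expectedVersion: number) {
  return {
    TableName: tableName,
    Key: { id: { S: id } },
    // bump the version; real code would also SET the attributes being changed
    UpdateExpression: "SET version = :newVersion",
    // apply the write only if nobody updated the item since we read it
    ConditionExpression: "version = :expectedVersion",
    ExpressionAttributeValues: {
      ":expectedVersion": { N: String(expectedVersion) },
      ":newVersion": { N: String(expectedVersion + 1) },
    },
    ReturnValues: "UPDATED_NEW",
  };
}
```

&lt;p&gt;On a &lt;em&gt;ConditionalCheckFailedException&lt;/em&gt;, the caller would typically re-read the item and retry with the fresh version.&lt;/p&gt;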



&lt;h3&gt;
  
  
  &lt;strong&gt;Transactional Integrity&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Enforce rules such as updating an item only if another attribute matches a specific state.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Benefits:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Atomicity:&lt;/strong&gt; the condition check and the write execute as a single, indivisible operation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Integrity:&lt;/strong&gt; invalid writes are rejected server-side, so concurrent clients cannot bypass business rules.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Performance:&lt;/strong&gt; no separate read-before-write round trip or client-side locking is required.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The atomic counter pattern
&lt;/h2&gt;

&lt;p&gt;The atomic counter pattern ensures safe, concurrent updates to a counter without losing increments due to race conditions: in distributed systems, updates must be atomic (all or nothing).&lt;/p&gt;

&lt;p&gt;DynamoDB achieves this with the &lt;em&gt;UpdateItem&lt;/em&gt; operation and the ADD attribute update expression, which ensures the counter is incremented atomically.&lt;/p&gt;
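&lt;p&gt;One detail worth noting about &lt;em&gt;ADD&lt;/em&gt;: if the attribute does not exist yet, DynamoDB treats its value as 0 and creates it. A tiny local sketch of that semantics:&lt;/p&gt;

```typescript
// Local sketch of ADD semantics: a missing numeric attribute behaves as 0,
// so the first increment both creates the attribute and sets it to `inc`.
interface CounterItem { atomic_counter?: number }

function applyAdd(item: CounterItem, inc: number): number {
  const current = item.atomic_counter ?? 0; // missing attribute behaves as 0
  item.atomic_counter = current + inc;
  return item.atomic_counter;
}
```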

&lt;h2&gt;
  
  
  Hands-on! &lt;strong&gt;Walkthrough of the Deployable Example&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Let's explore the inner workings of the example in the &lt;a href="https://github.com/ncremaschini/atomic-counter" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;, focusing on how DynamoDB is used for atomic counters.&lt;/p&gt;

&lt;p&gt;The implementation includes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;API Gateway&lt;/strong&gt; : Provides HTTP endpoints for interacting with the counter.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Lambda Functions&lt;/strong&gt; : Implements the business logic for incrementing the counter and enforcing optional constraints like a maximum value.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;DynamoDB Table&lt;/strong&gt; : Stores the counters with atomicity guarantees.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F53mfw1i8eocqw8tvls25.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F53mfw1i8eocqw8tvls25.png" width="800" height="110"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In my example project you can decide whether to enforce a maximum value for the counter or not: this determines whether conditional writes are used.&lt;/p&gt;

&lt;p&gt;Let's focus on this logic, &lt;a href="https://github.com/ncremaschini/atomic-counter/blob/main/lib/lambda/dynamo/index.ts" rel="noopener noreferrer"&gt;from the dynamoDbAtomicCounter Lambda&lt;/a&gt; code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const id = event.pathParameters?.id;
const writeParams = getWriteParams(useConditionalWrites, id, maxCounterValue);
const dynamoDBClient = await buildDynamoDbClient();
const result = await dynamoDBClient.send(new UpdateItemCommand(writeParams));
const counter = Number(result.Attributes?.atomic_counter.N);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code snippet simply sends an &lt;em&gt;UpdateItemCommand&lt;/em&gt; and gets the new updated counter value, using the AWS SDK DynamoDBClient.&lt;/p&gt;

&lt;p&gt;Let's see how &lt;em&gt;getWriteParams&lt;/em&gt; works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const getWriteParams = (useConditionalWrites: boolean, id: string, maxCounterValue: string) =&amp;gt; { 
const TABLE_NAME = process.env.TABLE_NAME || ''; 
const unconditionalWriteParams = { 
TableName: TABLE_NAME, Key: { id: { S: id }, }, 
UpdateExpression: 'ADD atomic_counter :inc', ExpressionAttributeValues: { ':inc': { N: '1' } }, 
ReturnValues: 'UPDATED_NEW' as const, }; 
const conditionalWriteParams = { 
TableName: TABLE_NAME, Key: { id: { S: id }, }, 
UpdateExpression: 'ADD atomic_counter :inc', ConditionExpression: 'attribute_not_exists(atomic_counter) or atomic_counter &amp;lt; :max', ExpressionAttributeValues: {':inc': { N: '1' }, ':max': { N: maxCounterValue }, }, ReturnValues: 'UPDATED_NEW' as const, }; return useConditionalWrites ? conditionalWriteParams : unconditionalWriteParams;}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This method checks whether conditional writes are required: the &lt;em&gt;update expression&lt;/em&gt; is the same in both cases, and it leverages the &lt;em&gt;ADD&lt;/em&gt; action.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;UpdateExpression: 'ADD atomic_counter :inc'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If conditional writes are required, a &lt;em&gt;Condition Expression&lt;/em&gt; is used:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ConditionExpression: 'attribute_not_exists(atomic_counter) or atomic_counter &amp;lt; :max'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
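&lt;p&gt;When the condition fails, the SDK call rejects with a &lt;em&gt;ConditionalCheckFailedException&lt;/em&gt;. A hedged sketch of how a handler could map that error to a client-facing response (the status codes are my own choice, not from the repository):&lt;/p&gt;

```typescript
// Sketch: map a conditional-write failure to an HTTP response.
// Hitting the configured maximum is an expected business outcome,
// not a server error, so it gets a 4xx instead of a 500.
function mapCounterError(err: { name?: string }): { statusCode: number; body: string } {
  if (err.name === "ConditionalCheckFailedException") {
    // the counter hit its configured maximum value
    return { statusCode: 409, body: "counter reached its maximum value" };
  }
  return { statusCode: 500, body: "internal error" };
}
```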



&lt;h2&gt;
  
  
  Trade-Offs and Conclusion
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Advantages
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;DynamoDB's managed infrastructure handles replication and scaling.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Simple and efficient atomic updates using &lt;em&gt;UpdateItem&lt;/em&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Limitations&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hot partitions&lt;/strong&gt; : High traffic to a single counter may cause throttling.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Throughput limits&lt;/strong&gt; : Monitor RCUs/WCUs to avoid performance degradation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Eventual consistency&lt;/strong&gt; : Use strongly consistent reads when precise counter values are critical.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
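&lt;p&gt;A common mitigation for the hot-partition limitation, sketched below as an assumption rather than something the example repository implements, is write sharding: spread one logical counter over N item keys and sum the shards on read:&lt;/p&gt;

```typescript
// Write-sharding sketch: writes go to a random shard key so they land on
// different partitions; reads sum all shard values. Trades read cost and
// read-time consistency for write throughput.
const NUM_SHARDS = 10;

// pick a shard key to spread writes across partitions
function shardKey(counterId: string): string {
  const shard = Math.floor(Math.random() * NUM_SHARDS);
  return counterId + "#" + shard;
}

// reading the counter means summing all shard values
function totalOf(shards: number[]): number {
  return shards.reduce((sum, v) => sum + v, 0);
}
```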

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;DynamoDB provides a simple and scalable solution for implementing atomic counters in distributed systems.&lt;/p&gt;

&lt;p&gt;Its built-in atomicity and managed replication make it a strong candidate for this pattern.&lt;/p&gt;

&lt;p&gt;Explore the &lt;a href="https://github.com/ncremaschini/atomic-counter" rel="noopener noreferrer"&gt;GitHub repository to deploy the example&lt;/a&gt; and experiment with atomic counters in DynamoDB, and stay tuned for the next article, &lt;a href="https://haveyoutriedrestarting.com/building-atomic-counters-with-elasticache-redis" rel="noopener noreferrer"&gt;where we'll explore the &lt;strong&gt;ElastiCache Redis implementation&lt;/strong&gt; of the atomic counter pattern&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>dynamodb</category>
      <category>aws</category>
      <category>distributedsystem</category>
      <category>database</category>
    </item>
    <item>
      <title>Atomic counter: framing the Problem Space</title>
      <dc:creator>Nicola Cremaschini</dc:creator>
      <pubDate>Fri, 22 Nov 2024 16:23:29 +0000</pubDate>
      <link>https://forem.com/aws-builders/atomic-counter-framing-the-problem-space-37im</link>
      <guid>https://forem.com/aws-builders/atomic-counter-framing-the-problem-space-37im</guid>
      <description>&lt;p&gt;Why Atomic Counters Matter in Distributed Systems&lt;/p&gt;




&lt;p&gt;In distributed systems, ensuring accuracy and consistency in concurrent operations is a core challenge. Atomic counters, a mechanism for maintaining precise, incrementing counts, are a common requirement in applications like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Rate Limiting&lt;/strong&gt; : Tracking API usage to enforce quotas.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Inventory Management&lt;/strong&gt; : Keeping stock levels accurate in real time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Leaderboards&lt;/strong&gt; : Recording scores and ranks in games or applications.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Analytics&lt;/strong&gt; : Counting events such as clicks or views for reporting.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Challenge: Scaling Atomicity in Distributed Systems&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;When multiple processes update a shared counter, ensuring accuracy without conflicts is difficult. Challenges include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Race Conditions&lt;/strong&gt; : Concurrent updates may result in incorrect counts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Integrity&lt;/strong&gt; : Systems must ensure updates are not lost, even in failure scenarios.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scalability vs. Consistency&lt;/strong&gt; : Distributed systems trade off latency, fault tolerance, and strict consistency.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
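&lt;p&gt;The classic lost-update race behind the first bullet can be sketched in a few lines: two writers read the same value, increment locally, and write back, so one increment disappears:&lt;/p&gt;

```typescript
// Sketch of the lost-update problem with a naive read-modify-write counter.
let naiveCounter = 0;

function readModifyWrite(read: number): number {
  return read + 1; // each writer computes its write from its own (stale) read
}

const snapshotA = naiveCounter; // writer A reads 0
const snapshotB = naiveCounter; // writer B also reads 0
naiveCounter = readModifyWrite(snapshotA); // A writes 1
naiveCounter = readModifyWrite(snapshotB); // B also writes 1: A's increment is lost
```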

&lt;p&gt;This trade-off is encapsulated in the &lt;strong&gt;CAP theorem&lt;/strong&gt; , which states that a distributed database can only guarantee two of the following three properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Consistency&lt;/strong&gt; : Every read reflects the most recent write.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Availability&lt;/strong&gt; : Every request receives a response, even if some nodes are down.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Partition Tolerance&lt;/strong&gt; : The system operates even when network partitions occur.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Atomic counters live at the intersection of these challenges. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Choosing &lt;strong&gt;consistency and partition tolerance&lt;/strong&gt; ensures correctness but may sacrifice availability during failures.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Prioritizing &lt;strong&gt;availability and partition tolerance&lt;/strong&gt; may allow stale or conflicting updates.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Serializability and Linearizability&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Atomic counters require precise semantics to maintain correctness:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Serializability&lt;/strong&gt; ensures that concurrent operations are executed in a sequence that could occur in a single-threaded system. It's the gold standard for consistency in databases but can be computationally expensive.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Linearizability&lt;/strong&gt; , a stronger guarantee, ensures that operations appear instantaneous and reflect the latest state globally. This is crucial for atomic counters where every increment must reflect an up-to-date value.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why These Databases?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;For this series, I've chosen &lt;a href="https://aws.amazon.com/dynamodb/" rel="noopener noreferrer"&gt;DynamoDB&lt;/a&gt;, &lt;a href="https://aws.amazon.com/it/documentdb/" rel="noopener noreferrer"&gt;DocumentDB&lt;/a&gt;, &lt;a href="https://aws.amazon.com/redis/" rel="noopener noreferrer"&gt;ElastiCache Redis&lt;/a&gt;, &lt;a href="https://www.gomomento.com/" rel="noopener noreferrer"&gt;GoMomento&lt;/a&gt;, and &lt;a href="https://pingcap.com/products/tidb/" rel="noopener noreferrer"&gt;TiDB&lt;/a&gt; for several key reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Serverless and SaaS Models&lt;/strong&gt; : DynamoDB, DocumentDB and the SaaS version of TiDB handle infrastructure and scaling for you. Similarly, ElastiCache and Momento offer managed caching solutions, focusing on simplicity and performance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Diverse Strategies&lt;/strong&gt; : these systems represent a variety of approaches to critical aspects of distributed systems.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Specialized Solutions&lt;/strong&gt; : by comparing these systems, we'll uncover insights into how different architectures tackle the shared challenge of atomicity, equipping you to make informed choices in your projects.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why a Pattern Matters&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The atomic counter pattern provides structured solutions to navigate these complexities, leveraging the unique strengths of various databases and caching systems. By using native features such as conditional writes, Lua scripts, or distributed transactions, developers can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Ensure correctness under concurrent updates.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Balance consistency, availability, and scalability based on system needs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Simplify implementation by relying on proven database capabilities.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this series, we'll explore how to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Understand the trade-offs of implementing atomic counters in distributed environments.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Build practical solutions using &lt;strong&gt;Node.js&lt;/strong&gt; and &lt;strong&gt;AWS CDK&lt;/strong&gt; , supported by real-world examples.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Apply atomic counter patterns across databases like &lt;strong&gt;DynamoDB, Redis, TiDB&lt;/strong&gt; , and SaaS services like &lt;strong&gt;Momento&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let's set the stage for building reliable atomic counters with a strong foundation in distributed systems theory and practical implementations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ncremaschini/atomic-counter" rel="noopener noreferrer"&gt;Here's the github repository with deployable stack to explore the different implementations&lt;/a&gt;&lt;/p&gt;

</description>
      <category>serverless</category>
      <category>aws</category>
      <category>database</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Evaluating Performance: A Benchmark Study of Serverless Solutions for Message Delivery to Containers on AWS Cloud - Episode 2</title>
      <dc:creator>Nicola Cremaschini</dc:creator>
      <pubDate>Fri, 10 May 2024 13:25:30 +0000</pubDate>
      <link>https://forem.com/aws-builders/evaluating-performance-a-benchmark-study-of-serverless-solutions-for-message-delivery-to-1c8k</link>
      <guid>https://forem.com/aws-builders/evaluating-performance-a-benchmark-study-of-serverless-solutions-for-message-delivery-to-1c8k</guid>
      <description>&lt;p&gt;This post follows &lt;a href="https://dev.to/niccrema/evaluating-performance-a-benchmark-study-of-serverless-solutions-for-message-delivery-to-8lg-temp-slug-1484756"&gt;my previous post on this topic&lt;/a&gt;, and it measures the performance of another solution for the same problem, &lt;strong&gt;how to forward events to private containers using serverless services and fan-out patterns.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Context
&lt;/h2&gt;

&lt;p&gt;Suppose you have a cluster of containers and you need to notify them when a database record is inserted or changed, and these changes apply to the internal state of the application. A fairly common use case.&lt;/p&gt;

&lt;p&gt;Let's say you have the following requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The tasks are in an autoscaling group, so their number may change over time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A task is only healthy if it can be updated when the status changes. In other words, all tasks must have the same status. Containers that do not change their status must be marked as unhealthy and replaced.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When a new task is started, it must be in the last known status.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Status changes must be in near real-time: status changes in the database must reach the containers in less than 2 seconds.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Solutions
&lt;/h2&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/niccrema/evaluating-performance-a-benchmark-study-of-serverless-solutions-for-message-delivery-to-8lg-temp-slug-1484756"&gt;first post about this&lt;/a&gt; I explored two options and measured the performance of this one:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7c6n3sgi89e2lkbf25nt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7c6n3sgi89e2lkbf25nt.png" width="800" height="444"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;The AppSync API receives mutations and stores derived data in the DynamoDB table&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DynamoDB streams the events&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Lambda function is triggered by the DynamoDB stream&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Lambda function sends the events to the SNS topic&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The SNS topic sends the events to the SQS queues&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Fargate service reads the events from the SQS queues&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If events are not processed within a timeout, they are moved to the DLQ&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A Cloudwatch alarm is triggered if the DLQ is not empty&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  The even more serverless version
&lt;/h3&gt;

&lt;p&gt;An even more serverless version of the above solution replaces Lambda and SNS with EventBridge.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0zcafgwyvvl7cqu0oh6i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0zcafgwyvvl7cqu0oh6i.png" width="800" height="452"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;The AppSync API receives mutations and stores derived data in the DynamoDB table&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DynamoDB streams the events&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;EventBridge is used to filter, transform and...&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;...fans out events to SQS queues&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Fargate service reads the events from the SQS queues&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If events are not processed within a timeout, they are moved to the DLQ&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A Cloudwatch alarm is triggered if the DLQ is not empty&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;The only code I wrote here is the code to consume SQS from my application; no glue code is required.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Trust, but verify
&lt;/h2&gt;

&lt;p&gt;I've conducted a benchmark to verify the performance of this configuration, in terms of latency from the mutation being posted to AppSync to the message being received by the client polling SQS.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key system parameters
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Region: eu-south-1&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Number of tasks: 20&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Event bus: 1 SQS per task, 1 DLQ per SQS, all SQS subscribed to one SNS&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SQS Consumer: provided by AWS SDK, configured for long polling (20s)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Task configuration: 256 CPU, 512 Memory, Docker image based on &lt;a href="https://hub.docker.com/layers/library/node/20-slim/images/sha256-80c3e9753fed11eee3021b96497ba95fe15e5a1dfc16aaf5bc66025f369e00dd?context=explore" rel="noopener noreferrer"&gt;&lt;strong&gt;Official Node Image 20-slim&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DynamoDB Configured in PayPerUseMode, stream enabled&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;EventBridge configured to intercept and forward all events from the DynamoDB stream to the SQS queues&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Benchmark parameters
&lt;/h3&gt;

&lt;p&gt;I used a basic Postman collection runner to perform a mutation to AppSync every 5 seconds, for 720 iterations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjqhci2v5tah0vj4hvo1y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjqhci2v5tah0vj4hvo1y.png" width="718" height="55"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Goal
&lt;/h3&gt;

&lt;p&gt;The goal was to verify if containers would be updated within 2 seconds, and to verify performance against &lt;a href="https://dev.to/niccrema/evaluating-performance-a-benchmark-study-of-serverless-solutions-for-message-delivery-to-8lg-temp-slug-1484756"&gt;the first version&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Measurements
&lt;/h3&gt;

&lt;p&gt;I used the following CloudWatch-provided metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Appsync latency&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dynamo stream latency&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;EventBridge Pipe duration&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;EventBridge Rules latency&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The SQS time-taken custom metric is calculated from SQS-provided message attributes.&lt;/p&gt;
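&lt;p&gt;For illustration, such a metric can be derived from the &lt;em&gt;SentTimestamp&lt;/em&gt; and &lt;em&gt;ApproximateFirstReceiveTimestamp&lt;/em&gt; attributes that SQS attaches to each message (both epoch milliseconds, returned as strings); the exact formula behind the dashboard may differ:&lt;/p&gt;

```typescript
// Sketch: derive a "time in queue" metric from SQS message attributes.
// Both attributes are epoch-millisecond values delivered as strings.
function sqsTimeTakenMs(attributes: {
  SentTimestamp: string;
  ApproximateFirstReceiveTimestamp: string;
}): number {
  return (
    Number(attributes.ApproximateFirstReceiveTimestamp) -
    Number(attributes.SentTimestamp)
  );
}
```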

&lt;h3&gt;
  
  
  Results
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Disclaimer: some latency measurements are calculated on consumers' side, and we all know that synchronizing clocks in a distributed system is a hard problem.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Still, measurements are performed by the same computing nodes.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Please consider following latencies not as precise measurements but as coarse indicators.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Here are screenshots from my CloudWatch dashboard:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwi3xhkhbcak5rwkkp6f9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwi3xhkhbcak5rwkkp6f9.png" width="800" height="200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9rt1txnjsf7aazls9viz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9rt1txnjsf7aazls9viz.png" width="800" height="370"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A few key data points, from the average numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Most of the time is taken by the EventBridge rule, and I couldn't do anything to lower this latency. The rule is as simple as possible and it is integrated natively by AWS.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The average total time taken is &lt;strong&gt;210.74 ms&lt;/strong&gt; , versus &lt;strong&gt;108.39 ms&lt;/strong&gt; taken by the first version with Lambda and SNS.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The average response time measured by my client, which covers my client's network latency, is 175 ms. Given that the AppSync avg latency is 62.7 ms, my avg network latency is 175 - 62.7 = 112.3 ms. This means that from my client sending the mutation to the consumers receiving the message there are 175 + 112.3 = &lt;strong&gt;287.3 ms&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This solution has proven to be fast and reliable and requires little configuration to set up and no glue-code to write.&lt;/p&gt;

&lt;p&gt;Since everything is managed, there is no room for tuning or improvement.&lt;/p&gt;

&lt;p&gt;The latency of this solution is roughly &lt;strong&gt;94% higher&lt;/strong&gt; than the first version (210.74 ms vs. 108.39 ms).&lt;/p&gt;

&lt;p&gt;However, EventBridge offers many more capabilities than SNS.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrap up
&lt;/h2&gt;

&lt;p&gt;In this article, I have presented a solution that I designed as part of my work, along with my approach to solution development: clarifying the scope and context, evaluating different options, knowing the parts involved and the performance and quality attributes of the overall system, and writing code and benchmarking where necessary, always with the clear awareness that there are no perfect solutions.&lt;/p&gt;

&lt;p&gt;I hope it was helpful to you, and &lt;a href="https://github.com/ncremaschini/fargate-notifications" rel="noopener noreferrer"&gt;here is the GitHub repo to deploy both versions of the solution&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Bye 👋!&lt;/p&gt;

</description>
      <category>fanout</category>
      <category>aws</category>
      <category>serverless</category>
      <category>ecs</category>
    </item>
    <item>
      <title>Evaluating Performance: A Benchmark Study of Serverless Solutions for Message Delivery to Containers on AWS Cloud</title>
      <dc:creator>Nicola Cremaschini</dc:creator>
      <pubDate>Sun, 03 Mar 2024 21:48:21 +0000</pubDate>
      <link>https://forem.com/aws-builders/evaluating-performance-a-benchmark-study-of-serverless-solutions-for-message-delivery-to-kbl</link>
      <guid>https://forem.com/aws-builders/evaluating-performance-a-benchmark-study-of-serverless-solutions-for-message-delivery-to-kbl</guid>
      <description>&lt;p&gt;In this article i'll show you how to forward events to private containers using serverless services and fan-out pattern.&lt;/p&gt;

&lt;p&gt;I'll explore possible solutions within the AWS ecosystem, but all are applicable regardless of the actual service or implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context
&lt;/h2&gt;

&lt;p&gt;Suppose you have a cluster of containers and you need to notify them when a database record is inserted or changed, and these changes apply to the internal state of the application. A fairly common use case.&lt;/p&gt;

&lt;p&gt;Let's say you have the following requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The tasks are in an autoscaling group, so their number may change over time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A task is only healthy if it can be updated when the status changes. In other words, all tasks must have the same status. Containers that do not change their status must be marked as unhealthy and replaced.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When a new task is started, it must be in the last known status.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Status changes must be in near real-time: status changes in the database must reach the containers in less than 2 seconds.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Given these requirements, let's explore a few options.&lt;/p&gt;

&lt;h2&gt;
  
  
  Option 1: tasks directly querying the database
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj0oprs6mcd0rvffeu0jb.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj0oprs6mcd0rvffeu0jb.jpeg" alt="Task querying directly the database" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Pros:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;easy to implement: the task just performs a simple query and gets the current status, assuming it can be queried.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;fast: it really depends on the DB resources and the complexity of the query, but there are not many hops and it can be configured to be fast. You can set the polling interval to match the 2-second requirement, e.g. every 1 second.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;easy to mark failing tasks as unhealthy: the application could catch query errors and mark itself unhealthy if it has enough resources. Otherwise, the load balancer's health check would fail.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cons:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;waste of resources: Your application queries the database even if no changes have been made. If your database does not change more frequently than the polling rate, most queries are useless.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;your database is a single point of failure: If the database cannot serve queries, tasks cannot be notified.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;it does not scale well: As the number of tasks grows, the number of queries grows and you may need to scale the database as well, or you may need a very large cluster running all the time to accommodate any scaling, wasting resources.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;difficult to monitor: How can you check if an individual task is in the right state?&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In such a scenario, I definitely don't like polling.&lt;/p&gt;

&lt;p&gt;Let's try a different and opposite approach.&lt;/p&gt;

&lt;h2&gt;
  
  
  Option 2: Db streams changes to containers
&lt;/h2&gt;

&lt;p&gt;Instead of having the tasks ask the database, let's have the database notify them of changes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq0cre2rwyw2892av8gl2.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq0cre2rwyw2892av8gl2.jpeg" alt="db pushes events to tasks" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Before going into the pros and cons, I must say that it would be very hard, if not impossible, to implement this solution exactly as I drew it. We can use a very popular pattern called &lt;em&gt;fan-out&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;This is the &lt;a href="https://en.wikipedia.org/wiki/Fan-out_(software)" rel="noopener noreferrer"&gt;Wikipedia&lt;/a&gt; definition:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In message-oriented middleware solutions, fan-out is a messaging pattern used to model an information exchange that implies the delivery (or spreading) of a message to one or multiple destinations possibly in parallel, and not halting the process that executes the messaging to wait for any response to that message&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To make things a little more concrete, let's use some popular AWS services that are commonly used to implement this pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;DynamoDB: NoSQL database with native event streaming&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SNS: pub/sub messaging service&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SQS: message queue service&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The solution looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8pz3d06iyiye86t78mti.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8pz3d06iyiye86t78mti.jpeg" alt="event streaming and fan-out in action" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now let's explore pros and cons:&lt;/p&gt;

&lt;h3&gt;
  
  
  Pros:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;first of all, you can see that the arrows have turned into dotted lines: this architecture is completely asynchronous&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;easy to implement: all the integrations you need are native. You just need to configure the serverless services and implement an SQS consumer in your application.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;very scalable: you can add as many tasks as you want without affecting the database. Your limit here is SNS, but it is very high: as stated in the &lt;a href="https://docs.aws.amazon.com/general/latest/gr/sns.html" rel="noopener noreferrer"&gt;official docs&lt;/a&gt;, a single topic supports up to 12,500,000 subscriptions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;no waste of resources, a.k.a. really cost-effective: this solution leverages pay-per-use services, and they are used only when actual changes occur on the DB.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;very easy to monitor: both SNS and SQS support dead-letter topics / queues: if a message isn't consumed within the timeout, it can be moved to a DLQ. You can set up an alarm that fires when a DLQ is not empty, and kill the associated task.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;easy to recover: if a container cannot consume a message, it can try again. In other words, it does not have to be online and ready to receive the message at the moment it is delivered, since the queues are persistent.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;very fast: I ran a benchmark on this solution (&lt;a href="https://github.com/ncremaschini/fargate-notifications" rel="noopener noreferrer"&gt;here is the GitHub repo with the actual code&lt;/a&gt;); we'll see the results later in this article.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cons
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;more moving parts: even if no integration code is required, since it's provided by AWS, connecting things and tuning connections is not as straightforward as performing a query.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;not so easy to troubleshoot. As with every distributed system, I would say.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;it strongly depends on serverless services: if one link in the chain slows down or is not available, your containers can't be notified. That said, all the services involved have a very good SLA: &lt;a href="https://aws.amazon.com/it/messaging/sla/" rel="noopener noreferrer"&gt;3 nines for SQS and SNS&lt;/a&gt; and &lt;a href="https://aws.amazon.com/it/dynamodb/sla/" rel="noopener noreferrer"&gt;4 nines for DynamoDB&lt;/a&gt;. I'm not sure about DynamoDB Streams, since they don't appear to be included in the DynamoDB SLA. I suppose DynamoDB Streams are backed by Kinesis Streams, &lt;a href="https://aws.amazon.com/it/kinesis/sla/" rel="noopener noreferrer"&gt;which also has 3 nines of availability&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Open points:
&lt;/h3&gt;

&lt;p&gt;The main open point here, to me, was: is this fast enough? Let's verify it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Trust, but verify
&lt;/h2&gt;

&lt;p&gt;I couldn't find any official SLA about latency for the services involved, nor any official AWS benchmark.&lt;/p&gt;

&lt;p&gt;So I decided to run one myself, and I scripted a basic application using TypeScript and the CDK / SDK.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ncremaschini/fargate-notifications" rel="noopener noreferrer"&gt;Here the github repo with the actual code&lt;/a&gt; and details on how the system is implemented.&lt;/p&gt;

&lt;p&gt;Before going ahead, bear in mind that I performed this benchmark to understand whether this combination of services and configuration could fit my specific context and use case. Your context may be different, and this configuration may not fit it.&lt;/p&gt;

&lt;h3&gt;
  
  
  System design and data flow
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwr76pahpxus0lja5wui6.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwr76pahpxus0lja5wui6.jpeg" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;The AppSync API receives mutations and stores derived data in the DynamoDB table&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The DynamoDB stream captures the change events&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Lambda function is triggered by the DynamoDB stream&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Lambda function sends the events to the SNS topic&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The SNS topic sends the events to the SQS queues&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Fargate service reads the events from the SQS queues&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If events are not processed within a timeout, they are moved to the DLQ&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A Cloudwatch alarm is triggered if the DLQ is not empty&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
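
&lt;p&gt;The core of steps 3 and 4 is reshaping a DynamoDB Streams record into the message published to SNS. The following is only a sketch under stated assumptions: the record shape mirrors the DynamoDB Streams event format, the function name is mine, and the actual SDK publish call is omitted.&lt;/p&gt;

```typescript
// Sketch of the stream-handler Lambda's transformation step: flatten the
// DynamoDB attribute map of a stream record into a plain object and
// JSON-encode it as the SNS message body. SDK calls are intentionally omitted.
interface AttributeValue { S?: string; N?: string; }

interface StreamRecord {
  eventName: string; // "INSERT", "MODIFY" or "REMOVE"
  dynamodb: { NewImage?: { [attr: string]: AttributeValue } };
}

function toSnsMessage(record: StreamRecord): string | null {
  const image = record.dynamodb.NewImage;
  if (image === undefined) return null; // e.g. a REMOVE event carries no new image
  const plain: { [attr: string]: string | number } = {};
  for (const key of Object.keys(image)) {
    const value = image[key];
    if (value.N !== undefined) plain[key] = Number(value.N); // numeric attribute
    else if (value.S !== undefined) plain[key] = value.S;    // string attribute
  }
  return JSON.stringify({ event: record.eventName, item: plain });
}
```

&lt;p&gt;In the real handler, the returned string would be passed to the SNS publish call, and the fan-out to the per-task queues happens entirely inside AWS.&lt;/p&gt;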

&lt;h3&gt;
  
  
  Key system parameters:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Region: eu-south-1&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Number of tasks: 20&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Event bus: 1 SQS per task, 1 DLQ per SQS, all SQS subscribed to one SNS&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SQS Consumer: provided by AWS SDK, configured for long polling (20s)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Task configuration: 256 CPU, 512 Memory, Docker image based on &lt;a href="https://hub.docker.com/layers/library/node/20-slim/images/sha256-80c3e9753fed11eee3021b96497ba95fe15e5a1dfc16aaf5bc66025f369e00dd?context=explore" rel="noopener noreferrer"&gt;Official Node Image 20-slim&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DynamoDB: configured in pay-per-request mode, with the stream enabled to trigger the Lambda&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lambda stream handler: written in Node 20, bundled with &lt;a href="https://esbuild.github.io/" rel="noopener noreferrer"&gt;ESBuild&lt;/a&gt;, configured with 128 MB of memory&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
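
&lt;p&gt;The task-side consumer mentioned above reduces to a receive / handle / delete loop. A sketch with the transport injected, so it stays self-contained: in the real code, &lt;code&gt;receive&lt;/code&gt; would wrap the SDK's ReceiveMessage with a 20-second wait (long polling), and the names here are illustrative.&lt;/p&gt;

```typescript
// Sketch of one long-poll iteration of the SQS consumer running in each task.
// Messages are deleted only after successful handling, so that failed messages
// eventually land in the DLQ instead of being lost.
interface QueueMessage { body: string; receiptHandle: string; }

function drainOnce(
  receive: () => QueueMessage[],             // one long-poll call (up to 20 s)
  handle: (body: string) => void,            // application logic
  deleteMessage: (receiptHandle: string) => void
): number {
  const messages = receive();
  for (const msg of messages) {
    handle(msg.body);
    // delete only after handling succeeded; a thrown error keeps the message
    deleteMessage(msg.receiptHandle);
  }
  return messages.length;
}
```

&lt;p&gt;A real task would call this in a loop: with long polling, an empty queue simply blocks the receive call for up to 20 seconds instead of burning requests.&lt;/p&gt;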

&lt;h3&gt;
  
  
  Benchmark parameters
&lt;/h3&gt;

&lt;p&gt;I used a basic Postman collection runner to send a mutation to AppSync every 5 seconds, for 720 iterations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwvu09zqcs7t4qgvwt6wh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwvu09zqcs7t4qgvwt6wh.png" alt="postman runner execution recap" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Goal
&lt;/h3&gt;

&lt;p&gt;The goal was to verify if containers would be updated within 2 seconds.&lt;/p&gt;

&lt;h3&gt;
  
  
  Measurements
&lt;/h3&gt;

&lt;p&gt;I used the following CloudWatch-provided metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;AppSync latency&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lambda latency&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;DynamoDB stream latency&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;and I created two custom metrics for measuring SQS and SNS time taken.&lt;/p&gt;

&lt;p&gt;Time-taken custom metrics are calculated from the SNS and SQS-provided attributes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SNS Timestamp: &lt;a href="https://docs.aws.amazon.com/sns/latest/dg/sns-message-and-json-formats.html" rel="noopener noreferrer"&gt;from AWS doc&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;The time (GMT) when the notification was published.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;ApproximateFirstReceiveTimestamp: &lt;a href="https://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_ReceiveMessage.html" rel="noopener noreferrer"&gt;from AWS doc&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;returns the time the message was first received from the queue (epoch time in milliseconds).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;SentTimestamp: &lt;a href="https://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_ReceiveMessage.html" rel="noopener noreferrer"&gt;from AWS doc&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Returns the time the message was sent to the queue (epoch time in milliseconds).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The following code snippet shows how the attributes are used to calculate &lt;em&gt;SNS time taken in millis&lt;/em&gt; and &lt;em&gt;SQS time taken in millis&lt;/em&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// despite the name, this is the ISO date the message was sent to the SNS topic
let snsReceivedISODate = messageBody.Timestamp;
if (snsReceivedISODate &amp;amp;&amp;amp; message.Attributes) {
  // both attributes are strings holding epoch millis, so convert them to numbers
  clientReceivedTimestamp = Number(message.Attributes.ApproximateFirstReceiveTimestamp!);
  sqsReceivedTimestamp = Number(message.Attributes.SentTimestamp!);
  let snsReceivedDate = new Date(snsReceivedISODate);
  snsReceivedTimestamp = snsReceivedDate.getTime();
  clientReceivedDate = new Date(clientReceivedTimestamp!);
  sqsReceivedDate = new Date(sqsReceivedTimestamp!);
  snsTimeTakenInMillis = sqsReceivedTimestamp - snsReceivedTimestamp;
  sqsTimeTakenInMillis = clientReceivedTimestamp - sqsReceivedTimestamp;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I didn't calculate the time taken by the client to parse the message, because that depends entirely on the logic the client applies when parsing it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Results
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Disclaimer: some latency measurements are calculated on consumers' side, and we all know that synchronizing clocks in a distributed system is a hard problem.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Still, measurements are performed by the same computing nodes.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Please consider following latencies not as precise measurements but as coarse indicators.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Here are some screenshots from my CloudWatch dashboard:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcva53dhnz6vbaloyqho0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcva53dhnz6vbaloyqho0.png" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3oqi7yaoug7h9eih0mp6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3oqi7yaoug7h9eih0mp6.png" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A few key data points, from the average numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Most of the time is taken by AppSync; I couldn't do anything to lower this latency, since I used AppSync's native integration with DynamoDB.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The only custom code is the Lambda stream processor, and the Lambda duration is the second slowest component here. As you can see in the graph, the Lambda cold start is the killer, but setting that aside we can observe a very good latency on average (38 ms).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The average total time taken is &lt;strong&gt;108.39 ms&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The average response time measured by my client, which includes my client's network latency, is 92 ms. Given that the AppSync average latency is 60.5 ms, my average network latency is 31.5 ms. This means that from my client sending the mutation to consumers receiving the message there are 108.39 + 31.5 = &lt;strong&gt;139.89 ms&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;This solution has proven to be fast and reliable and requires little configuration to set up.&lt;/p&gt;

&lt;p&gt;Since almost everything is managed, there is little room for tuning and improvements. In this particular configuration, I could simply give the stream processor Lambda more memory, but latency does not necessarily scale down as memory scales up.&lt;/p&gt;

&lt;p&gt;&lt;del&gt;I could remove the Lambda and replace it with an EventBridge Pipe. I haven't tried it yet, but I'm going to run the exact same benchmark and compare the results.&lt;/del&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;UPDATE:&lt;/strong&gt; &lt;a href="https://haveyoutriedrestarting.com/evaluating-performance-a-benchmark-study-of-serverless-solutions-for-message-delivery-to-containers-on-aws-cloud-episode-2" rel="noopener noreferrer"&gt;here is the benchmark of the aforementioned solution with EventBridge&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Last but not least, keep in mind that AWS does not always include latency in a service's SLA. I've run this benchmark a few times with comparable results, but I can't be sure that I will always get the same results over time. If your system requires stable and predictable performance, you can't rely on services that don't include performance metrics in their SLA. You're better off taking control of the layers below, which means &lt;a href="https://engineering.dunelm.com/pizza-as-a-service-2-0-5085cd4c365e" rel="noopener noreferrer"&gt;you should consider going to a restaurant or even making your own pizza at home.&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrap up
&lt;/h2&gt;

&lt;p&gt;In this article, I presented a solution I had to design as part of my work, along with my approach to solution development: clarifying the scope and context, evaluating different options, knowing the parts involved and the performance and quality attributes of the overall system, and writing code and benchmarking where necessary, always with the clear awareness that there are no perfect solutions.&lt;/p&gt;

&lt;p&gt;I hope it was helpful to you, and &lt;a href="https://github.com/ncremaschini/fargate-notifications" rel="noopener noreferrer"&gt;here is the GitHub repo to deploy both versions of the solution&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Bye 👋!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>sns</category>
      <category>sqs</category>
      <category>dynamodb</category>
    </item>
    <item>
      <title>Serverless social login with AWS Cognito</title>
      <dc:creator>Nicola Cremaschini</dc:creator>
      <pubDate>Sun, 24 Dec 2023 16:30:24 +0000</pubDate>
      <link>https://forem.com/aws-builders/serverless-social-login-with-aws-cognito-301a</link>
      <guid>https://forem.com/aws-builders/serverless-social-login-with-aws-cognito-301a</guid>
      <description>&lt;p&gt;&lt;em&gt;Disclaimer: This is not a step-by-step guide, just my trade-off analysis on using Amazon Cognito to provide social login for your app and some pitfalls I found in my experience.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In this article, I'll show you my serverless solution to add social identity providers as a login option for web and mobile applications, based on managed services and native integrations, and how I mitigated some issues I encountered.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context
&lt;/h2&gt;

&lt;p&gt;Let's assume you have an application for which your users do not have to register, but can log in with their social identity.&lt;/p&gt;

&lt;p&gt;If you're wondering why, consider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;registration could be a barrier to entry for users as it requires more steps and sharing of data&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;most internet users have at least one social identity. All mobile users have at least one (Google identity for Android users, Apple identity for Apple users)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;it is very easy for users to access your app if most of the login is done without a password&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;you can receive user data from social providers, if users allow the provider to share their data with your app.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most popular social IdPs are Facebook, Google, Apple, Amazon, LinkedIn, Github and many others.&lt;/p&gt;

&lt;p&gt;Considering that every IdP should implement the OpenID Connect standard (we'll come back to this later...), which is a layer above the OAuth2 standard, and that every IdP requires some configuration, let's explore some options.&lt;/p&gt;

&lt;h2&gt;
  
  
  Option 1: Native integration
&lt;/h2&gt;

&lt;p&gt;Each IdP has its own SDK and APIs for native integration, so you can code the integration for each IdP you want to support directly into your app.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi8qru7eh7h1t8n5zzjfv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi8qru7eh7h1t8n5zzjfv.png" alt="direct integration with IdP's SDK" width="397" height="251"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Pros
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;fine-grained control over each individual IdP integration. Since each IdP is natively integrated, you can customise the specific UX via configuration and handle IdP requests that are not included in the OAuth standard (we'll get to that later...)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;direct integration, no intermediary, straightforward architecture. You can rely on robust implementations (Google / Facebook / Amazon provide good code in their SDKs) and on the IdPs' resilience and high availability.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;cost-effective: IdPs usually provide a free tier for their APIs, so there aren't any costs on that side.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cons
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Difficult to scale: each IdP has its own SDK and its own customs (someone said "standard?"), and a lot of code is required to handle them. Even if you put your authentication logic into a library, you have to distribute it to all clients for every change.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Hard to test / troubleshoot: more code, more tests. Moreover, the different integrations require you to know each IdP's customs.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Option 2: use an OAuth Provider
&lt;/h2&gt;

&lt;p&gt;Since social IdPs adhere to a standard, it's easy to abstract away the specific implementations (SDKs) and work with interfaces by integrating with an OAuth 2 service provider.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzkbqfasyabzasqgg0jvw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzkbqfasyabzasqgg0jvw.png" width="655" height="250"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Pros
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Just one integration, between your client and the OAuth identity platform. Less code, fewer tests, fewer releases, more speed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Easy to scale: you can add or remove IdPs without impacting clients (see the previous bullet)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Authentication flow configuration and governance are now centralised. You can create a consistent auth flow regardless of the specific IdPs you support, and you can monitor it and gather metrics and statistics in one place.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You build your auth flow on standards.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;There are identity-platform-as-a-service offerings out there (AWS Cognito, Auth0, Google Firebase and many others)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cons
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Your integration choices are limited to IdPs supported by your OAuth provider.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Your system complexity is higher, since you add components to it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The OAuth provider could be a single point of failure. If it is not available, you cannot offer authentication to your customers. Therefore, you need to think carefully about the reliability and scaling of your OAuth provider.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  My choice: Option 2 with AWS Cognito
&lt;/h2&gt;

&lt;p&gt;I'm aware that you may have many constraints and, for brevity, I cannot list them all: given my context, I went with option 2 and used AWS Cognito as the OAuth provider, after a spike on Auth0 and some other services.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I decided to accept the constraints and costs of Cognito in exchange for a low-code implementation and easy setup, in other words for faster delivery, because I wasn't sure if it would be worth it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here is my actual implementation:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx6i6decg4yisfw6ps3n6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx6i6decg4yisfw6ps3n6.png" width="597" height="309"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;All you need is to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;configure your integration on the social provider's side. Here is a reference for each provider I integrated with&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;configure Cognito integration. &lt;a href="https://docs.aws.amazon.com/cognito/latest/developerguide/external-identity-providers.html" rel="noopener noreferrer"&gt;Here AWS Doc for each supported providers&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://aws.amazon.com/cognito/dev-resources/?nc1=h_ls" rel="noopener noreferrer"&gt;Integrate your application with Amazon Cognito&lt;/a&gt;. Cognito provides an &lt;a href="https://docs.aws.amazon.com/cognito/latest/developerguide/cognito-user-pools-app-integration.html" rel="noopener noreferrer"&gt;hosted ui&lt;/a&gt; for the login page, but you can create your own.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Pitfalls: things to be careful about
&lt;/h2&gt;

&lt;p&gt;Here I list some of the pitfalls I have encountered in this integration. This is not an exhaustive list of everything that can go wrong with Amazon Cognito and the social login flow, but, again, my personal experience; in other words, things I ran into during my working days.&lt;/p&gt;

&lt;h3&gt;
  
  
  Watch out for Cognito limits
&lt;/h3&gt;

&lt;p&gt;Serverless does not mean infinite, and Cognito is one of the services that best demonstrates this.&lt;/p&gt;

&lt;p&gt;In one sentence: Cognito's scaling policy is not designed for spiky patterns.&lt;/p&gt;

&lt;p&gt;The scaling pattern is (reasonably) tied to the size of your user pool: the more users, the more TPS provided.&lt;/p&gt;

&lt;p&gt;But, and here comes the first pitfall, the first threshold covers up to 1 million users: from 1 to 999,999 users, you get the same TPS.&lt;/p&gt;

&lt;p&gt;This means that if your login pattern is fairly consistent, you probably won't have any problems. However, if your login pattern is spiky, perhaps because your app is tied to certain time periods in some way, your app will struggle with a lot of throttling errors from Cognito.&lt;/p&gt;

&lt;p&gt;These diagrams show successful federation logins and throttling errors:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcnrpdhur8uwvw026sxpq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcnrpdhur8uwvw026sxpq.png" alt="Cognito success login" width="800" height="327"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3o8yhtjri0ptxlqrcuq3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3o8yhtjri0ptxlqrcuq3.png" alt="Cognito Throttling errors" width="800" height="312"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I split them into two distinct diagrams for better visualisation, but I want to point out that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;around 20:50 I had ~7K throttling errors and ~1.5K successes (total requests: ~8.5K)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;around 21:20 I had ~6K throttling errors and ~1.4K successes (total requests: ~7.5K)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;around 22:30 I had ~1.3K successes with ZERO throttling errors&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cognito TPS calculation rules can be found &lt;a href="https://docs.aws.amazon.com/cognito/latest/developerguide/limits.html" rel="noopener noreferrer"&gt;at this specific section of Cognito docs&lt;/a&gt;, and you have to carefully consider them.&lt;/p&gt;

&lt;p&gt;As you can see from the successful-login metric diagram, handling the throttling exception in your app can mitigate the user impact: users can still log in successfully, just after waiting a little longer.&lt;/p&gt;
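
&lt;p&gt;A minimal sketch of that mitigation: retry the login with exponential backoff when Cognito throttles. This is illustrative, not the actual app code: &lt;code&gt;initiateLogin&lt;/code&gt; stands in for the real Cognito call, and the delay function is injected so the sketch stays self-contained.&lt;/p&gt;

```typescript
// Retry a throttled login with exponential backoff. `initiateLogin` is a
// stand-in for the real Cognito auth call; `sleep` is injected (in a real app
// you would await a timer and show a spinner instead of blocking).
function loginWithBackoff(
  initiateLogin: () => string,
  maxAttempts: number,
  sleep: (ms: number) => void
): string {
  let delayMs = 100;
  for (let attempt = 1; ; attempt++) {
    try {
      return initiateLogin();
    } catch (err) {
      if (attempt === maxAttempts) throw err; // give up after the last attempt
      sleep(delayMs);
      delayMs = delayMs * 2; // exponential backoff: 100, 200, 400, ... ms
    }
  }
}
```

&lt;p&gt;Adding a little random jitter to each delay would further spread the retries, which is exactly what you want when the whole user base is hitting the same per-pool limit.&lt;/p&gt;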

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;I decided that it could be acceptable, and i traded it for easy setup and integration with Social Providers.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Since this decision would impact our customer experience, I tried to mitigate it as much as possible, for instance by sending push notifications before traffic spikes to encourage users to log in early and spread out login requests.&lt;/p&gt;

&lt;h3&gt;
  
  
  Standards are not prescriptive
&lt;/h3&gt;

&lt;p&gt;I love standards, everybody should love them in engineering.&lt;/p&gt;

&lt;p&gt;Unfortunately, sometimes for good reasons and sometimes not, the giants tend to bend standards a little.&lt;/p&gt;

&lt;p&gt;Apple, I'm pointing my finger at you!&lt;/p&gt;

&lt;p&gt;First, Apple's guidelines require you to offer Sign in with Apple if you want to distribute your app in the Apple Store and your app has a social login feature. That may be a bit pushy, but it's fair.&lt;/p&gt;

&lt;p&gt;Again, Apple prescribes that the "user cancellation" function must be accessible and clear. That is also fair.&lt;/p&gt;

&lt;p&gt;And here Apple does not adhere to the OAuth standard: if an Apple user allows Apple to share their data with your app, an association between your app and the user is also created in Apple's systems, and if a user wants to delete their account from your app (i.e., from your user pool), this association should be removed as well.&lt;/p&gt;

&lt;p&gt;To do that, you have to invoke Apple's APIs to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;generate a valid access or refresh token.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;invalidate the freshly generated token.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://developer.apple.com/documentation/sign_in_with_apple/revoke_tokens" rel="noopener noreferrer"&gt;Sounds weird, but this is exactly what this doc page prescribes.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And, guess what? Cognito doesn't handle it.&lt;/p&gt;

&lt;p&gt;Even though Cognito could handle it (it has all the information it needs, in particular the private key you created on the Apple side and provided to Cognito to request tokens), the choice is reasonable from a product perspective: Cognito adheres to standards and can't chase every vendor-specific implementation.&lt;/p&gt;

&lt;p&gt;But it does mean that Apple won't include your app in the store if you don't take care of it.&lt;/p&gt;

&lt;p&gt;So let's take a look at how to implement it.&lt;/p&gt;

&lt;p&gt;You can't implement it in the app: I used Cognito to decouple the app from auth providers, and I don't want to violate that requirement. Besides, you don't want to store your private key on the device, do you?&lt;/p&gt;

&lt;p&gt;So you need to implement it on the backend side. My first idea was to react to the Cognito user-deletion event and trigger a Lambda that calls the Apple API to delete the user on Apple's side.&lt;/p&gt;

&lt;p&gt;As far as I know, Cognito today has&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/cognito/latest/developerguide/cognito-user-identity-pools-working-with-aws-lambda-triggers.html#cognito-user-pools-lambda-trigger-event-parameter-shared" rel="noopener noreferrer"&gt;Lambda triggers&lt;/a&gt;: user deletion not supported&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/cognito/latest/developerguide/amazon-cognito-info-in-cloudtrail.html" rel="noopener noreferrer"&gt;Cloudtrail tracks all management api calls&lt;/a&gt;, and user cancellation is a management api. But Cloudtrail event doesn't have any reference to actual user (and it saved my day in an audit session, but this is another story)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/cognito/latest/developerguide/cognito-events.html" rel="noopener noreferrer"&gt;Cognito Sync&lt;/a&gt;: it seems to handle user deletion. Quoting:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is what it looks like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8dz3e435c44jsa1uvjdc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8dz3e435c44jsa1uvjdc.png" alt="Apple user cancellation w/ Cognito Sync" width="800" height="279"&gt;&lt;/a&gt;&lt;/p&gt;
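&lt;p&gt;Whichever trigger fires it, the revocation Lambda always has to do the same two things: build the client secret (a short-lived ES256-signed JWT, per Apple's docs) and POST the refresh token to &lt;code&gt;https://appleid.apple.com/auth/revoke&lt;/code&gt;. Here is a minimal, hypothetical sketch of the request material; the signing step itself would use your Apple private key (e.g. via PyJWT) and is deliberately left out:&lt;/p&gt;

```python
import time

APPLE_AUDIENCE = "https://appleid.apple.com"
REVOKE_ENDPOINT = "https://appleid.apple.com/auth/revoke"

def build_client_secret_claims(team_id, client_id, key_id, now=None, ttl=3600):
    """Header and claims for the client_secret JWT Apple expects.

    The result still has to be signed with ES256 using the private key
    downloaded from the Apple developer portal; this sketch stops short
    of signing to stay dependency-free.
    """
    issued_at = int(now if now is not None else time.time())
    header = {"alg": "ES256", "kid": key_id}
    claims = {
        "iss": team_id,          # Apple Developer Team ID
        "iat": issued_at,
        "exp": issued_at + ttl,  # Apple caps validity at 6 months
        "aud": APPLE_AUDIENCE,
        "sub": client_id,        # the app's bundle / services identifier
    }
    return header, claims

def build_revoke_body(client_id, client_secret, refresh_token):
    """Form fields for the POST to Apple's /auth/revoke endpoint."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "token": refresh_token,
        "token_type_hint": "refresh_token",
    }
```

&lt;p&gt;The actual HTTP call is then a plain form-encoded POST of those fields to the revoke endpoint.&lt;/p&gt;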

&lt;p&gt;I see two problems here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;First, you have to put your Apple private key both in Cognito and in Secrets Manager, because Cognito can't retrieve it from Secrets Manager. I raised this issue with the Cognito team and will keep you posted.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Second, Cognito user deletion and Apple user deletion are asynchronous: what if it succeeds on the Cognito side and then fails on the Apple side? The user won't be in our Cognito user pool anymore, so we can't roll back the operation. Failures therefore need to be handled, and to handle them you need to store them: let's add a DLQ for our deletion Lambda.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx5rwshdtzj13so3abrft.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx5rwshdtzj13so3abrft.png" width="800" height="441"&gt;&lt;/a&gt;&lt;/p&gt;
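&lt;p&gt;To make the DLQ idea concrete, here is a hypothetical sketch of the deletion Lambda: the Apple call is injected as a callable so the flow stays testable, and a failure simply raises, which is what makes the async invocation retry and, once retries are exhausted, land in the configured dead-letter queue with everything needed for a later retry. None of these names come from the Cognito event schema:&lt;/p&gt;

```python
import json

class RevocationError(Exception):
    pass

def handle_user_deleted(event, revoke_apple_token):
    """Hypothetical handler reacting to a Cognito user-deletion event.

    revoke_apple_token is injected; in a real deployment it would call
    Apple's /auth/revoke endpoint. Raising on failure sends the event,
    after Lambda's retries, to the dead-letter queue.
    """
    username = event["userName"]
    ok = revoke_apple_token(username)
    if not ok:
        # The raised payload is what ends up in the DLQ, so it must
        # carry everything needed to retry the revocation later.
        raise RevocationError(json.dumps({"userName": username}))
    return {"status": "revoked", "userName": username}
```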

&lt;p&gt;Once a failure is stored, you must analyse why the deletion failed and retry. How long can this take? It depends on the cause and on your process, but until it's done, users will still see their account associated with your app, and I'm not sure Apple would like that and approve your app submission.&lt;/p&gt;

&lt;p&gt;You need to reverse the order of deletion: first on the Apple side and then on the Cognito side. If the Apple deletion fails, you can return an error message informing the user that the deletion cannot be performed right now and that they should try again later.&lt;/p&gt;

&lt;p&gt;In the case of a Cognito error, you will have to complete the deletion later, but at least the user will no longer see their account linked to your app, and Apple should be satisfied and approve your submission.&lt;/p&gt;

&lt;p&gt;Let's see what it looks like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faz0t20uo5q0e4s5nz1lo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faz0t20uo5q0e4s5nz1lo.png" alt="User deletion with custom api" width="800" height="541"&gt;&lt;/a&gt;&lt;/p&gt;
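&lt;p&gt;The reversed order can be sketched like this; it's a hypothetical handler with all side effects injected as callables, and none of these names come from the Cognito or Apple APIs:&lt;/p&gt;

```python
def delete_account(username, revoke_on_apple, delete_in_cognito, record_failure):
    """Sketch of the custom deletion API: Apple first, Cognito second.

    If Apple fails, nothing was deleted yet, so we abort and tell the
    user to retry. If Cognito fails afterwards, the Apple association is
    already gone, so we park the failure for a later cleanup job.
    """
    if not revoke_on_apple(username):
        # Safe to abort: no state changed on either side.
        return {"status": "error", "detail": "apple_revocation_failed"}
    try:
        delete_in_cognito(username)
    except Exception as exc:
        # Apple side is done; record the Cognito failure for recovery.
        record_failure({"userName": username, "reason": str(exc)})
        return {"status": "pending_cleanup"}
    return {"status": "deleted"}
```

&lt;p&gt;Injecting the effects keeps the ordering logic, which is the whole point of this design, trivially testable without touching Cognito or Apple.&lt;/p&gt;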

&lt;p&gt;I still see two problems here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Again, you have to put your Apple private key both in Cognito and in Secrets Manager.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Your app is now integrated with two systems: Cognito for sign-in operations and your custom API for user deletion.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both solutions solve the problem in some way, and both raise new concerns, so I had to pick the less bad one.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I decided to implement a custom API for Apple user deletion because it only needs to be implemented in half of our code base (it's not needed for the Android version of the app), the integration is quite simple, and Apple should be happy with this solution, but probably not with the alternative. An error-handling mechanism still needs to be implemented to catch Cognito deletion errors and recover from them.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;Wrap up&lt;/h2&gt;

&lt;p&gt;I have shown you my solutions to real-world problems, and how you can make informed decisions by carefully weighing the trade-offs between the solutions that best fit your context and constraints.&lt;/p&gt;

&lt;p&gt;In other words, the daily work of an architect, simplified.&lt;/p&gt;

&lt;p&gt;Architectures need to evolve as the context and constraints change over time. So always design your solutions so that they can easily evolve with them.&lt;/p&gt;

&lt;p&gt;I hope it was useful for you!&lt;/p&gt;

&lt;p&gt;Bye 👋!&lt;/p&gt;

</description>
      <category>serverless</category>
      <category>cognito</category>
      <category>sociallogin</category>
      <category>oauth</category>
    </item>
  </channel>
</rss>
