<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Gaurav Sharma</title>
    <description>The latest articles on Forem by Gaurav Sharma (@gauravsharma_thetruecode).</description>
    <link>https://forem.com/gauravsharma_thetruecode</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3879660%2F99746783-8ac4-438d-805d-fe16dc8c7cb5.png</url>
      <title>Forem: Gaurav Sharma</title>
      <link>https://forem.com/gauravsharma_thetruecode</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/gauravsharma_thetruecode"/>
    <language>en</language>
    <item>
      <title>Caching Is Easy. Production Caching Is Not.</title>
      <dc:creator>Gaurav Sharma</dc:creator>
      <pubDate>Mon, 20 Apr 2026 06:22:50 +0000</pubDate>
      <link>https://forem.com/gauravsharma_thetruecode/caching-is-easy-production-caching-is-not-59n9</link>
      <guid>https://forem.com/gauravsharma_thetruecode/caching-is-easy-production-caching-is-not-59n9</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;This post is part of the series &lt;a href="https://www.thetruecode.com/the-true-code-of-production-systems/" rel="noopener noreferrer"&gt;The True Code of Production Systems&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The first time you add caching to a system, it feels like a superpower.&lt;/p&gt;

&lt;p&gt;One afternoon of work. Response times drop. Database load drops. The whole system breathes easier. You ship it, you move on, and somewhere in the back of your mind you file caching under "solved problems."&lt;/p&gt;

&lt;p&gt;That filing is the mistake.&lt;/p&gt;

&lt;p&gt;Because caching in production is not one decision. It is &lt;strong&gt;ten decisions&lt;/strong&gt;, and most teams only consciously make one of them: the performance one. The other nine happen by default, by accident, or not at all. And defaults in production have a way of becoming incidents.&lt;/p&gt;

&lt;p&gt;This article is about all ten. But before we get into them, let us look at a system where one of those defaults caused a real problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Booking System That Did Everything Right. Almost.
&lt;/h2&gt;

&lt;p&gt;A platform handled seat reservations for corporate training workshops. On a normal day it served around two to three hundred requests per minute. The engineering team was small but experienced.&lt;/p&gt;

&lt;p&gt;Workshop availability data was cached in Redis with a TTL of sixty seconds. The reasoning was sound — availability changes only when someone books or cancels. Caching it for a minute seemed perfectly reasonable, and for months it worked exactly as designed.&lt;/p&gt;

&lt;p&gt;Then a well-known instructor announced a new batch of workshops on LinkedIn. The post got shared widely. Within minutes, several hundred users landed simultaneously to check availability and book seats.&lt;/p&gt;

&lt;p&gt;The cached availability keys for those workshops had expired &lt;strong&gt;seconds before the spike hit&lt;/strong&gt;. Every one of those hundreds of requests checked the cache, found a miss, and went directly to the database. The database — which had been handling 20–30 direct queries per minute — received several hundred simultaneous queries in a few seconds.&lt;/p&gt;

&lt;p&gt;Connection pool exhausted. Query times climbed from milliseconds to seconds. The application started timing out. Users saw errors. Some refreshed, which made it worse. &lt;strong&gt;The platform was effectively down for four minutes&lt;/strong&gt; during the highest-traffic window it had ever seen.&lt;/p&gt;

&lt;p&gt;The cache was there. Redis was running fine. The TTL was set. Everything was configured.&lt;/p&gt;

&lt;p&gt;Nobody had thought about what happens when a popular key expires at exactly the wrong moment.&lt;/p&gt;

&lt;p&gt;We will come back to this system after the ten points. By then you will know exactly what went wrong and what a one-line fix would have looked like.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Most Developers Think Caching Is
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Cache the expensive query. Set a TTL. Use Redis. Done.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That mental model is not wrong. It is just incomplete. In production, every caching decision is simultaneously &lt;strong&gt;three other things&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;consistency decision&lt;/strong&gt; — data in cache may no longer reflect reality&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;reliability decision&lt;/strong&gt; — a cache misbehaving under load can damage the system it was meant to protect&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;cost decision&lt;/strong&gt; — the wrong caching setup charges you quietly, consistently, and across more than one bill line item&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most developers ship caching thinking only about performance. The other three dimensions show up later, usually at inconvenient moments, usually pointing back to a decision that was never consciously made.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Ten Things Production Caching Actually Requires
&lt;/h2&gt;




&lt;h3&gt;
  
  
  1. Your Caching Pattern Is a Choice. Make It Deliberately.
&lt;/h3&gt;

&lt;p&gt;Most developers use &lt;strong&gt;Cache Aside&lt;/strong&gt; without ever knowing they made a choice. The code checks the cache, finds a miss, goes to the database, stores the result, and returns it. It is the most common pattern. It works. But it is one of four — and each behaves differently in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cache Aside&lt;/strong&gt; puts the application in charge. You decide when to read from cache and when to write to it. This gives you flexibility, but every invalidation is your responsibility. Miss one code path that updates the underlying data without clearing the cache, and you silently serve stale data. No error. No alert.&lt;/p&gt;
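&lt;p&gt;The read path of Cache Aside fits in a few lines, which is part of why it gets adopted without a conscious decision. A minimal Python sketch, with a dict standing in for Redis and a plain mapping standing in for the database (all names here are illustrative):&lt;/p&gt;

```python
# Minimal cache-aside read path: the application checks the cache,
# falls back to the database on a miss, and populates the cache itself.
cache = {}
db_calls = []                          # count trips to the "database"

def get_user(user_id, db):
    key = "user:" + str(user_id)
    value = cache.get(key)
    if value is None:                  # miss: fall through to the database
        db_calls.append(user_id)
        value = db[user_id]            # the expensive query
        cache[key] = value             # populate for the next reader
    return value
```

&lt;p&gt;Note that nothing in this code handles invalidation. That responsibility lives in every write path, which is exactly the trade-off described above.&lt;/p&gt;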

&lt;p&gt;&lt;strong&gt;Read Through&lt;/strong&gt; moves that responsibility elsewhere — the cache itself fetches from the database on a miss. This keeps application code clean but creates a &lt;strong&gt;cold start problem&lt;/strong&gt;: every fresh deployment begins with an empty cache, and until it warms up, your database absorbs full traffic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Write Through&lt;/strong&gt; writes to both cache and database on every write. Your cache is always in sync — but every write now has to complete in two places before returning to the caller.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Write Behind&lt;/strong&gt; writes to cache immediately and updates the database asynchronously. Writes are very fast. But if the cache node goes down before the async write completes, &lt;strong&gt;that data is gone&lt;/strong&gt;. Unless you have consciously decided that some data loss is acceptable, this pattern is not the right one.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Before you deploy, ask:&lt;/strong&gt; What is my consistency requirement? Can users tolerate stale data, and if so, for how long? Which pattern actually matches that requirement?&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  2. Cache Invalidation: Why the Joke Is Not Actually a Joke
&lt;/h3&gt;

&lt;p&gt;The two hardest things in computer science are cache invalidation and naming things. Most people chuckle and move on. They should sit with it longer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TTL-based invalidation&lt;/strong&gt; is what most systems use. Simple, easy to reason about, no inter-service coordination needed. The downside: TTL is a blunt instrument. Set it too long — users interact with stale data. Set it too short — you hammer the database repeatedly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Event-based invalidation&lt;/strong&gt; is more precise. When the underlying data changes, you immediately delete or update the cache key. The challenge is coverage: &lt;strong&gt;every single code path&lt;/strong&gt; that can modify data must also trigger the invalidation. If you have five endpoints that update a user's profile and handle only four of them, you have a stale data bug that will appear random.&lt;/p&gt;
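&lt;p&gt;Event-based invalidation is equally short to write, which is why the coverage problem is easy to underestimate. A sketch, with dict stand-ins for Redis and the database:&lt;/p&gt;

```python
# Sketch of event-based invalidation: the cache entry is dropped at the
# moment the underlying data changes. Names are illustrative.
cache = {"user:1": {"name": "old"}}
db = {1: {"name": "old"}}

def update_profile(user_id, new_name):
    db[user_id] = {"name": new_name}
    # Every code path that writes this data must also perform this delete.
    # Miss one path and stale reads appear with no error anywhere.
    cache.pop("user:" + str(user_id), None)
```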

&lt;p&gt;The situation that quietly destroys production systems is &lt;strong&gt;mixing both approaches across services with no shared strategy&lt;/strong&gt;. Service A uses TTL. Service B uses events. Service C was written by a contractor six months ago. The cache becomes a layer of shared state that no single person can fully reason about.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Ask yourself:&lt;/strong&gt; Who owns cache invalidation in my system? Is there an actual strategy, or is each service doing its own thing independently?&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  3. The Cache Stampede: When Your Protection Collapses All at Once
&lt;/h3&gt;

&lt;p&gt;This one catches even experienced teams off guard.&lt;/p&gt;

&lt;p&gt;A popular cache key expires. At that exact moment, your system is handling high traffic. One thousand requests check the cache. All one thousand see a miss. &lt;strong&gt;All one thousand go directly to the database&lt;/strong&gt; to fetch the data and rebuild the cache. Your database — which the cache was there to protect — absorbs a spike it was never provisioned to handle alone.&lt;/p&gt;

&lt;p&gt;This is a &lt;strong&gt;cache stampede&lt;/strong&gt; (also called a &lt;em&gt;thundering herd&lt;/em&gt;). The irony: the more effective your cache, the worse the stampede when it fails.&lt;/p&gt;

&lt;p&gt;Three ways to protect against it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mutex / locking&lt;/strong&gt; — Only one request rebuilds a key at a time; others wait. Prevents the database spike but risks a queue buildup if the rebuild is slow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Probabilistic early expiration&lt;/strong&gt; — Before the TTL expires, the system starts refreshing the key using a probability function based on remaining TTL and rebuild cost. Hot keys effectively never go fully cold.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Background refresh&lt;/strong&gt; — A dedicated worker keeps popular keys warm by refreshing them proactively before they expire. The application never experiences a true miss.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Ask yourself:&lt;/strong&gt; What is peak concurrent traffic on my most accessed cache key? What happens to my database if that key expires right now, at this traffic level?&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  4. Some Things Should Never Be Cached
&lt;/h3&gt;

&lt;p&gt;Knowing what &lt;strong&gt;not&lt;/strong&gt; to cache is equally important and almost never discussed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Transactional or financial data&lt;/strong&gt; — account balances, order statuses, payment confirmations. If a user sees a balance that was accurate 30 seconds ago and makes a financial decision based on it, no performance gain justifies that risk. If stale data can cause a user to take a wrong action with real consequences, it should not be cached.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Highly personalised responses&lt;/strong&gt; — the risk here is not performance. If your cache key does not capture every dimension that makes a response unique (user ID, role, tenant, locale, feature flags), you can serve one user's data to a completely different user. This has happened at companies of every size. The incident report always traces back to a cache key that was not specific enough.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Legally or contractually sensitive content&lt;/strong&gt; — terms and conditions, regulated pricing, compliance documentation. Serving an outdated version is not just a UX problem. Depending on the industry, it can carry legal weight.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Ask yourself:&lt;/strong&gt; If this cached value is served 60 seconds after it was written, what is the worst realistic outcome for the user receiving it?&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  5. Your Eviction Policy Is a Decision, Not a Default
&lt;/h3&gt;

&lt;p&gt;Every cache has a memory ceiling. When it fills up, something gets removed. The question is whether that was a deliberate engineering choice or something that just happened because nobody changed the default.&lt;/p&gt;

&lt;p&gt;In Redis, the default eviction policy is &lt;code&gt;noeviction&lt;/code&gt; — when memory is full, &lt;strong&gt;Redis stops accepting writes and returns errors&lt;/strong&gt;. That is almost certainly not the behaviour you want under load. Many teams discover this only when they are already in an incident.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Common strategies:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Policy&lt;/th&gt;
&lt;th&gt;Removes&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LRU&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Least recently accessed key&lt;/td&gt;
&lt;td&gt;Most general workloads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LFU&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Least frequently accessed key&lt;/td&gt;
&lt;td&gt;Workloads where long-term frequency matters more than recency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TTL-based&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Key closest to expiry&lt;/td&gt;
&lt;td&gt;Protecting long-lived data from short-lived displacement&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
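&lt;p&gt;In Redis, making the choice explicit is a one-line configuration change (or the equivalent lines in &lt;code&gt;redis.conf&lt;/code&gt;). The &lt;code&gt;2gb&lt;/code&gt; limit below is an arbitrary example value:&lt;/p&gt;

```shell
# Set a memory ceiling and choose an eviction policy deliberately,
# instead of living with the default of `noeviction`.
redis-cli CONFIG SET maxmemory 2gb
redis-cli CONFIG SET maxmemory-policy allkeys-lru

# Verify what is actually configured:
redis-cli CONFIG GET maxmemory-policy
```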

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Ask yourself:&lt;/strong&gt; Have you explicitly configured your eviction policy? When your cache fills up at peak load, what should be protected and what should go?&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  6. The Cold Start Problem Nobody Prepares For
&lt;/h3&gt;

&lt;p&gt;You deploy a new version of your application. The new instance comes up with a &lt;strong&gt;completely empty cache&lt;/strong&gt;. For the first several minutes, every request is a miss. Every request goes to the database.&lt;/p&gt;

&lt;p&gt;In a low-traffic system, barely noticeable. In a high-traffic system — or one with a database already near capacity — those first few minutes can look exactly like an incident. By the time someone traces it to the deployment, the cache has warmed up. The post-mortem notes it as "transient."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Until the next deployment.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Three approaches:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cache warming on startup&lt;/strong&gt; — Pre-populate your most-accessed keys before the new instance takes live traffic. Requires knowing your hot keys, which your observability setup should already surface.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gradual traffic shifting&lt;/strong&gt; — Old instances keep serving traffic with warm caches while new instances slowly build up state.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sticky sessions during rollout&lt;/strong&gt; — Routes users to consistent instances temporarily, limiting how many cold instances are simultaneously exposed to real traffic.&lt;/li&gt;
&lt;/ul&gt;
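&lt;p&gt;Cache warming on startup can be as simple as a loop over known hot keys, run before the instance reports ready. A sketch with illustrative names; in practice the hot-key list would come from your cache metrics:&lt;/p&gt;

```python
# Pre-populate the most-accessed keys before taking live traffic.
cache = {}
HOT_KEYS = ["workshop:availability:101", "workshop:availability:102"]

def load_from_db(key):
    return "value-for-" + key          # placeholder for the real query

def warm_cache():
    for key in HOT_KEYS:
        cache[key] = load_from_db(key)

warm_cache()                            # run before the readiness probe passes
```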

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Ask yourself:&lt;/strong&gt; What does your system look like in the 5 minutes immediately after a fresh deployment? Have you ever deliberately tested it?&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  7. Distributed Caching Is Not Just Single-Node Caching at Bigger Scale
&lt;/h3&gt;

&lt;p&gt;When you move to a distributed cache cluster, the rules change in ways that are easy to miss.&lt;/p&gt;

&lt;p&gt;Consider a write: your application updates cache node 1. Replication to node 2 is asynchronous and hasn't completed. Another request, routed to node 2, reads that key and gets the &lt;strong&gt;old value&lt;/strong&gt;. Two users, the same request, nearly the same moment — different responses.&lt;/p&gt;

&lt;p&gt;This is not a malfunction. It is the expected behaviour of an eventually consistent distributed system. The problem surfaces when the application is designed assuming &lt;strong&gt;strong consistency&lt;/strong&gt; and the cache is delivering &lt;strong&gt;eventual consistency&lt;/strong&gt;. That mismatch does not produce errors. It produces &lt;strong&gt;silent incorrectness&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Redis Cluster uses asynchronous replication. Under normal conditions, replication lag is milliseconds and practically invisible. But in failure scenarios — a node going down, a network partition, a failover — &lt;strong&gt;writes that were acknowledged can be lost&lt;/strong&gt; before they propagate.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Ask yourself:&lt;/strong&gt; Has your application been designed knowing that cache reads across nodes may not always be consistent? What actually happens to your users if they are not?&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  8. Security Gaps in Caching Are Invisible Until They Are Not
&lt;/h3&gt;

&lt;p&gt;Here is how it goes wrong. You cache a response containing data belonging to a specific user. A second user sends a request that generates the &lt;strong&gt;same cache key&lt;/strong&gt;. They receive the first user's cached response — their personal data, their account details, their private information — served silently to someone who should never have seen it.&lt;/p&gt;

&lt;p&gt;This is a data breach that produces &lt;strong&gt;no exception, no error log, and no anomaly in performance metrics&lt;/strong&gt;. The cache is working exactly as designed. The design is the problem.&lt;/p&gt;

&lt;p&gt;The fix requires rigorous cache key scoping. Every dimension that makes a response unique must be part of the key: user ID, tenant ID, permission level, role, locale, feature flags. Leaving any out is not a minor oversight to patch in the next sprint. It is a live security incident waiting for the right traffic pattern.&lt;/p&gt;
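&lt;p&gt;One way to make that scoping hard to forget is to build every key through a single function that demands all the dimensions. A sketch; the function name and key layout are illustrative:&lt;/p&gt;

```python
# Every dimension that changes the response must appear in the key.
# Omitting any of them risks serving one user's data to another.
def scoped_cache_key(resource, tenant_id, user_id, role, locale, feature_flags):
    # Sort the flags so the same set always yields the same key.
    flags = ",".join(sorted(feature_flags))
    return ":".join([resource, str(tenant_id), str(user_id), role, locale, flags])
```

&lt;p&gt;Centralising key construction also gives you one place to audit when a new dimension — a new feature flag, a new tenant model — is added later.&lt;/p&gt;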

&lt;p&gt;The second concern: what lives in your cache &lt;strong&gt;at rest&lt;/strong&gt;. Session tokens, access tokens, PII embedded in cached API responses. Most teams apply strict access controls to their databases. Not all of them apply the same rigour to their cache infrastructure.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Ask yourself:&lt;/strong&gt; Are your cache keys scoped precisely enough that no response can ever be served to the wrong user? If your cache infrastructure were accessed by someone who shouldn't have it, what would they find?&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  9. If You Are Not Measuring Your Cache, You Do Not Know If It Is Working
&lt;/h3&gt;

&lt;p&gt;A cache you cannot observe is either working fine or silently failing — and you have no way to tell which.&lt;/p&gt;

&lt;p&gt;Three numbers tell you almost everything:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hit rate&lt;/strong&gt; — percentage of requests served directly from cache. A high, stable hit rate means the cache is doing its job. A rate slowly declining over days or weeks signals that data volatility has increased, TTLs have drifted, or a deployment changed behaviour upstream.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Miss rate&lt;/strong&gt; — how often requests fall through to the database. A sudden spike means a stampede may be in progress, an invalidation pipeline has broken, or a deployment started cold.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Eviction rate&lt;/strong&gt; — tells you whether your cache is sized correctly. A rising eviction rate means your working set is larger than your allocated memory. Data is being pushed out before it can be reused. Your hit rate follows downward. Your database load follows upward.&lt;/p&gt;

&lt;p&gt;Together, these three numbers tell a continuous story. Without them, you are managing critical infrastructure entirely on faith.&lt;/p&gt;
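&lt;p&gt;Deriving the numbers is trivial once the counters exist. A sketch using the counter names Redis exposes through &lt;code&gt;INFO stats&lt;/code&gt;; the function itself is illustrative:&lt;/p&gt;

```python
# Compute the three health numbers from raw counters.
# keyspace_hits, keyspace_misses, and evicted_keys are the actual
# field names in the Stats section of Redis INFO output.
def cache_health(stats):
    hits = stats["keyspace_hits"]
    misses = stats["keyspace_misses"]
    total = hits + misses
    hit_rate = hits / total if total else 0.0
    return {
        "hit_rate": hit_rate,
        "miss_rate": 1.0 - hit_rate if total else 0.0,
        "evictions": stats["evicted_keys"],
    }
```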

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Ask yourself:&lt;/strong&gt; Can you pull up a live view of your cache hit rate, miss rate, and eviction rate right now? If not, that is the first thing to fix.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  10. The Cost Is Real, and It Compounds Quietly
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Under-provisioned cache:&lt;/strong&gt; High eviction rates reduce hit rate → more database load → more compute needed → higher costs across multiple services.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Over-provisioned cache:&lt;/strong&gt; You pay for memory that sits idle. Managed Redis on any major cloud provider bills idle capacity at the same rate as active capacity.&lt;/p&gt;

&lt;p&gt;The right size comes from understanding your &lt;strong&gt;working set&lt;/strong&gt; — the total data your application actually reads within a given time window. If your working set is 15 GB and your cache is 4 GB, you are not caching 15 GB. You are repeatedly evicting and re-fetching 11 GB of it, paying for database round trips on every cycle.&lt;/p&gt;

&lt;p&gt;The other cost that accumulates quietly: &lt;strong&gt;data transfer&lt;/strong&gt;. If your application instances and cache cluster live in different availability zones, you pay for cross-zone traffic on every cache read. On a high-traffic system with a high hit rate, that is an enormous number of reads. The per-request cost is small. The monthly total is not.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Ask yourself:&lt;/strong&gt; Have you sized your cache from a working set analysis or from a number someone estimated at the start of the project? Do you know what your cross-zone cache traffic costs per month?&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Back to the Booking System
&lt;/h2&gt;

&lt;p&gt;Remember the platform that went down for four minutes? The cache was there. Redis was running. The TTL was set.&lt;/p&gt;

&lt;p&gt;What they had not done was think about the &lt;strong&gt;stampede&lt;/strong&gt; (point 3).&lt;/p&gt;

&lt;p&gt;The availability keys for those popular workshops all had the same sixty-second TTL, set at roughly the same time when the workshops were first published. So &lt;strong&gt;they all expired together&lt;/strong&gt;. When the traffic spike hit, every request found a cold cache simultaneously and went straight to the database.&lt;/p&gt;

&lt;p&gt;The fix was not complicated. A &lt;strong&gt;background worker refreshing availability keys for popular workshops every 45 seconds&lt;/strong&gt; would have kept those keys warm through the entire spike. The database would have seen normal traffic. Users would have seen normal response times.&lt;/p&gt;

&lt;p&gt;One decision. Not made. Four minutes down.&lt;/p&gt;

&lt;p&gt;That is what production caching actually looks like. Not a performance graph. &lt;strong&gt;A decision with a consequence.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Thing That Ties All of This Together
&lt;/h2&gt;

&lt;p&gt;Caching does not make your system faster.&lt;/p&gt;

&lt;p&gt;Done right, it does. Done wrong, it makes your system faster right up until the moment it does not. And when it fails, it tends to fail suddenly, in ways that are difficult to trace back to a decision made quietly, months earlier, on an ordinary afternoon.&lt;/p&gt;

&lt;p&gt;The engineers who build systems that hold up under real pressure are not necessarily smarter. They are more &lt;strong&gt;deliberate&lt;/strong&gt;. They treat each of these ten things as a conscious choice rather than something that gets handled by default.&lt;/p&gt;

&lt;p&gt;Make the choices. Write them down. Revisit them before you ship.&lt;/p&gt;




&lt;h2&gt;
  
  
  Production Ready Checklist
&lt;/h2&gt;

&lt;p&gt;Go through this before anything involving caching reaches production. Not as a formality — as a genuine engineering checkpoint.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Have I consciously chosen a caching pattern and do I understand its consistency trade-offs?&lt;/li&gt;
&lt;li&gt;[ ] Do I have a defined invalidation strategy with a clear owner, clear triggers, and handling for silent failures?&lt;/li&gt;
&lt;li&gt;[ ] Have I protected my hottest cache keys against a stampede event?&lt;/li&gt;
&lt;li&gt;[ ] Have I audited what I am caching and confirmed none of it is transactional, financial, or dangerous when stale?&lt;/li&gt;
&lt;li&gt;[ ] Have I explicitly configured my eviction policy rather than accepting the default?&lt;/li&gt;
&lt;li&gt;[ ] Have I planned and actually tested what happens in the first five minutes after a cold deployment?&lt;/li&gt;
&lt;li&gt;[ ] Do I understand my cache cluster's replication and consistency model, and has my application been designed with that in mind?&lt;/li&gt;
&lt;li&gt;[ ] Are my cache keys scoped precisely enough that no response can ever be served to the wrong user?&lt;/li&gt;
&lt;li&gt;[ ] Do I have live monitoring for hit rate, miss rate, and eviction rate?&lt;/li&gt;
&lt;li&gt;[ ] Have I sized my cache from a working set analysis and not from a rough estimate?&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://www.thetruecode.com/the-true-code-of-production-systems/caching-is-easy-production-caching-is-not/" rel="noopener noreferrer"&gt;The True Code&lt;/a&gt; — a series on production-critical engineering, stack-agnostic, with enough depth to actually change how you think.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>redis</category>
      <category>backend</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Idempotency Is Not an API Thing: A Conversation Between Two Engineers</title>
      <dc:creator>Gaurav Sharma</dc:creator>
      <pubDate>Wed, 15 Apr 2026 04:52:13 +0000</pubDate>
      <link>https://forem.com/gauravsharma_thetruecode/idempotency-is-not-an-api-thing-a-conversation-between-two-engineers-15ii</link>
      <guid>https://forem.com/gauravsharma_thetruecode/idempotency-is-not-an-api-thing-a-conversation-between-two-engineers-15ii</guid>
      <description>&lt;p&gt;The junior engineer has been writing production code for three years.&lt;br&gt;
He knows what idempotency means. Or at least he thinks he does.&lt;br&gt;
He has used idempotency keys.&lt;br&gt;
He has read the Stripe documentation.&lt;br&gt;
He has nodded confidently in architecture reviews when someone said "make sure it's idempotent."&lt;br&gt;
He is reasonably sure he understands it.&lt;br&gt;
The senior engineer is about to ask one question that will change that.&lt;/p&gt;




&lt;h2&gt;
  
  
  The conversation starts with a definition that isn't quite right
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"So idempotency. It's basically when you send an idempotency key with an API request, right? The server checks if it's seen that key before. If yes, it returns the cached response. If no, it processes it fresh. That way, retries don't cause duplicate operations."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"That's one way to implement idempotency in an API. But it's not what idempotency is.&lt;/p&gt;

&lt;p&gt;Let me try something simpler. Think about an elevator button. When you're waiting for a lift and you press the button, it lights up. Now you press it again. And again. Does the lift come faster?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"No. It's already been called. Pressing it again doesn't change anything."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Exactly. That button is idempotent. You can press it once or a hundred times. The result is the same: the lift is coming. The extra presses don't create extra lifts. They don't undo the first press. They just do nothing on top of what's already been done.&lt;/p&gt;

&lt;p&gt;That's what idempotency means as a concept. An operation is idempotent if you can run it multiple times and the result is exactly the same as running it once.&lt;/p&gt;

&lt;p&gt;It's a property of an operation. Not a key. Not a header. Not something specific to APIs or HTTP. Just an answer to one question: if this runs again, does anything go wrong?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"But in practice, the way you make something idempotent is with idempotency keys. Right? That's the pattern."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"In APIs, that's one tool. But let me show you why the concept is bigger than that.&lt;/p&gt;

&lt;p&gt;You have a SQL job that runs every night at midnight. Its job is to take orders from a staging table and insert them into the main orders table. No API. No HTTP. No key anywhere.&lt;/p&gt;

&lt;p&gt;If that job runs twice tonight, what happens?"&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The junior pauses.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"If it just does a plain INSERT... every order gets inserted twice. Duplicate rows."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Right. So the job is not idempotent. And nobody wrote an idempotency key into it. Because developers don't usually think about SQL jobs the way they think about APIs.&lt;/p&gt;

&lt;p&gt;Which means most SQL jobs in most systems are quietly not idempotent. And nobody realises it until the job runs twice. Which it will, eventually."&lt;/p&gt;

&lt;p&gt;&lt;em&gt;He lets that sit for a moment.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"This is what I want us to talk through today. Not idempotency as an API feature. Idempotency as a discipline. A way of thinking about every operation you build, regardless of what kind it is. Because more operations can run more than once than most engineers think."&lt;/p&gt;




&lt;h2&gt;
  
  
  When does something run more than once?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Okay. I get the concept. But when would something actually run more than once by accident? I'd expect that to be pretty rare."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"It's actually one of the most common things in production. Let me walk through some everyday situations.&lt;/p&gt;

&lt;p&gt;Your app calls an external payment API. The network is slow. After 30 seconds, your app gets no response and assumes it failed. So it retries. But the first call actually did succeed. The payment went through. Now it goes through again.&lt;/p&gt;

&lt;p&gt;Your scheduled Azure Function is set to run at 2 AM every night. But last night's run is still going because it hit a slow database query. At 2 AM tonight, a new run starts. Now two instances are running at the same time, both doing the same work.&lt;/p&gt;

&lt;p&gt;A developer runs a data migration script on a Friday to fix a production issue. On Monday, a second developer, not knowing about Friday's run, runs the same script again to double-check. The script runs twice.&lt;/p&gt;

&lt;p&gt;A message arrives in Azure Service Bus. Your consumer picks it up, starts processing it, and then the server crashes halfway through. Service Bus doesn't hear back from the consumer. After a few minutes, it assumes the message was never processed. So it puts the message back and delivers it again to a new consumer instance.&lt;/p&gt;

&lt;p&gt;A Kubernetes pod is running a background job. The cluster decides to move the pod to a different node. The pod is killed mid-job. Kubernetes starts it fresh on a new node. The job runs again from the beginning."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"So these aren't edge cases. These are just... normal production situations."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Completely normal. Every one of those things happens regularly. And in every one of those situations, if your operation isn't idempotent, you get problems.&lt;/p&gt;

&lt;p&gt;Duplicate rows in the database. A customer charged twice. The same email sent twice. A counter that's off by one or ten. Or the worst kind: silent data corruption that nobody notices for three days, until a customer calls support."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"And the customer doesn't know any of this happened. They just see the wrong charge on their statement."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Exactly. And your support team has to investigate manually, trace through logs, figure out what ran when, and apologise. All of that cost comes from one missing design decision made before the feature was built."&lt;/p&gt;




&lt;h2&gt;
  
  
  Scenario 1: The payment API, and what happens when the key isn't enough
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Let's start where you're most familiar. A REST API for placing an order and charging a card. You said you'd use an idempotency key. Walk me through how that works."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"The client, meaning the front end or the calling service, generates a unique ID before making the request. Usually a UUID, something like '7f3d2c1a-...'. It sends that in the request header. When the server receives the request, it checks a table: have I seen this key before? If yes, return the response I stored last time. If no, process the order and save the response against this key."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Good. That's the right idea. Now let me walk through one specific failure scenario and I want you to tell me what happens.&lt;/p&gt;

&lt;p&gt;Your server receives the request. It checks the key: not seen before, so it starts processing. It calls the payment gateway. The gateway charges the card successfully. But then, before your server can write the order record to the database and save the idempotency key, the server crashes. Power cut, out of memory, doesn't matter. It just dies.&lt;/p&gt;

&lt;p&gt;The client gets a timeout. No response. What does the client do?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"It retries. With the same idempotency key."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Your server restarts. The new request arrives. It checks the key table. Does the key exist?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"No. Because we never saved it. We crashed before that step."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"So what does the server do?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"It treats the request as new. It calls the payment gateway again."&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The junior goes quiet for a second.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;"The customer gets charged twice."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Yes. And no error happened anywhere. Every individual step worked correctly. The payment gateway did its job. The retry logic did its job. The idempotency check did its job. But the order of operations was wrong, and the whole thing still failed the customer.&lt;/p&gt;

&lt;p&gt;This is the gap most developers miss. The key only works if you save it as part of the same operation as the work. Not before. Not after. Together. If your database write and your key storage are not in the same transaction, there's a window where a crash leaves you in the worst possible state: work done, but no record of it."&lt;/p&gt;
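&lt;p&gt;A minimal sketch of that "together" rule in Python, with SQLite standing in for the real order database. The table names and the record_order helper are invented for illustration:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, amount INTEGER)")
conn.execute("CREATE TABLE idempotency_keys (key TEXT PRIMARY KEY, response TEXT)")

def record_order(key, order_id, amount):
    # The order row and the idempotency key are written in ONE transaction:
    # they commit together or not at all. A crash anywhere inside this block
    # rolls both back, so a retry never finds work done with no record of it.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, amount))
        conn.execute("INSERT INTO idempotency_keys VALUES (?, ?)", (key, "ok"))

record_order("7f3d2c1a", "order-42", 999)
```

&lt;p&gt;Note that the external payment call still sits outside this transaction. That is exactly why payment gateways accept an idempotency key of their own: the transaction closes the window between your database write and your key storage, and passing the same key downstream closes the window around the charge itself.&lt;/p&gt;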

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"So the idempotency key is only as safe as the transaction around it."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Yes. And there's a second problem that's just as common. The client sends the same key, but the server isn't crashed. It's just slow. The first request is still being processed. The client gets impatient and retries.&lt;/p&gt;

&lt;p&gt;Now two requests with the same key are being processed at the same time. What does your server return to the second one?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"I don't know. Maybe a 500 error? Or it just waits?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Most servers return something unhelpful there. The correct answer is a 409 Conflict. A response that says, in plain terms: 'I've already seen this key and I'm still working on it. Wait a moment and try again.'&lt;/p&gt;

&lt;p&gt;Or if your operation is asynchronous, meaning it takes a long time and runs in the background, you return a 202 Accepted with a link the client can check to see the status.&lt;/p&gt;

&lt;p&gt;The point is: an idempotency key isn't just a deduplication trick. It changes what your system has to communicate in every possible situation. Seen the key and done the work: return the stored result. Seen the key and still working: say so. Never seen the key: do the work. Each case needs a clear, intentional answer."&lt;/p&gt;
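&lt;p&gt;Those three cases can be sketched as a tiny state machine. This is an in-memory illustration only; a real server would keep this state in a shared store and claim the key atomically, but the shape of the logic is the same:&lt;/p&gt;

```python
# In-memory illustration of the three cases an idempotency key creates.
# The names are made up for the sketch.
key_store = {}

def handle_request(key, do_work):
    state = key_store.get(key)
    if state is None:
        key_store[key] = ("in_progress", None)  # never seen: claim it, do the work
        result = do_work()
        key_store[key] = ("done", result)       # remember the result for retries
        return (200, result)
    status, stored = state
    if status == "in_progress":
        return (409, "still working on this key, retry shortly")
    return (200, stored)                        # seen and done: replay, don't redo

status, body = handle_request("key-1", lambda: "order placed")
```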

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"I've never thought about the 'still working' case. I just assumed the system would either have done it or not."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Most developers haven't. And one more thing. How long do you keep the keys?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"I... haven't really decided. Until the row gets cleaned up, I suppose."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"That's the answer most systems give. Which means some systems keep them forever until the database gets large, and some delete them too early, which means a retry that comes in four hours later looks like a brand new request.&lt;/p&gt;

&lt;p&gt;Stripe prunes idempotency keys after 24 hours, because that's long enough to cover any realistic retry window for a payment. Your systems may need less, or more. But they need a number. A deliberate decision. Not a default that nobody chose."&lt;/p&gt;




&lt;h2&gt;
  
  
  Scenario 2: The nightly SQL job that nobody worries about
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Let's go back to the SQL job example. You said most of them aren't idempotent. How do you actually fix that?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"First, let's be very concrete about the problem. Imagine your company runs an Azure Data Factory pipeline every night at midnight. It reads from a staging table where raw transaction data lands throughout the day, and it inserts those transactions into a clean fact table that the reporting team uses.&lt;/p&gt;

&lt;p&gt;On a normal night, it runs once. Everything is fine. But one night there's a network blip halfway through and the pipeline fails. The on-call engineer sees the alert and reruns it manually. Now the pipeline runs again from the beginning. What happens to the rows that already got inserted in the first partial run?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"They get inserted again. Duplicate rows in the fact table."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"And the reporting team runs their reports the next morning not knowing any of this happened. The numbers are wrong. Maybe slightly wrong, maybe very wrong depending on how far the first run got. And tracing it back is painful.&lt;/p&gt;

&lt;p&gt;The fix is to change the question the job asks. Instead of 'insert this data', it should ask 'make this data exist.' There's a big difference.&lt;/p&gt;

&lt;p&gt;A plain INSERT says: add this row, I don't care if it's already there. An upsert, or a MERGE in SQL terms, says: if this row already exists, update it to match. If it doesn't exist, create it. Either way, when you're done, the data looks exactly like it should.&lt;/p&gt;

&lt;p&gt;Run that job once: correct state. Run it ten times: same correct state. The job is now idempotent."&lt;/p&gt;
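&lt;p&gt;In SQL Server terms that's a MERGE. The same idea, shown here with SQLite's upsert syntax and invented table names, so the rerun behaviour is easy to see:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_txn (txn_ref TEXT PRIMARY KEY, amount INTEGER)")

def load_batch(rows):
    # 'Make this data exist': insert rows that are new, update rows that
    # already match on the natural key. Re-running the load changes nothing.
    conn.executemany(
        "INSERT INTO fact_txn (txn_ref, amount) VALUES (?, ?) "
        "ON CONFLICT(txn_ref) DO UPDATE SET amount = excluded.amount",
        rows,
    )
    conn.commit()

batch = [("T-1001", 250), ("T-1002", 75)]
load_batch(batch)   # the normal nightly run
load_batch(batch)   # the manual rerun after a partial failure: same final state
```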

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"But to do a MERGE, you need some way to recognise whether a row already exists. Like a unique ID to match on."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Exactly. And this is where the design conversation starts. Idempotency requires identity. To know whether you've already done something, you need a reliable way to recognise that thing when you see it again.&lt;/p&gt;

&lt;p&gt;For an order, that's probably an order ID from the source system. For a transaction, maybe a combination of the transaction reference and the date. For an event log, maybe a hash of the key fields.&lt;/p&gt;

&lt;p&gt;The point is: if your data model has no natural key, idempotency becomes much harder. This is a design decision you make early. And if you don't make it deliberately, production will eventually make it for you, in the worst possible way."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"So idempotency isn't just about the job. It starts with the data model."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Yes. And there's a trap I see often that looks safe but isn't. A developer writes something like: check if this row already exists, and if not, insert it. Sounds fine. But what if two instances of the job run at the same time? Both do the check. Both find no existing row. Both try to insert. You get duplicate rows anyway.&lt;/p&gt;

&lt;p&gt;The check-then-insert pattern only works if exactly one thing is running at a time, which you often can't guarantee. The database's own uniqueness constraint, combined with an upsert operation, is the only way to get a guarantee that holds under any conditions."&lt;/p&gt;
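&lt;p&gt;The race is easy to reproduce without any real concurrency: let two "instances" both run the check before either inserts, and watch the unique constraint do the job the check couldn't. A contrived sketch:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY)")

def check_then_insert_says_go(order_id):
    # The racy pattern: between this check and the insert, another
    # instance can run exactly the same check and reach the same answer.
    seen = conn.execute(
        "SELECT 1 FROM orders WHERE order_id = ?", (order_id,)).fetchone()
    return seen is None

# Two instances both ran the check before either inserted:
a_should_insert = check_then_insert_says_go("O-7")
b_should_insert = check_then_insert_says_go("O-7")

inserted = 0
for should in (a_should_insert, b_should_insert):
    if should:
        try:
            conn.execute("INSERT INTO orders VALUES (?)", ("O-7",))
            inserted += 1
        except sqlite3.IntegrityError:
            pass  # the unique constraint, not the check, prevents the duplicate
conn.commit()
```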




&lt;h2&gt;
  
  
  Scenario 3: The console app that runs every 15 minutes
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"What about a background worker? Like a console app or a WebJob that runs on a schedule?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Good example. Let's say you have an Azure WebJob that runs every 15 minutes. Its job is to pick up new customer records, call an external enrichment API to add extra details, and write the enriched records to Azure Blob Storage.&lt;/p&gt;

&lt;p&gt;Two problems can happen here, and neither of them feels like a bug at first.&lt;/p&gt;

&lt;p&gt;Problem one: the job takes 16 minutes. A slow response from the enrichment API. By the time it finishes, the next scheduled run has already started. Now two instances are running at the same time, processing the same batch of records. Neither knows the other exists."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"They'd both write to the same blobs. One would overwrite the other."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Maybe. Or they'd both call the enrichment API for the same customer, getting billed twice for that API call. Or one finishes first and marks the record as done, but then the second finishes and marks it done again with slightly different data because the API returned something different the second time around.&lt;/p&gt;

&lt;p&gt;Problem two: the job processes a record, writes the blob, then crashes before marking that record as done. Next run, it picks up the same record again and runs the whole thing over."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"For the second problem, if the blob gets overwritten with the same data, isn't that fine?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Only if the enrichment API always returns identical data for the same input. If it returns a price, a stock level, a timestamp, anything that can change between calls, then the second write might have different data. Now your system has processed the same record twice and stored two different results, one of which got silently overwritten.&lt;/p&gt;

&lt;p&gt;You might never notice. Until someone asks why customer records from a specific period look inconsistent."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"So how do you stop the overlap problem? The two instances running at the same time?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"A distributed lease. Before the job starts its work, it tries to claim a lock on a shared resource. In Azure, you can use a Blob Storage lease for this. Think of it like a physical key to a room. Only one person can hold the key at a time. The job picks up the key before it starts. If another instance tries to start and finds the key already taken, it simply exits. It doesn't fight. It doesn't wait. It just walks away.&lt;/p&gt;

&lt;p&gt;When the first job finishes, it releases the key. The next scheduled run picks it up normally.&lt;/p&gt;

&lt;p&gt;One run at a time. Clean."&lt;/p&gt;
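&lt;p&gt;The try-claim-or-exit shape looks like this, using atomic file creation as a local stand-in for a Blob Storage lease (a real lease can also be taken with an expiry, so a crashed holder doesn't block the job forever):&lt;/p&gt;

```python
import os

# Atomic file creation as a stand-in for a distributed lease.
LOCK_PATH = f"/tmp/nightly-job-{os.getpid()}.lock"

def try_acquire_lease(path=LOCK_PATH):
    # O_CREAT | O_EXCL is atomic: exactly one caller can create the file.
    try:
        os.close(os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY))
        return True
    except FileExistsError:
        return False

def release_lease(path=LOCK_PATH):
    os.remove(path)

def run_job():
    if not try_acquire_lease():
        return "another instance holds the lease, exiting"  # walk away, don't wait
    try:
        return "did the work"
    finally:
        release_lease()
```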

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"So in this case, the solution isn't making the operation idempotent. It's preventing the duplication from happening at all."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Right. And that's an important distinction. Idempotency means tolerating duplication. Prevention means eliminating it. Both are valid. Often you want both: prevent where you can, tolerate where you can't. The blob lease prevents concurrent runs. The upsert write tolerates the occasional restart where the same record gets processed twice."&lt;/p&gt;




&lt;h2&gt;
  
  
  Scenario 4: Message queues and the guarantee that surprises people
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Message queues. I know at-least-once delivery means a message might arrive more than once. So idempotency matters there."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Yes. But I want to make sure the 'at-least-once' part is clear, because a lot of developers hear it and think 'that's an edge case, it rarely happens.'&lt;/p&gt;

&lt;p&gt;It's not an edge case. Azure Service Bus guarantees that a message will be delivered. It does not guarantee it will only be delivered once. The reason is: to know that a message was truly processed, Service Bus needs the consumer to send back an acknowledgement. If the consumer processes the message and then crashes before sending that acknowledgement, Service Bus has no idea the work was done. So it re-delivers the message. It has to. The alternative is losing the message entirely, which is worse.&lt;/p&gt;

&lt;p&gt;So duplicates aren't a bug. They're the price of reliability. And your consumer has to be built with that in mind."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Doesn't Service Bus have duplicate detection built in though? I've seen a setting for it."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"It does. And this is a common source of false confidence. Service Bus can detect if the exact same message is delivered twice within a time window, based on the message ID. That covers broker-level duplicates, situations where the broker itself sends the same message twice.&lt;/p&gt;

&lt;p&gt;But it doesn't cover the scenario I just described. If your consumer crashes after processing but before acknowledging, Service Bus delivers the message again, but from its perspective, that's a legitimate re-delivery of a message that was never confirmed, not a duplicate. The duplicate detection won't catch it.&lt;/p&gt;

&lt;p&gt;Your consumer needs to handle it."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"So how do you make a consumer handle it correctly?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"The cleanest approach depends on what your consumer does.&lt;/p&gt;

&lt;p&gt;If your consumer is writing to a database and there's a natural business key on the record, like an order ID, just use an upsert. Write the record if it doesn't exist, update it if it does. Processing the same message ten times leaves you with exactly one record in the correct state. No extra infrastructure needed.&lt;/p&gt;

&lt;p&gt;If your consumer has side effects beyond a database write, like sending an email or calling a payment gateway, you need to track what you've already done. Before processing a message, check whether that message has already been processed successfully. Azure Cache for Redis works well here: store the message's business ID with a short expiry. If it's already there, skip the processing and just acknowledge the message. Simple check before every action.&lt;/p&gt;

&lt;p&gt;The key choice is which ID to track. Service Bus gives every message its own message ID, which is an infrastructure concept. But your message also contains a business concept, an order number, a customer ID, a transaction reference. Use that as your deduplication key. It's the thing that actually means something to your system, and it survives across retries and redeliveries in a way that infrastructure IDs sometimes don't."&lt;/p&gt;
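&lt;p&gt;A sketch of the tracking approach, with a plain dict standing in for Azure Cache for Redis. The real version would use a single SET-if-not-exists call with an expiry, so the claim is atomic across consumer instances:&lt;/p&gt;

```python
# In-memory stand-in for a shared Redis cache; names invented for the sketch.
processed = {}

def handle_message(order_id, send_email):
    # Deduplicate on the business ID carried in the message body,
    # not on the broker's infrastructure message ID.
    if order_id in processed:
        return "duplicate, acknowledged without reprocessing"
    send_email(order_id)          # the side effect we must not repeat
    processed[order_id] = True
    return "processed"

emails_sent = []
handle_message("ORD-9", emails_sent.append)
handle_message("ORD-9", emails_sent.append)  # redelivery: side effect not repeated
```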

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"How do you decide which approach to use?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Ask one question: what is the cost if this runs twice?&lt;/p&gt;

&lt;p&gt;If the cost is nothing, the operation is naturally safe, just use an upsert and move on.&lt;/p&gt;

&lt;p&gt;If the cost is money, like a payment, or trust, like a notification, or anything the customer will notice, then you need the explicit check.&lt;/p&gt;

&lt;p&gt;The answer to that question tells you exactly how much effort to invest."&lt;/p&gt;




&lt;h2&gt;
  
  
  Scenario 5: Azure Functions, serverless, and why state can't live in memory
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"What about Azure Functions? The HTTP-triggered kind."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Azure Functions are a great example of why understanding idempotency as a concept matters more than knowing any specific implementation.&lt;/p&gt;

&lt;p&gt;Here's what makes Functions different. A regular web app might run as one or two instances. You might even be able to pretend it's a single server in some situations. A Function can scale to dozens or hundreds of instances in seconds. If 500 users all click the same button at the same time, 500 separate Function instances could all be handling those requests simultaneously.&lt;/p&gt;

&lt;p&gt;Each instance is completely isolated. It has no memory of what other instances are doing. It doesn't know if another instance is already processing the exact same request that came in twice due to a network retry."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"So you can't store 'have I seen this request before' in memory. Because memory is per-instance."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Exactly. The only place where truth can live is somewhere that all instances can read from and write to. A database. Azure Table Storage. Redis. Something external and shared.&lt;/p&gt;

&lt;p&gt;The logic is the same as what we discussed before: when a request arrives, check a shared store for that idempotency key. If it exists, return the stored result without doing the work again. If it doesn't exist, do the work, save the result and the key to the shared store, and return.&lt;/p&gt;

&lt;p&gt;The extra question with Functions is concurrency. What if two instances receive the same request at almost the same moment, both check the store, both find no key, and both start processing?&lt;/p&gt;

&lt;p&gt;You let the database handle that. If you're using Azure Table Storage, it has optimistic concurrency built in. Only one write will succeed when two try to insert the same key at the same time. The second one gets a conflict error. At that point your Function catches the conflict, re-reads the stored result from the first instance, and returns it. Clean."&lt;/p&gt;
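&lt;p&gt;The shape of that conflict handling, with SQLite's primary-key constraint standing in for Table Storage's insert conflict (names invented for the sketch):&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (key TEXT PRIMARY KEY, body TEXT)")

def run_function(key, do_work):
    row = conn.execute("SELECT body FROM results WHERE key = ?", (key,)).fetchone()
    if row:
        return row[0]   # seen before: replay the stored result
    body = do_work()
    try:
        with conn:
            conn.execute("INSERT INTO results VALUES (?, ?)", (key, body))
        return body     # this instance won the insert
    except sqlite3.IntegrityError:
        # Another instance claimed the same key first: re-read its result
        # and return that, so every caller sees one consistent answer.
        return conn.execute(
            "SELECT body FROM results WHERE key = ?", (key,)).fetchone()[0]
```

&lt;p&gt;The Function code stays small because the guarantee lives in the data layer: whichever instance loses the insert race discards its own result and returns the winner's.&lt;/p&gt;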

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"So the Function itself doesn't need to be complicated. The data layer does the heavy lifting."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Yes. And that's the general principle in serverless: compute is cheap and disposable. Data is where guarantees live. Your idempotency design has to be in the data layer, not the compute layer. Functions just execute whatever logic you give them. They don't remember anything between invocations unless you build that memory into storage."&lt;/p&gt;




&lt;h2&gt;
  
  
  Why this all matters beyond just preventing duplicates
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"So we've gone through APIs, SQL jobs, console apps, queues, and Functions. I understand the problem better now. But what's the bigger payoff? Why invest in this properly?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Let me ask you something. When you're testing a feature, what makes testing hard?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Writing tests for all the different things that can go wrong. Error cases, edge cases, unexpected sequences of events."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Right. Now think about retry scenarios specifically. If you have an operation that isn't idempotent, you need test cases for: what if the client retried once? What if it retried three times? What if two retries overlapped? What if the first attempt half-succeeded and then the retry came in?&lt;/p&gt;

&lt;p&gt;Each of those is a separate test scenario. Each one requires setup, assertions, and maintenance.&lt;/p&gt;

&lt;p&gt;If the operation is idempotent, all of those scenarios collapse into one: does the operation produce the correct result? You don't care how many times it ran. The result is always the same.&lt;/p&gt;

&lt;p&gt;That's a real, measurable reduction in test surface. Fewer tests to write. Fewer tests to maintain. Fewer bugs that come from test gaps."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"And when something goes wrong in production, what changes?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"This is where it matters most for your day-to-day life as an engineer.&lt;/p&gt;

&lt;p&gt;When something isn't idempotent and it fails, your recovery process is: investigate what ran, figure out what got into the database and what didn't, write a script to fix the inconsistency, test the script, run the fix, verify the result, and update the customer. That process takes hours. Sometimes days if the failure was subtle.&lt;/p&gt;

&lt;p&gt;When something is idempotent and it fails, your recovery process is: run it again. That's it. It takes minutes. And you can do it with confidence because you know that running it again will leave the system in the correct state, not make things worse."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"So idempotency changes your 3 AM incident from 'I need to figure out what happened and carefully fix the data' to 'I just rerun the job and go back to sleep.'"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Yes. And it changes things beyond incidents too.&lt;/p&gt;

&lt;p&gt;Imagine you need to replay six months of transactions through a new processing pipeline you just built. If your pipeline is idempotent, you just run the data through and whatever already exists gets updated to match, whatever's missing gets created. No risk.&lt;/p&gt;

&lt;p&gt;If your pipeline isn't idempotent, replaying data means duplicates everywhere. You have to build cleanup logic before you can even start. What should be a straightforward migration becomes a careful, scary operation.&lt;/p&gt;

&lt;p&gt;Idempotency turns reruns from something you fear into something you can do without thinking."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"You mentioned earlier that idempotency is also a contract with your callers. Can you say more about that?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Every system you build is used by other systems. Other services call your API. Other jobs consume your queue. Other pipelines read your output.&lt;/p&gt;

&lt;p&gt;When those callers hit an error or a timeout, they have to decide: do I retry? If your operation is idempotent, the answer is always yes. Retry as many times as you need. You won't cause any harm.&lt;/p&gt;

&lt;p&gt;If your operation is not idempotent, the answer is: maybe. It depends on what stage the previous request reached. The caller now has to write complicated state-tracking logic to figure out whether it's safe to retry. Their code gets more complex because your design didn't make a guarantee.&lt;/p&gt;

&lt;p&gt;Most teams never write down which operations are idempotent and which aren't. Callers guess. When they guess right, nothing bad happens. When they guess wrong, you get an incident that looks mysterious until someone traces through logs for two hours and realises a retry caused a double charge.&lt;/p&gt;

&lt;p&gt;One sentence in your documentation, 'this endpoint is idempotent, it is safe to retry with the same key', prevents that entire category of problem. Good engineering is also good communication. They're the same thing."&lt;/p&gt;




&lt;h2&gt;
  
  
  The one question to ask before you build anything
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"If I take one thing from this conversation and apply it to every new piece of work I do, what should it be?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Before you build any operation, ask: can this run more than once?&lt;/p&gt;

&lt;p&gt;Not 'will it.' Because the answer to 'will it' is often 'probably not.' The answer to 'can it' is almost always 'yes.'&lt;/p&gt;

&lt;p&gt;Networks are unreliable. Servers crash. Deployments overlap. Developers rerun scripts. Queues redeliver messages. Schedulers fire twice. In the real world, any operation that can run once will, at some point, run more than once.&lt;/p&gt;

&lt;p&gt;So once you've accepted that, the question becomes: if it runs twice, what happens?&lt;/p&gt;

&lt;p&gt;If the answer is 'nothing bad,' you're fine.&lt;/p&gt;

&lt;p&gt;If the answer is 'duplicates,' 'double charges,' 'inconsistent state,' or 'I'm not sure,' then idempotency is not an optional nice-to-have. It's a requirement. And the time to think about it is before you build, not after production finds the problem for you."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Junior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"So it's not a pattern you add on top. It's a question you answer during design."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Yes. And one more thing worth remembering.&lt;/p&gt;

&lt;p&gt;Most bugs in distributed systems aren't caused by failures. They're caused by successful operations that ran more than once."&lt;/p&gt;

&lt;p&gt;&lt;em&gt;He closes his laptop.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"A failure is visible. You get an error. An alert fires. A log entry appears. You know something went wrong and you go fix it.&lt;/p&gt;

&lt;p&gt;An operation that succeeds twice is invisible. No error. No alert. No log that says anything is wrong. Just a customer who checks their statement and finds two charges. Or a report that shows slightly wrong numbers. Or an email that went to 50,000 people twice.&lt;/p&gt;

&lt;p&gt;The damage is quiet. And you find it when someone complains, not when your monitoring catches it. Because your monitoring is watching for failures. And this wasn't a failure. It was a success. Twice."&lt;/p&gt;




&lt;p&gt;Idempotency is not something you add to an API when Stripe tells you to.&lt;br&gt;
It's not a checkbox in a design review.&lt;br&gt;
It's not something senior engineers think about and junior engineers don't.&lt;br&gt;
It's a question. One question, asked before every operation you design:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If this runs again, what happens?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Answer that question clearly, in SQL jobs, in background workers, in message consumers, in serverless Functions, in every place code runs that can run more than once, and a whole category of production incident quietly stops happening.&lt;/p&gt;

&lt;p&gt;Not because anything is perfect. But because you designed for the world as it actually is.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I write at &lt;a href="https://www.thetruecode.com" rel="noopener noreferrer"&gt;thetruecode.com&lt;/a&gt; about real lessons from production systems, engineering teams, and 20 years of making things work under pressure.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Let's connect on &lt;a href="https://www.linkedin.com/in/gaurav-sharma-unfiltered" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>programming</category>
      <category>beginners</category>
      <category>architecture</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
