18/30 Days System Design Questions!

Joud Awad — Sat, 23 May 2026 15:45:16 +0000

Your Redis cache just expired on a key that 8,000 users hit every second.
Every single one of those requests is now flying straight at your database.

This is the thundering herd. You didn't have a traffic problem — you had a cache problem. Now you have both.

Here's the setup:
Service → Node.js API, 8,000 req/sec on the /feed endpoint
Cache → Redis, TTL = 60s on the feed key
DB → Postgres, comfortable at ~200 req/sec sustained
What happened → TTL expired at peak traffic, so every request missed the cache and all 8,000 req/sec hit Postgres at once
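
To put numbers on that setup, here is a rough back-of-the-envelope sketch. The traffic, capacity, and TTL figures come from the list above; `rebuild_seconds` is an assumption (the post does not say how long one feed query takes), so treat the second number as illustrative only.

```python
# Figures from the setup above.
incoming_rps = 8_000     # requests per second on /feed
db_capacity_rps = 200    # what Postgres sustains comfortably
ttl_seconds = 60         # TTL on the feed key

# Assumption: one query that refills the key takes half a second.
rebuild_seconds = 0.5

# How far past its comfort zone the database is when the cache is empty.
overload_factor = incoming_rps / db_capacity_rps

# Requests that arrive (and miss) while a single rebuild is still running.
requests_during_rebuild = int(incoming_rps * rebuild_seconds)

# How often the same stampede can repeat with a fixed TTL.
expiries_per_hour = 3600 // ttl_seconds

print(f"overload factor: {overload_factor:.0f}x")
print(f"misses while one rebuild runs: {requests_during_rebuild}")
print(f"expiries per hour: {expiries_per_hour}")
```

With these inputs the database sees roughly 40 times its sustained capacity on every expiry, and the expiry comes around 60 times an hour.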

The DB is on its knees. You have minutes before it falls over. And the next TTL expiry is in 60 seconds.

What do you do?

A) Mutex lock — only one request queries the DB to rebuild the cache, the rest wait behind it.
B) Probabilistic early expiry — start randomly rebuilding the cache before the TTL actually hits zero.
C) Request coalescing — collapse all in-flight requests for the same key into a single DB query, return the same result to all of them.
D) Cache pre-warming — a background job rebuilds the key on a schedule, TTL never reaches zero in prod.
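
To make one of these mechanisms concrete, here is a minimal sketch of the request coalescing described in option C. It is not a verdict on the quiz. It is written in Python with `asyncio` for brevity, although the service in the post is Node.js. The `Coalescer` class, the `load_feed_from_db` function, and the `"feed"` key are hypothetical names for illustration, and the sleep stands in for a slow Postgres query.

```python
import asyncio


class Coalescer:
    """Collapse concurrent loads of the same key into a single call."""

    def __init__(self):
        self._in_flight = {}  # key -> asyncio.Task currently loading that key

    async def get(self, key, loader):
        task = self._in_flight.get(key)
        if task is None:
            # First caller for this key starts the real load.
            task = asyncio.create_task(loader(key))
            self._in_flight[key] = task
            # Forget the task once it finishes so later calls load fresh data.
            task.add_done_callback(lambda _t: self._in_flight.pop(key, None))
        # Every caller, first or not, waits on the same task.
        return await task


db_calls = 0


async def load_feed_from_db(key):
    """Hypothetical stand-in for the expensive Postgres query."""
    global db_calls
    db_calls += 1
    await asyncio.sleep(0.05)
    return f"payload:{key}"


async def demo(concurrent_requests=1000):
    coalescer = Coalescer()
    return await asyncio.gather(
        *(coalescer.get("feed", load_feed_from_db) for _ in range(concurrent_requests))
    )


results = asyncio.run(demo())
print(f"requests served: {len(results)}, database calls: {db_calls}")
```

This sketch only coalesces inside one process. With several API instances, each instance would still send its own query, and if the single load fails, every waiting request sees the same error.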

All four ship in production systems. Only one of them prevents the thundering herd without introducing a new failure mode under load.

Pick one — A, B, C, or D — and tell me why. Full breakdown in the comments (including which answer is the senior-engineer trap that works in theory but falls apart when 8,000 requests are piling up).

If your team has ever had a cache expiry take down a database, share this with them. The debate is worth more than the post.

Drop your answer 👇

#30DaysOfSystemDesign #SystemDesign #DistributedSystems #SoftwareArchitecture

1/30 Days System Design Question

Joud Awad — Sat, 23 May 2026 09:13:45 +0000

Your mobile app talks to 3 backend services directly.

A 4th one ships next sprint. The mobile team is already drowning.

Every new service means a new domain to whitelist, a new auth scheme to wire, and a new error shape to parse. You’re asked to reduce coupling before NotificationService lands.

Here’s the setup:

Mobile → UserService (users.api.com)

Mobile → OrderService (orders.api.com)

Mobile → PaymentService (payments.api.com)

…and NotificationService next sprint.

The client is doing routing the backend should be doing. What do you do?

A) Add an API Gateway — single entry point, all services hide behind one domain.

B) Build a BFF (Backend for Frontend) — a dedicated aggregation layer tailored for mobile.

C) Put a Load Balancer in front of all services — single IP, distributed traffic.

D) Switch to GraphQL Federation — one unified schema the client queries.
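
To show what "single entry point" means mechanically in option A, here is a minimal routing-table sketch. It is not a verdict on the quiz. The upstream hosts come from the setup above; the path prefixes and the `resolve_upstream` function are hypothetical, and a real gateway would also handle auth, error normalization, and the actual proxying.

```python
from typing import Optional

# Upstream hosts from the setup above; the path prefixes are assumptions.
ROUTES = {
    "/users": "users.api.com",
    "/orders": "orders.api.com",
    "/payments": "payments.api.com",
}


def resolve_upstream(path: str) -> Optional[str]:
    """Return the backend host for a request path, or None if no route matches."""
    for prefix, host in ROUTES.items():
        # Match the prefix itself or anything under it, but not "/usersX".
        if path == prefix or path.startswith(prefix + "/"):
            return host
    return None


print(resolve_upstream("/orders/42"))
print(resolve_upstream("/unknown"))
```

In a setup like this, the mobile client knows one domain, and adding NotificationService becomes one more entry in the server-side table rather than a client change.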

Three of these are real patterns you’d use in production. Only one of them actually solves the problem in front of you.

Pick one — A, B, C, or D — and tell me why. I’ll drop the full breakdown in the comments (including why two of the wrong answers are close enough to trick senior engineers).

If this is the kind of tradeoff question your team argues about, share it with them. The debate is worth more than the post.

Drop your answer 👇

Forem: Joud Awad
