<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Siddhant Jain</title>
    <description>The latest articles on Forem by Siddhant Jain (@siddhant_jain_18).</description>
    <link>https://forem.com/siddhant_jain_18</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3791700%2Ff357e54e-3f91-4160-9e0c-9428b560cd0c.png</url>
      <title>Forem: Siddhant Jain</title>
      <link>https://forem.com/siddhant_jain_18</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/siddhant_jain_18"/>
    <language>en</language>
    <item>
      <title>The “Token Bleed”: How to Operate LLMs Without Bankrupting Yourself</title>
      <dc:creator>Siddhant Jain</dc:creator>
      <pubDate>Fri, 03 Apr 2026 18:06:11 +0000</pubDate>
      <link>https://forem.com/siddhant_jain_18/the-token-bleed-how-to-operate-llms-without-bankrupting-yourself-2p1o</link>
      <guid>https://forem.com/siddhant_jain_18/the-token-bleed-how-to-operate-llms-without-bankrupting-yourself-2p1o</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Experts across infra, SRE, and product‑engineering circles don’t have one single “rulebook,” but the consensus from real‑world write‑ups and discussions is clear: &lt;strong&gt;if you’re building an “AI wrapper” or LLM‑based product, the way you succeed (and avoid backlash) is by focusing on the &lt;em&gt;hard infrastructure and reliability problems&lt;/em&gt;, not just the UI or “vibe.”&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We learned this the hard way. In one project we ran, we watched a single runaway agent hit six figures in tokens before the dashboard even refreshed. Another time, we tried in‑memory counters for budgets – after a restart, everyone’s limit was reset and we started overbilling users. Oops.&lt;/p&gt;

&lt;p&gt;A single bug or malicious user can still drain $1,000 of OpenAI credits in an hour. But the fix isn’t a “better wrapper” – it’s &lt;strong&gt;LLM operations&lt;/strong&gt;: treating the model like any other expensive, unreliable external service (Stripe, S3, Kafka). Let’s walk through the patterns that protect your wallet, then see one concrete implementation.&lt;/p&gt;




&lt;h2&gt;1. The principles (code‑agnostic, works for any LLM)&lt;/h2&gt;

&lt;p&gt;Before we touch code, internalise these four guardrails. They apply whether you’re using OpenAI, Anthropic, Llama, or a mix.&lt;/p&gt;

&lt;h3&gt;① Per‑user / per‑org token budgets (with rolling windows)&lt;/h3&gt;

&lt;p&gt;Every token‑consuming request should be associated with a budget context. We found it safer to enforce &lt;strong&gt;hourly or daily limits&lt;/strong&gt; that persist across restarts – an in‑memory counter that resets when your process dies is useless. (Yes, we learned that one the expensive way.)&lt;/p&gt;

&lt;h3&gt;② Per‑job circuit breakers&lt;/h3&gt;

&lt;p&gt;Long‑running AI tasks (summaries, batch inference) can loop or stall. You need a way to kill a job &lt;strong&gt;mid‑stream&lt;/strong&gt; when it exceeds a cost or time threshold. That requires persistent job state: the worker must periodically check if it’s still allowed to continue.&lt;/p&gt;

&lt;h3&gt;③ Idempotency for every mutating request&lt;/h3&gt;

&lt;p&gt;Retries, webhooks, and double‑clicks are silent budget killers. Every request that calls an LLM should carry an idempotency key. The first request processes; duplicates receive a cached response – no extra tokens.&lt;/p&gt;

&lt;h3&gt;④ Crash‑recoverable job queues&lt;/h3&gt;

&lt;p&gt;If a worker dies while an LLM call is streaming, you risk orphaned billing. Jobs must be stored in a durable queue (Redis, Postgres) with atomic claiming and a recovery mechanism for “processing” jobs that exceed a timeout.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What this costs in real life&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
A single runaway loop generating 10M tokens at GPT‑4o rates (~$2.50/1M input, $10/1M output) can burn &lt;strong&gt;$100+ in minutes&lt;/strong&gt;. Without these four patterns, you’re exposed. (And yes, that’s a cheap model – imagine Anthropic Claude.)&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;2. Turning patterns into code (using KeelStack as one example)&lt;/h2&gt;

&lt;p&gt;Wrappers are easy; guardrails are hard.&lt;/p&gt;

&lt;p&gt;The code below comes from &lt;strong&gt;KeelStack&lt;/strong&gt; – an open‑source framework that ships with a budget‑aware LLM gateway out of the box. But the &lt;em&gt;patterns&lt;/em&gt; are universal; you could implement them yourself (if you enjoy debugging distributed state at 2am).&lt;/p&gt;

&lt;h3&gt;Pattern ① → &lt;code&gt;TokenBudgetTracker&lt;/code&gt;&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Per-user hourly token budget.&lt;/span&gt;
&lt;span class="c1"&gt;// Tracked in DB — survives restarts, enforced globally.&lt;/span&gt;
&lt;span class="c1"&gt;// Configurable per user tier or plan.&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TokenBudgetTracker&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;usage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;windowStart&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;windowMs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="nx"&gt;_600_000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// 1 hour&lt;/span&gt;

  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;budgetPerWindow&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

  &lt;span class="nf"&gt;canSpend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;estimatedTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;record&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;record&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;windowStart&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;windowMs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;estimatedTokens&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;budgetPerWindow&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tokensUsed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;existing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;existing&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;windowStart&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;windowMs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;tokensUsed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;windowStart&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;tokensUsed&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before calling the LLM, we call &lt;code&gt;canSpend()&lt;/code&gt;. If it returns false, &lt;strong&gt;we reject the request immediately&lt;/strong&gt; – no API call, no bill. We initially tried just logging a warning and letting it through. Bad idea.&lt;/p&gt;
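&lt;p&gt;To make the flow concrete, here’s a sketch of the call site – check first, call second, record actuals last. Everything here is illustrative: &lt;code&gt;callLLM&lt;/code&gt; is a stand‑in for your provider client, and the tracker is the class above trimmed for brevity.&lt;/p&gt;

```typescript
// Sketch of the gating flow: canSpend() before the API call,
// record() with actual usage after. All names are illustrative;
// callLLM stands in for your provider client.
class Tracker {
  private usage = new Map();
  constructor(private budget: number, private windowMs = 3_600_000) {}

  canSpend(userId: string, estimated: number): boolean {
    const rec = this.usage.get(userId);
    if (!rec || Date.now() - rec.windowStart > this.windowMs) {
      return this.budget >= estimated; // fresh window still caps one request
    }
    return this.budget >= rec.tokens + estimated;
  }

  record(userId: string, used: number): void {
    const rec = this.usage.get(userId);
    if (!rec || Date.now() - rec.windowStart > this.windowMs) {
      this.usage.set(userId, { tokens: used, windowStart: Date.now() });
      return;
    }
    rec.tokens += used;
  }
}

const tracker = new Tracker(50_000); // e.g. 50k tokens per user per hour

async function callLLM(prompt: string) {
  return { text: `stub for: ${prompt}`, totalTokens: 1_200 }; // fake client
}

async function guardedCompletion(userId: string, prompt: string) {
  const estimated = Math.ceil(prompt.length / 4); // crude ~4 chars/token
  if (!tracker.canSpend(userId, estimated)) {
    throw new Error(`Token budget exceeded for ${userId}`); // no API call, no bill
  }
  const result = await callLLM(prompt);
  tracker.record(userId, result.totalTokens); // actual usage, not the estimate
  return result;
}
```

&lt;p&gt;Note we record the &lt;em&gt;actual&lt;/em&gt; token count from the response, not the estimate – estimates only gate the request.&lt;/p&gt;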

&lt;h3&gt;Pattern ② + ④ → Persistent job store with atomic claiming&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;PersistentJobStore&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Omit&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;PersistedJob&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;state&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;attempts&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;createdAt&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nf"&gt;claim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;PersistedJob&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nf"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nf"&gt;fail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nf"&gt;recoverOrphans&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;timeoutMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;PersistedJob&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RedisPersistentJobStore&lt;/span&gt; &lt;span class="k"&gt;implements&lt;/span&gt; &lt;span class="nx"&gt;PersistentJobStore&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;claim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;PersistedJob&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;luaScript&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
      local data = redis.call('GET', KEYS[1])
      if not data then return nil end
      local job = cjson.decode(data)
      if job.state ~= 'pending' then return nil end
      job.state = 'processing'
      job.claimedAt = ARGV[1]
      redis.call('SET', KEYS[1], cjson.encode(job), 'EX', ARGV[2])
      return cjson.encode(job)
    `&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="c1"&gt;// ... (full implementation in KeelStack)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Lua script ensures &lt;strong&gt;only one worker&lt;/strong&gt; claims a given job – no double‑processing. The worker also periodically checks a circuit breaker; if the token budget for that &lt;em&gt;job&lt;/em&gt; is exceeded, it calls &lt;code&gt;fail()&lt;/code&gt; and cancels the LLM stream. We learned to add that after a stuck job ran for four hours.&lt;/p&gt;
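&lt;p&gt;The worker‑side breaker can be sketched like this (illustrative names only – &lt;code&gt;JobState&lt;/code&gt;, &lt;code&gt;fakeStream&lt;/code&gt; and the 4‑chars‑per‑token estimate are ours, not KeelStack’s API):&lt;/p&gt;

```typescript
// Mid-stream circuit breaker sketch: after every chunk, add the tokens
// just consumed and re-check the job's cap. Stop consuming = stop paying.
interface JobState {
  id: string;
  tokensUsed: number;
  tokenCap: number;
  state: 'processing' | 'failed' | 'completed';
}

function roughTokens(chunk: string): number {
  return Math.ceil(chunk.length / 4); // crude ~4-chars-per-token estimate
}

async function* fakeStream() {
  // Stand-in for an LLM streaming response.
  for (const chunk of ['alpha ', 'beta ', 'gamma ', 'delta ']) yield chunk;
}

async function runStreamingJob(job: JobState) {
  let output = '';
  for await (const chunk of fakeStream()) {
    job.tokensUsed += roughTokens(chunk);
    if (job.tokensUsed > job.tokenCap) {
      job.state = 'failed'; // real code: persist via fail(), cancel the stream
      return output;
    }
    output += chunk;
  }
  job.state = 'completed';
  return output;
}
```

&lt;p&gt;In production the check would also consult the persisted job record, so an operator (or the budget tracker) can kill a job from outside the worker.&lt;/p&gt;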

&lt;h3&gt;Pattern ③ → Idempotency middleware&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;idempotencyMiddleware&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;namespace&lt;/span&gt; &lt;span class="p"&gt;}:&lt;/span&gt; &lt;span class="nx"&gt;IdempotencyMiddlewareOptions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;function &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;MUTATING_METHODS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;method&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rawKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;IDEMPOTENCY_HEADER&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;rawKey&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;rawKey&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;storeKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;rawKey&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;claimed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tryClaimKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;storeKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;IDEMPOTENCY_TTL_SECONDS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;processedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
      &lt;span class="na"&gt;requestId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;x-request-id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;claimed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;record&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getRecord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;storeKey&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;idempotent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;processedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;record&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;processedAt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Request already processed. This is a replayed response.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;releaseKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;storeKey&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now retried webhooks or duplicate UI clicks won’t trigger a second LLM call – they receive the cached response. Honestly, we should have added this on day one; it’s embarrassing how many duplicate charges we ate before we wised up.&lt;/p&gt;
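&lt;p&gt;The middleware above needs a store exposing &lt;code&gt;tryClaimKey&lt;/code&gt; / &lt;code&gt;getRecord&lt;/code&gt; / &lt;code&gt;releaseKey&lt;/code&gt;. Here’s a minimal in‑memory version – fine for local tests, but (per everything above) production needs Redis so every replica sees the same claims:&lt;/p&gt;

```typescript
// Minimal in-memory idempotency store implementing the interface the
// middleware uses. TTL eviction is elided; in Redis, SET with EX handles
// expiry. Dev/test only: this state dies with the process.
interface IdempotencyRecord {
  processedAt: string;
  [extra: string]: unknown;
}

class InMemoryIdempotencyStore {
  private records = new Map();

  async tryClaimKey(key: string, ttlSeconds: number, record: IdempotencyRecord) {
    if (this.records.has(key)) return false; // duplicate: claim refused
    this.records.set(key, record);           // first request wins
    return true;
  }

  async getRecord(key: string) {
    return this.records.get(key) ?? null;    // replay payload for duplicates
  }

  async releaseKey(key: string) {
    this.records.delete(key);                // failed request may retry
  }
}
```

&lt;p&gt;The &lt;code&gt;releaseKey&lt;/code&gt; path matters: if processing throws, the key must be freed so a legitimate retry isn’t treated as a duplicate forever.&lt;/p&gt;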




&lt;h2&gt;3. Acknowledge the “wrapper” skepticism (and why guardrails matter)&lt;/h2&gt;

&lt;p&gt;Let’s be honest: the market is flooded with “AI wrappers.” Many are thin UI layers over an OpenAI key. That’s why experts roll their eyes.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;This post is not about the wrapper. It’s about the guardrails.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Yes, this is more complex than a toy wrapper. That’s the point.&lt;/p&gt;

&lt;p&gt;The complexity lives in the infrastructure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Distributed job claiming via Lua scripts (because a race condition on &lt;code&gt;pending&lt;/code&gt; jobs = double billing)&lt;/li&gt;
&lt;li&gt;Persistence across restarts (lose your in‑memory budget counter? Congratulations, you just reset everyone’s limit)&lt;/li&gt;
&lt;li&gt;Idempotency handling across retries, webhooks, and partial failures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are &lt;strong&gt;hard problems&lt;/strong&gt;. KeelStack solves them so you don’t have to – but the patterns themselves are what protect your bottom line.&lt;/p&gt;




&lt;h2&gt;4. DIY proxy vs. dedicated gateway – a risk‑appetite discussion&lt;/h2&gt;

&lt;p&gt;You &lt;em&gt;could&lt;/em&gt; build all this yourself. Grab a Redis client, write a few middleware functions, glue them together. But consider the edge cases we ran into:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;DIY challenge&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Why it’s painful (we found out the hard way)&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Atomic job claiming across 10 replicas&lt;/td&gt;
&lt;td&gt;You’ll end up writing Lua anyway – or introducing race conditions. We had two workers process the same job once. Fun times.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Budget tracker surviving restarts&lt;/td&gt;
&lt;td&gt;You need a persistent store (Redis/Postgres) and atomic increments. Our in‑memory version lost state on every deploy.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Circuit breakers for streaming responses&lt;/td&gt;
&lt;td&gt;You have to count tokens mid‑stream while the job may crash at any point. We gave up and used a gateway.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Idempotency with variable TTLs&lt;/td&gt;
&lt;td&gt;What if a request takes longer than your key TTL? Now duplicates leak through. We learned to set TTLs generously.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
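&lt;p&gt;For the last row, one mitigation we’ve seen work is a &lt;em&gt;heartbeat&lt;/em&gt;: keep extending the key’s TTL while the request is still running, so a slow request can’t outlive its claim. A sketch – &lt;code&gt;extendTtl&lt;/code&gt; is a hypothetical store method; in Redis it would wrap &lt;code&gt;EXPIRE&lt;/code&gt;:&lt;/p&gt;

```typescript
// Heartbeat sketch: refresh the idempotency key's TTL at half its
// lifetime while the wrapped work runs. extendTtl is hypothetical,
// e.g. a thin wrapper over redis.expire(key, seconds).
async function withKeyHeartbeat(
  extendTtl: Function,
  key: string,
  work: Function,
  ttlSeconds = 60,
) {
  const timer = setInterval(() => extendTtl(key, ttlSeconds), (ttlSeconds * 1000) / 2);
  try {
    return await work(); // slow work keeps its claim alive
  } finally {
    clearInterval(timer); // stop refreshing once work settles
  }
}
```

&lt;p&gt;Generous base TTLs are still worth keeping – the heartbeat just closes the window where a long request and an expiring key overlap.&lt;/p&gt;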

&lt;p&gt;A dedicated gateway (like KeelStack’s LLMClient) bakes in these solutions. It’s not about avoiding work – it’s about &lt;strong&gt;avoiding the $1,000 mistake&lt;/strong&gt; while you focus on your actual product.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Ready to stop bleeding tokens?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Explore the patterns, steal the code, or just grab the framework. But whatever you do, &lt;strong&gt;don’t ship another AI wrapper without circuit breakers.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://keelstack.me" rel="noopener noreferrer"&gt;&lt;strong&gt;KeelStack – Budget‑aware LLM gateway&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>nodejs</category>
      <category>sre</category>
    </item>
    <item>
      <title>Why Your SaaS Node Backend Will Fail at 10k Requests/Minute (and How to Stress‑Proof It Without Rewriting)</title>
      <dc:creator>Siddhant Jain</dc:creator>
      <pubDate>Sat, 28 Mar 2026 17:03:13 +0000</pubDate>
      <link>https://forem.com/siddhant_jain_18/why-your-saas-node-backend-will-fail-at-10k-requestsminute-and-how-to-stress-proof-it-without-2bfg</link>
      <guid>https://forem.com/siddhant_jain_18/why-your-saas-node-backend-will-fail-at-10k-requestsminute-and-how-to-stress-proof-it-without-2bfg</guid>
      <description>&lt;p&gt;At 1k active users, your Node backend feels like a rock.&lt;br&gt;&lt;br&gt;
At 3k–5k users, Stripe webhooks start retrying, background jobs pile up, and you notice the first “duplicate charge” ticket.&lt;br&gt;&lt;br&gt;
At 8k–10k requests per minute, you’re in a live incident: jobs vanish on deploy, webhook duplicates double‑bill customers, and MFA state drifts, leaving users locked out.&lt;/p&gt;

&lt;p&gt;Node is great—but naïve implementations won’t survive SaaS‑scale traffic.&lt;br&gt;&lt;br&gt;
Here’s exactly what breaks and how to stress‑proof it &lt;strong&gt;without a full rewrite&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you’re:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;building a Node.js + TypeScript SaaS backend,
&lt;/li&gt;
&lt;li&gt;handling Stripe webhooks, background jobs, and auth,
&lt;/li&gt;
&lt;li&gt;and worried that your current architecture will fall apart at 3k–10k requests per minute,&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;then this post is for you.&lt;/p&gt;


&lt;h2&gt;What Actually Breaks at 10k RPM in Node&lt;/h2&gt;
&lt;h3&gt;1. Silent Job Loss &amp;amp; Race Conditions&lt;/h3&gt;

&lt;p&gt;If your background jobs rely on &lt;code&gt;setTimeout&lt;/code&gt; or an in‑memory array, a simple &lt;code&gt;git push&lt;/code&gt; – and the process restart that follows – will wipe them out.&lt;br&gt;&lt;br&gt;
But the real pain starts when workers &lt;strong&gt;race&lt;/strong&gt; for the same job.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;: A Stripe &lt;code&gt;checkout.session.completed&lt;/code&gt; event triggers a job to deliver a license.&lt;br&gt;&lt;br&gt;
Two workers both see the job as “pending” → both claim it → customer receives two licenses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern that fails&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Naive in‑memory queue&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;jobs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;

&lt;span class="nf"&gt;setInterval&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;shift&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What survives&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Persistent queue&lt;/strong&gt; (Redis, RabbitMQ, Postgres with &lt;code&gt;SKIP LOCKED&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Atomic claim&lt;/strong&gt;: the first worker to “lock” the job wins; others skip it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Crash recovery&lt;/strong&gt;: jobs are &lt;strong&gt;persisted before execution&lt;/strong&gt;, so a worker crash doesn’t lose them.&lt;/li&gt;
&lt;/ul&gt;
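&lt;p&gt;Here is a minimal sketch of the atomic‑claim idea. It is in‑memory only, and the names (&lt;code&gt;claimJob&lt;/code&gt;, &lt;code&gt;Job&lt;/code&gt;) are made up for illustration; in production the same check‑and‑set would be delegated to Postgres &lt;code&gt;SKIP LOCKED&lt;/code&gt; or a Redis Lua script:&lt;/p&gt;

```typescript
// In-memory sketch of an atomic job claim. Illustrative only:
// real systems delegate this check-and-set to the datastore.
type Job = { id: string; state: 'pending' | 'processing'; payload: unknown };

const jobs = new Map<string, Job>();

function claimJob(jobId: string): Job | null {
  const job = jobs.get(jobId);
  if (!job || job.state !== 'pending') return null;
  // Single-threaded JS makes this read-modify-write atomic within
  // one process; across workers the DB must provide the guarantee.
  job.state = 'processing';
  return job;
}

jobs.set('job-1', { id: 'job-1', state: 'pending', payload: {} });
// Two workers race for the same job: exactly one wins, the other
// gets null and skips it.
const winners = [claimJob('job-1'), claimJob('job-1')].filter(Boolean);
```

&lt;p&gt;Single‑threaded JavaScript makes the read‑modify‑write atomic within one process; across multiple workers you need the datastore to provide the same guarantee.&lt;/p&gt;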

&lt;h3&gt;
  
  
  2. Stripe Webhook Race Conditions
&lt;/h3&gt;

&lt;p&gt;Stripe retries webhooks that fail or respond too slowly. If your handler is not idempotent, each retry creates a new charge, subscription, or email.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fragile handler&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/stripe-webhook&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;invoices&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;stripeId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sendReceiptEmail&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sendStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If two identical events arrive concurrently, both will insert duplicate rows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Idempotency fix&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use a &lt;strong&gt;unique constraint&lt;/strong&gt; on &lt;code&gt;(stripe_event_id, event_type)&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Or wrap the handler in an &lt;strong&gt;atomic guard&lt;/strong&gt; that checks a “processed” flag before doing work.&lt;/li&gt;
&lt;/ul&gt;
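&lt;p&gt;A sketch of that guard, with an in‑memory &lt;code&gt;Set&lt;/code&gt; standing in for the database table (the &lt;code&gt;handleOnce&lt;/code&gt; name is illustrative, not a library API):&lt;/p&gt;

```typescript
// Idempotency guard sketch. The Set stands in for a DB table with
// a unique constraint on (stripe_event_id, event_type).
const processed = new Set<string>();

function handleOnce(eventId: string, eventType: string, work: () => void): boolean {
  // Composite key mirrors the unique constraint
  const key = `${eventId}:${eventType}`;
  if (processed.has(key)) return false; // duplicate: acknowledge, do nothing
  processed.add(key); // record BEFORE doing the work, like the guarded insert
  work();
  return true;
}

let emailsSent = 0;
const first = handleOnce('evt_1', 'checkout.session.completed', () => { emailsSent++; });
const retry = handleOnce('evt_1', 'checkout.session.completed', () => { emailsSent++; });
// The retry is acknowledged but sends no second email.
```

&lt;p&gt;With a real database, the &lt;code&gt;has()&lt;/code&gt;/&lt;code&gt;add()&lt;/code&gt; pair becomes a single &lt;code&gt;INSERT&lt;/code&gt; guarded by the unique constraint, so the check‑and‑set stays atomic across processes.&lt;/p&gt;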

&lt;h3&gt;
  
  
  3. Auth &amp;amp; MFA State Drift
&lt;/h3&gt;

&lt;p&gt;When your authentication relies on in‑memory sessions or local cookies without server‑side validation, you risk:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Attackers bypassing MFA entirely once they steal a session token.&lt;/li&gt;
&lt;li&gt;“MFA required” being enforced only in the UI, not on the API.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;: A user enables MFA, but the API still allows them to change their billing email without a second factor. An attacker with a stolen session can compromise the account.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What’s needed&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Stateless tokens&lt;/strong&gt; (JWT) with explicit permissions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per‑action MFA enforcement&lt;/strong&gt; on sensitive routes (e.g., &lt;code&gt;POST /api/billing/change-email&lt;/code&gt;), not just a flag in the UI.&lt;/li&gt;
&lt;/ul&gt;
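&lt;p&gt;A hypothetical sketch of what per‑action enforcement can look like in Express‑style middleware shape. The &lt;code&gt;mfaVerifiedAt&lt;/code&gt; session field and the five‑minute freshness window are assumptions, not a real library API; adapt them to your session store and threat model:&lt;/p&gt;

```typescript
// Hypothetical per-action MFA guard in Express-style middleware shape.
// The mfaVerifiedAt session field and the freshness window are
// assumptions; adapt to your session store.
type Req = { session?: { mfaVerifiedAt?: number } };
type Res = { status: (code: number) => { json: (body: object) => void } };

const MFA_MAX_AGE_MS = 5 * 60 * 1000; // re-challenge after 5 minutes

function requireRecentMfa(req: Req, res: Res, next: () => void): void {
  const verifiedAt = req.session?.mfaVerifiedAt;
  if (verifiedAt !== undefined && Date.now() - verifiedAt < MFA_MAX_AGE_MS) {
    next(); // second factor is fresh: allow the sensitive action
    return;
  }
  res.status(403).json({ error: 'mfa_required' }); // enforced on the API, not the UI
}

// Exercise both paths with mock request/response objects.
let fresh = '';
requireRecentMfa(
  { session: { mfaVerifiedAt: Date.now() } },
  { status: (code) => ({ json: () => { fresh = `denied:${code}`; } }) },
  () => { fresh = 'allowed'; },
);

let stale = '';
requireRecentMfa(
  { session: {} },
  { status: (code) => ({ json: () => { stale = `denied:${code}`; } }) },
  () => { stale = 'allowed'; },
);
```

&lt;p&gt;Wired into a real app, the guard sits on the sensitive route itself, e.g. &lt;code&gt;app.post('/api/billing/change-email', requireRecentMfa, handler)&lt;/code&gt;.&lt;/p&gt;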




&lt;h2&gt;
  
  
  How to Stress‑Test Your SaaS Node Backend
&lt;/h2&gt;

&lt;p&gt;Before you hit 10k RPM, &lt;strong&gt;know where you’ll break&lt;/strong&gt;. Here’s a simple stress‑test recipe you can run today:&lt;/p&gt;

&lt;h3&gt;
  
  
  Tools
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/mcollina/autocannon" rel="noopener noreferrer"&gt;&lt;code&gt;autocannon&lt;/code&gt;&lt;/a&gt; or &lt;a href="https://github.com/rakyll/hey" rel="noopener noreferrer"&gt;&lt;code&gt;hey&lt;/code&gt;&lt;/a&gt; for HTTP load.&lt;/li&gt;
&lt;li&gt;Stripe CLI to replay webhooks.&lt;/li&gt;
&lt;li&gt;A script to kill workers randomly.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Tests to Run
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Auth endpoint&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;autocannon -c 100 -p 10 -m POST http://localhost:3000/api/v1/auth/login&lt;/code&gt;&lt;br&gt;
Watch for 5xx errors and 99th‑percentile latency. If you see spikes &amp;gt;1s, your session store might be the bottleneck.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Concurrent Stripe webhooks&lt;/strong&gt; &lt;br&gt;
Use the Stripe CLI to replay the same event many times in parallel (swap a real test‑mode event ID in for the &lt;code&gt;evt_...&lt;/code&gt; placeholder):&lt;br&gt;
&lt;code&gt;for i in $(seq 1 50); do stripe events resend evt_... &amp;amp; done; wait&lt;/code&gt;&lt;br&gt;
Then check your DB for duplicate records. If you see any, your webhook handler isn’t idempotent.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Crash recovery&lt;/strong&gt;&lt;br&gt;
Start a long‑running job (e.g., 10s sleep). &lt;br&gt;
While it’s running, kill the worker process (&lt;code&gt;kill -9&lt;/code&gt;).&lt;br&gt;
Verify the job is retried or resumed, not lost.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  What to Measure
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Error rate (should stay at 0%).&lt;/li&gt;
&lt;li&gt;Job loss count (should be 0).&lt;/li&gt;
&lt;li&gt;Duplicate transaction count (should be 0).&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  How KeelStack Already Hardens This
&lt;/h2&gt;

&lt;p&gt;KeelStack Engine was built to survive exactly these failure modes &lt;strong&gt;on a production‑like SaaS workload&lt;/strong&gt;. It ships with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Atomic job queue&lt;/strong&gt; using Redis‑Lua or PostgreSQL &lt;code&gt;SKIP LOCKED&lt;/code&gt;. Jobs are persisted before execution; if a worker crashes, they’re re‑claimed by another worker with exponential backoff.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Idempotency guard&lt;/strong&gt; for all mutating endpoints. Stripe webhooks are wrapped with a composite key (&lt;code&gt;event_id&lt;/code&gt; + &lt;code&gt;event_type&lt;/code&gt;), and the result is cached. Duplicate events return a 200 without re‑executing business logic. In stress‑tests with KeelStack, we see &amp;lt;1% error rate and zero duplicate transactions even when firing 100 identical Stripe webhooks per second.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per‑action MFA enforcement&lt;/strong&gt; at the API level. The auth module includes a &lt;code&gt;requireMfaFor(route)&lt;/code&gt; helper that validates the MFA token on sensitive operations—not just on login.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren’t marketing claims; they’re the exact patterns you’d need to implement yourself. KeelStack ships them by default so you can focus on your unique product logic.&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical Checklist: Hardening Your Node SaaS Before 10k RPM
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Use persistent queues&lt;/strong&gt; – Redis, RabbitMQ, or Postgres with &lt;code&gt;SKIP LOCKED&lt;/code&gt;. Never rely on in‑memory arrays or &lt;code&gt;setTimeout&lt;/code&gt; for jobs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Idempotency keys on all webhooks and billing actions&lt;/strong&gt; – store the result of every mutating operation keyed by a unique identifier (e.g., Stripe event ID + user ID).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stateless sessions + per‑action MFA enforcement&lt;/strong&gt; – store only a JWT; validate MFA on sensitive API endpoints, not just in the UI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Crash‑safe job runners&lt;/strong&gt; – jobs should be saved to the database &lt;strong&gt;before&lt;/strong&gt; execution starts, and marked as done &lt;strong&gt;after&lt;/strong&gt; success.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stress‑test with 2–3x your expected peak&lt;/strong&gt; – use &lt;code&gt;autocannon&lt;/code&gt; and simulate webhook floods to catch race conditions early.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add structured logging&lt;/strong&gt; – correlate logs with request IDs so you can trace a job from creation to completion across worker restarts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enforce test coverage&lt;/strong&gt; – write integration tests for failure scenarios (e.g., duplicate webhooks, worker crashes). If you can’t reproduce it in CI, it &lt;strong&gt;will&lt;/strong&gt; happen in production.&lt;/li&gt;
&lt;/ol&gt;
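&lt;p&gt;For the structured‑logging item, Node’s built‑in &lt;code&gt;AsyncLocalStorage&lt;/code&gt; gives you request‑ID correlation without threading an ID through every function call. A minimal sketch (the helper names are illustrative):&lt;/p&gt;

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Every log line emitted inside runWithRequestId() automatically
// carries the same requestId, so a job can be traced across async hops.
const requestContext = new AsyncLocalStorage<{ requestId: string }>();

function log(message: string): string {
  const requestId = requestContext.getStore()?.requestId ?? 'no-request';
  const line = JSON.stringify({ requestId, message });
  console.log(line);
  return line;
}

function runWithRequestId<T>(requestId: string, fn: () => T): T {
  return requestContext.run({ requestId }, fn);
}

const traced = runWithRequestId('req-42', () => log('job enqueued'));
const untraced = log('startup'); // outside any request context
```

&lt;p&gt;Call the wrapper at the top of each request or job, and every log line inside, however deeply nested, carries the same ID.&lt;/p&gt;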

&lt;p&gt;For deep‑dives on each of these topics, check out our previous posts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://keelstack.me/blog/silent-job-loss-nodejs-saas" rel="noopener noreferrer"&gt;The Silent Job Loss: Why Your Node.js SaaS Needs a Persistent Task Queue&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://keelstack.me/blog/vibe-coded-saas-fail-at-100-users" rel="noopener noreferrer"&gt;Why Your "Vibe Coded" SaaS Will Fail at 100 Users (and How to Fix It)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Ship Safe, Not Just Fast
&lt;/h2&gt;

&lt;p&gt;If you’re building a SaaS backend in Node, you don’t have to rediscover these hard‑earned lessons at 3am when your first real‑world traffic spike hits. The patterns above are proven and can be integrated incrementally—or you can start from a foundation that already has them built in.&lt;/p&gt;

&lt;p&gt;KeelStack Engine is a production‑tested Node + TypeScript starter that includes idempotency, persistent job queues, per‑user LLM token budgets, and a full auth/billing stack. It ships as 100% source code under its license terms, so you can read every line and deploy it anywhere.&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://keelstack.me" rel="noopener noreferrer"&gt;Get instant access to KeelStack Engine&lt;/a&gt; – skip the weeks of wiring and jump straight to building features that matter.&lt;/p&gt;

</description>
      <category>node</category>
      <category>saas</category>
      <category>backend</category>
      <category>security</category>
    </item>
    <item>
      <title>The Silent Job Loss: Why Your Node.js SaaS Needs a Persistent Task Queue</title>
      <dc:creator>Siddhant Jain</dc:creator>
      <pubDate>Sun, 22 Mar 2026 19:59:42 +0000</pubDate>
      <link>https://forem.com/siddhant_jain_18/the-silent-job-loss-why-your-nodejs-saas-needs-a-persistent-task-queue-5cih</link>
      <guid>https://forem.com/siddhant_jain_18/the-silent-job-loss-why-your-nodejs-saas-needs-a-persistent-task-queue-5cih</guid>
      <description>&lt;p&gt;&lt;em&gt;597 tests. 93.13% coverage. Here's what they protect.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;A user pays. Your server receives the Stripe webhook. You fire off an async task to generate their report. Thirty seconds later you deploy a hotfix.&lt;/p&gt;

&lt;p&gt;The report is never generated. The user is charged. Nobody gets an error. You find out three days later in a support ticket.&lt;/p&gt;

&lt;p&gt;This is not a theoretical failure mode. It is the default behavior of every Node.js backend that queues work in memory.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 1: Memory Is Volatile
&lt;/h2&gt;

&lt;p&gt;The most common pattern for async work in Node.js looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// User pays → webhook fires → kick off async work&lt;/span&gt;
&lt;span class="nf"&gt;webhookHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Fire and forget&lt;/span&gt;
  &lt;span class="nf"&gt;generateReport&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;reportId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;received&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;generateReport&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;reportId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// This lives entirely in process memory&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetchUserData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;callLLM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;saveReport&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;reportId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;report&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works perfectly in development. It fails silently in production for three reasons:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deployments.&lt;/strong&gt; Every deploy kills the running process. Any in-flight &lt;code&gt;generateReport&lt;/code&gt; call dies mid-execution. No error is thrown anywhere visible. The job is gone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Crashes.&lt;/strong&gt; An unhandled exception or OOM kill takes every in-flight job with it. Same silent outcome.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scaling.&lt;/strong&gt; The moment you run two processes (two dynos, two containers), there is no coordination. A job kicked off in process A can only run in process A. Process B has no knowledge it exists.&lt;/p&gt;

&lt;p&gt;The fix is not complicated in concept: write the job to durable storage &lt;em&gt;before&lt;/em&gt; you start executing it. That way, if the process dies, the job survives. On restart, you find it and finish it.&lt;/p&gt;

&lt;p&gt;The hard part is doing this correctly — specifically, making the claim step atomic so two workers cannot grab the same job at the same time.&lt;/p&gt;
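&lt;p&gt;The first half of that fix (persist, then execute, then recover on restart) can be sketched in a few lines, with a &lt;code&gt;Map&lt;/code&gt; standing in for durable storage:&lt;/p&gt;

```typescript
// Persist-before-execute sketch. The Map stands in for durable
// storage (Redis/Postgres); the names here are illustrative.
type Job = { id: string; state: 'pending' | 'done' };
const store = new Map<string, Job>();

function enqueue(id: string): void {
  // 1. Persist FIRST. If the process dies after this line,
  //    the job still exists and recovery will find it.
  store.set(id, { id, state: 'pending' });
}

function execute(id: string): void {
  const job = store.get(id);
  if (!job || job.state !== 'pending') return;
  // ... do the actual work here ...
  job.state = 'done'; // 2. Mark done only AFTER success
}

// 3. On startup, find anything a previous process never finished.
function recoverPending(): string[] {
  return Array.from(store.values())
    .filter((j) => j.state === 'pending')
    .map((j) => j.id);
}

enqueue('job-1');
enqueue('job-2');
execute('job-1');
// Simulated crash before job-2 ran: the recovery scan still sees it.
const pending = recoverPending();
```

&lt;p&gt;The recovery scan is what turns a deploy from “jobs vanish” into “jobs finish a few seconds late.”&lt;/p&gt;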




&lt;h2&gt;
  
  
  Part 2: The Atomic Claim Problem
&lt;/h2&gt;

&lt;p&gt;The naive approach to claiming a job looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Worker 1 and Worker 2 both run this simultaneously&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;jobs&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'processing'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;claimed_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;jobs&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'pending'&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At low load, this works. Under concurrency it doesn't. Two workers can both execute the subquery and get the same row before either has written &lt;code&gt;processing&lt;/code&gt;. You get double-processing: the same report generated twice, the same email sent twice, the same billing event fired twice.&lt;/p&gt;

&lt;p&gt;The standard fix in Postgres is &lt;code&gt;FOR UPDATE SKIP LOCKED&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;jobs&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'processing'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;claimed_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;jobs&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'pending'&lt;/span&gt;
  &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;ASC&lt;/span&gt;
  &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;SKIP&lt;/span&gt; &lt;span class="n"&gt;LOCKED&lt;/span&gt;
  &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;RETURNING&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;FOR UPDATE&lt;/code&gt; takes a row-level lock. &lt;code&gt;SKIP LOCKED&lt;/code&gt; tells any other worker that hits a locked row to skip it rather than wait. The result: each worker atomically claims a different job. No deadlocks, no double-processing, regardless of how many workers are running.&lt;/p&gt;

&lt;p&gt;KeelStack does not use Postgres for the job store (it runs without a database in zero-config mode). It uses Redis. But the same guarantee is needed, and Redis provides it through Lua scripts.&lt;/p&gt;

&lt;p&gt;Here is the actual &lt;code&gt;claim()&lt;/code&gt; implementation in &lt;code&gt;RedisPersistentJobStore&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;claim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;PersistedJob&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;luaScript&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
    local data = redis.call('GET', KEYS[1])
    if not data then return nil end
    local job = cjson.decode(data)
    if job.state ~= 'pending' then return nil end
    job.state = 'processing'
    job.claimedAt = ARGV[1]
    redis.call('SET', KEYS[1], cjson.encode(job), 'EX', ARGV[2])
    return cjson.encode(job)
  `&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;luaScript&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JOB_TTL&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;PersistedJob&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Lua script runs atomically inside Redis's single-threaded executor. Between the &lt;code&gt;GET&lt;/code&gt; and the &lt;code&gt;SET&lt;/code&gt;, nothing else can run. No other worker can see the job as &lt;code&gt;pending&lt;/code&gt; and claim it. Exactly one caller gets the job back. Everyone else gets &lt;code&gt;null&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The in-memory implementation (used in development and tests) gets the same guarantee for free because JavaScript's event loop is single-threaded:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;claim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;PersistedJob&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pending&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="c1"&gt;// Single-threaded JS: this read-modify-write is atomic within one process&lt;/span&gt;
  &lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;processing&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;claimedAt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The test that verifies this contract fires 20 concurrent claim attempts and asserts exactly one wins:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;claim() concurrency — only one of N concurrent callers wins&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;makeJobInput&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;length&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;claim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;job-001&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;winners&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Boolean&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;winners&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toHaveLength&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Part 3: Exponential Backoff and the Dead-Letter Log
&lt;/h2&gt;

&lt;p&gt;Once a job is claimed, it runs. Sometimes the handler fails. The question is: what do you do next?&lt;/p&gt;

&lt;p&gt;The worst answer is: retry immediately. If an LLM provider is rate-limiting you, hammering it again in the same second makes the situation worse for everyone. If your database just had a connection timeout, you want to give it time to recover. Retrying immediately into a recovering system causes the Thundering Herd problem: every waiting job piles in at once, overloading whatever just came back up.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;RetryableJobRunner&lt;/code&gt; uses exponential backoff with jitter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;exponentialDelay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;baseMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;maxMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Jitter: randomize ±20% to spread retries across instances&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;jitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;delay&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;baseMs&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;jitter&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;maxMs&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With &lt;code&gt;baseDelayMs: 250&lt;/code&gt; and &lt;code&gt;maxDelayMs: 30_000&lt;/code&gt; (attempts are numbered from 1, so the first delay is 250ms × 2 = 500ms), the delays look like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Attempt&lt;/th&gt;
&lt;th&gt;Base delay&lt;/th&gt;
&lt;th&gt;With jitter (approx)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;500ms&lt;/td&gt;
&lt;td&gt;400–600ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;1,000ms&lt;/td&gt;
&lt;td&gt;800ms–1.2s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2,000ms&lt;/td&gt;
&lt;td&gt;1.6–2.4s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;4,000ms&lt;/td&gt;
&lt;td&gt;3.2–4.8s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;8,000ms&lt;/td&gt;
&lt;td&gt;6.4–9.6s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cap&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;30s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The jitter is important. Without it, every worker that got rate-limited at the same moment retries at exactly the same time. With jitter, they spread out across a window, smoothing the load on whatever they are calling.&lt;/p&gt;
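&lt;p&gt;To make the loop concrete, here is a self-contained sketch that wraps the helper above. &lt;code&gt;withBackoff&lt;/code&gt;, its defaults, and &lt;code&gt;sleep&lt;/code&gt; are illustrative names, not the actual &lt;code&gt;RetryableJobRunner&lt;/code&gt; API:&lt;/p&gt;

```typescript
// exponentialDelay is copied from above: base * 2^attempt, jittered ±20%, capped at maxMs.
function exponentialDelay(attempt: number, baseMs: number, maxMs: number): number {
  const jitter = 1 + (Math.random() * 0.4 - 0.2);
  const delay = baseMs * Math.pow(2, attempt) * jitter;
  return Math.min(delay, maxMs);
}

async function sleep(ms: number) {
  await new Promise((resolve) => setTimeout(resolve, ms));
}

// Illustrative retry loop (not the engine's API): run fn, and on failure wait
// an exponentially growing, jittered delay before the next attempt.
async function withBackoff(
  fn: () => Promise<string>,
  maxAttempts = 5,
  baseMs = 250,
  maxMs = 30_000,
) {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        await sleep(exponentialDelay(attempt, baseMs, maxMs));
      }
    }
  }
  throw lastError; // caller decides what "exhausted" means (dead-letter, alert, ...)
}
```

&lt;p&gt;Wrapping a flaky provider call in &lt;code&gt;withBackoff&lt;/code&gt; gives you roughly the 500ms / 1s / 2s / 4s schedule from the table before the error finally surfaces.&lt;/p&gt;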

&lt;p&gt;&lt;strong&gt;The non-retryable escape hatch.&lt;/strong&gt; Not all errors deserve retries. If a user submits malformed data and your handler throws a validation error, retrying five times wastes four attempts and delays the dead-letter signal by minutes. The &lt;code&gt;NonRetryableError&lt;/code&gt; class handles this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;NonRetryableError&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;NonRetryableError&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// In your handler:&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nf"&gt;isValidPayload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;NonRetryableError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Malformed report payload — check input schema&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the runner catches a &lt;code&gt;NonRetryableError&lt;/code&gt;, it skips the remaining attempts and goes straight to the dead-letter log:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="k"&gt;instanceof&lt;/span&gt; &lt;span class="nx"&gt;NonRetryableError&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;NonRetryableError&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;logDeadLetter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;non_retryable&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When &lt;code&gt;maxAttempts&lt;/code&gt; is exhausted through normal retries, the same dead-letter path fires:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// All attempts exhausted — emit dead-letter signal&lt;/span&gt;
&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;logDeadLetter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxAttempts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;lastError&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;max_attempts_exceeded&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The dead-letter log output is structured JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"event"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"job.dead_letter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"jobId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"report-abc-123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"jobName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"generate-report"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"attempt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"max_attempts_exceeded"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"LLM provider timeout after 30000ms"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Filter on &lt;code&gt;event = 'job.dead_letter'&lt;/code&gt; in Datadog, CloudWatch, or any structured log sink to get immediate alerts when jobs exhaust their retries. This is how you find out about silent failures before users report them.&lt;/p&gt;
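&lt;p&gt;For illustration, a minimal logger that emits exactly this shape could look like the sketch below. The &lt;code&gt;Job&lt;/code&gt; type and the &lt;code&gt;logDeadLetter&lt;/code&gt; signature here are stand-ins, not KeelStack internals:&lt;/p&gt;

```typescript
interface Job {
  id: string;
  name: string;
}

// Illustrative dead-letter logger: one structured JSON line per exhausted job,
// so a log sink can alert on event = 'job.dead_letter'.
function logDeadLetter(job: Job, attempt: number, error: Error, reason: string): string {
  const line = JSON.stringify({
    level: 'error',
    event: 'job.dead_letter',
    jobId: job.id,
    jobName: job.name,
    attempt,
    reason,
    error: error.message,
  });
  console.error(line); // stderr is where most collectors expect error-level lines
  return line;
}
```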




&lt;h2&gt;
  
  
  Part 4: The Crash Test
&lt;/h2&gt;

&lt;p&gt;The full lifecycle claim → execute → crash → recover is tested in &lt;code&gt;worker.crash.test.ts&lt;/code&gt;. Here is the core scenario:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Orphaned job (stuck in processing) — recoverOrphans should reset to pending&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;jobId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`crash_job_&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;36&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// 1. Enqueue the job&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;billing-sync&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;billing.subscription.created&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;tenant_crash_1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;maxAttempts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// 2. Worker claims it — job is now in 'processing' state&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;claimedJob&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;claim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;claimedJob&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;processing&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// 3. Simulate the crash: worker never calls complete() or fail()&lt;/span&gt;
  &lt;span class="c1"&gt;//    Backdate claimedAt to make it look like the crash happened 61 seconds ago&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;internalStore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;j&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;internalStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;j&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;claimedAt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;61&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="c1"&gt;// 4. Recovery scan runs (as it would on the next server boot)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;recovered&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recoverOrphans&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// 5. Job is back in 'pending' — available to be claimed and finished&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;recovered&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;recovered&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pending&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// 6. A new worker can now claim and complete it&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;reClaimed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;claim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;reClaimed&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;not&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toBeNull&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The recovery mechanism is straightforward: on server startup (and optionally on a periodic tick), &lt;code&gt;recoverOrphans(timeoutMs)&lt;/code&gt; scans for jobs that have been in &lt;code&gt;processing&lt;/code&gt; state longer than &lt;code&gt;timeoutMs&lt;/code&gt;. Any job older than that threshold is assumed to belong to a dead worker and is reset to &lt;code&gt;pending&lt;/code&gt;, preserving the attempt count.&lt;/p&gt;
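&lt;p&gt;As a sketch of that scan, here is an in-memory version, assuming stored jobs carry &lt;code&gt;state&lt;/code&gt;, &lt;code&gt;claimedAt&lt;/code&gt;, and attempt-count fields (the &lt;code&gt;StoredJob&lt;/code&gt; shape is illustrative, not the engine's schema):&lt;/p&gt;

```typescript
interface StoredJob {
  id: string;
  state: 'pending' | 'processing' | 'completed' | 'failed';
  claimedAt?: string; // ISO timestamp set when a worker claims the job
  attempt: number;
  maxAttempts: number;
}

// Illustrative orphan scan: any 'processing' job claimed longer than timeoutMs
// ago is assumed to belong to a dead worker. Jobs with retries left go back to
// 'pending'; jobs already at maxAttempts go to 'failed' instead of looping forever.
function recoverOrphans(jobs: StoredJob[], timeoutMs: number, now = Date.now()): StoredJob[] {
  const recovered: StoredJob[] = [];
  for (const job of jobs) {
    if (job.state !== 'processing' || !job.claimedAt) continue;
    if (now - Date.parse(job.claimedAt) <= timeoutMs) continue;
    if (job.attempt >= job.maxAttempts) {
      job.state = 'failed'; // exhausted: dead-letter territory, not a re-queue
    } else {
      job.state = 'pending'; // attempt count is preserved; claimable again
      recovered.push(job);
    }
  }
  return recovered;
}
```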

&lt;p&gt;A separate test covers the edge case where a crashed job has already exhausted its retries. This one is important — without it, you could end up endlessly re-queuing jobs that will never succeed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Orphaned job at max attempts — must go to failed (not pending) after recovery&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Simulate 3 failed attempts&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Attempt 1 failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Attempt 2 failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Attempt 3 failed — maxAttempts reached&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Even if it ends up orphaned in 'processing' state...&lt;/span&gt;
  &lt;span class="nx"&gt;j&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;processing&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;j&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;claimedAt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;61&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;recovered&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recoverOrphans&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// ...recovery must NOT re-queue it. It should stay 'failed'.&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;recovered&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;finalState&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The full runner is also tested at the integration level with a simulated crash mid-execution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;RetryableJobRunner: crash on attempt 1, recover on attempt 2&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;attempts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;vi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;attempts&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;attempts&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Simulated worker crash on attempt &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;attempts&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;runner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RetryableJobRunner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;maxAttempts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;baseDelayMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;maxDelayMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nx"&gt;resolves&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toBeUndefined&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;attempts&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// 1 crash + 1 success&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What This Protects In Practice
&lt;/h2&gt;

&lt;p&gt;The silent job loss scenario from the top of this post is exactly what these components prevent:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;User pays&lt;/strong&gt; → webhook fires → &lt;code&gt;generateReport&lt;/code&gt; is enqueued to &lt;code&gt;PersistentJobStore&lt;/code&gt; &lt;em&gt;before&lt;/em&gt; any async work starts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Job is persisted&lt;/strong&gt; in Redis (or in-memory in development) with state &lt;code&gt;pending&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Worker claims it&lt;/strong&gt; atomically — Lua script ensures only one worker gets it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy happens mid-execution&lt;/strong&gt; → process dies → job stays in &lt;code&gt;processing&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;New process starts&lt;/strong&gt; → &lt;code&gt;recoverOrphans&lt;/code&gt; runs → job is reset to &lt;code&gt;pending&lt;/code&gt; with attempt count intact&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Worker claims it again&lt;/strong&gt; → report is generated → job moves to &lt;code&gt;completed&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;
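&lt;p&gt;Step 1 is the part most handlers get wrong, so here is a sketch of the enqueue-first shape. &lt;code&gt;onPaymentSucceeded&lt;/code&gt; and the payload fields are illustrative, not the engine's API:&lt;/p&gt;

```typescript
interface JobStoreLike {
  enqueue(job: object): Promise<void>;
}

// Illustrative webhook handler: the enqueue is the first durable action, before
// any report generation or LLM call, so a crash after this point cannot lose
// the job. `store` stands in for PersistentJobStore.
async function onPaymentSucceeded(store: JobStoreLike, eventId: string, tenantId: string) {
  await store.enqueue({
    id: `report_${eventId}`, // deterministic id derived from the webhook event
    name: 'generate-report',
    event: 'payment.succeeded',
    payload: { tenantId },
    maxAttempts: 5,
  });
  // Acknowledge the webhook immediately; a worker picks the job up from here.
}
```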

&lt;p&gt;The user gets their report. You never know there was a crash. That is the point.&lt;/p&gt;




&lt;p&gt;KeelStack Engine ships &lt;code&gt;RetryableJobRunner&lt;/code&gt;, &lt;code&gt;PersistentJobStore&lt;/code&gt;, and the crash recovery mechanism as part of the Layer 06 background job system. Zero configuration required — it runs with in-memory fallbacks locally and switches to Redis automatically when &lt;code&gt;REDIS_URL&lt;/code&gt; is set.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://keelstack.me/" rel="noopener noreferrer"&gt;&lt;strong&gt;Get KeelStack Engine →&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>stripe</category>
      <category>saas</category>
      <category>node</category>
      <category>typescript</category>
    </item>
    <item>
      <title>Why Your "Vibe Coded" SaaS Will Fail at 100 Users (and How to Fix It)</title>
      <dc:creator>Siddhant Jain</dc:creator>
      <pubDate>Sat, 21 Mar 2026 11:03:00 +0000</pubDate>
      <link>https://forem.com/siddhant_jain_18/why-your-vibe-coded-saas-will-fail-at-100-users-and-how-to-fix-it-4cff</link>
      <guid>https://forem.com/siddhant_jain_18/why-your-vibe-coded-saas-will-fail-at-100-users-and-how-to-fix-it-4cff</guid>
      <description>&lt;p&gt;It's 2026. You just built a functional SaaS MVP in four hours using Cursor and Claude.&lt;br&gt;
It looks great, the happy path works, and you're ready to tweet your launch.&lt;/p&gt;

&lt;p&gt;But there's a hidden tax on AI-generated code: &lt;strong&gt;Architectural Debt.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you vibe-code without a strict foundation, the LLM takes the path of least&lt;br&gt;
resistance. It puts database logic in your routes, skips error handling, and ignores&lt;br&gt;
race conditions. It builds a prototype, not a product.&lt;/p&gt;

&lt;p&gt;This isn't a skill problem. It's a &lt;strong&gt;structural problem.&lt;/strong&gt; And it only shows up at scale.&lt;/p&gt;


&lt;h2&gt;
  
  
  The "Vibe Coding" Trap
&lt;/h2&gt;

&lt;p&gt;Most developers hit their first wall not at launch, but at &lt;strong&gt;100 users&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That's when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Two users double-click "Subscribe" at the same time.&lt;/li&gt;
&lt;li&gt;Stripe retries a slow webhook and hits your server twice.&lt;/li&gt;
&lt;li&gt;A background job fails silently, and the user never gets their report.&lt;/li&gt;
&lt;li&gt;One user with an AI feature loops a prompt and burns $200 of your OpenAI credits in 20 minutes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these show up in development. None of them show up in your happy-path tests.&lt;br&gt;
They show up in production, at 2am, when you're not watching.&lt;/p&gt;

&lt;p&gt;The fix isn't "write better prompts." The fix is &lt;strong&gt;building on a foundation that makes&lt;br&gt;
these failure modes structurally impossible.&lt;/strong&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  1. The Race Condition That Kills Conversions
&lt;/h2&gt;

&lt;p&gt;Most AI-generated Stripe integrations look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Receive webhook.
2. Check if processed = true in DB.
3. If not, provision the license.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;This is broken.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Stripe retries webhooks. If two requests hit your server at the same millisecond —&lt;br&gt;
which happens regularly under real load — both will see &lt;code&gt;processed = false&lt;/code&gt;, and&lt;br&gt;
you'll double-provision (or double-charge) the user.&lt;/p&gt;

&lt;p&gt;This isn't hypothetical. Stripe's documentation explicitly warns that webhook endpoints&lt;br&gt;
may occasionally receive the same event more than once, so duplicate deliveries are a&lt;br&gt;
matter of when, not if.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Fix: Atomic Idempotency
&lt;/h3&gt;

&lt;p&gt;The correct approach is &lt;strong&gt;not "check then set."&lt;/strong&gt; It's &lt;strong&gt;atomic SET NX&lt;/strong&gt; (Set if Not Exists).&lt;/p&gt;

&lt;p&gt;In Redis, this means:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// WRONG — race condition between check and set&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isProcessed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isProcessed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;eventId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;isProcessed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;markProcessed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;eventId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;provisionLicense&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// CORRECT — atomic, no race condition&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;claimed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tryClaimKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;eventId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;claimed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;provisionLicense&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The difference: &lt;code&gt;tryClaimKey()&lt;/code&gt; is a single atomic Redis &lt;code&gt;SET NX&lt;/code&gt; operation.&lt;br&gt;
Either you claim it or you don't. There is no window between the check and the claim.&lt;/p&gt;
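
&lt;p&gt;What &lt;code&gt;tryClaimKey()&lt;/code&gt; does can be sketched in a few lines. This is an illustrative version using ioredis (the key prefix and TTL are assumptions, not KeelStack's actual values):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL);

// Atomic claim: "NX" sets the key only if it does not already exist,
// "EX" expires the claim after 24h so old event ids don't pile up.
async function tryClaimKey(eventId: string): Promise&amp;lt;boolean&amp;gt; {
  const result = await redis.set(`webhook:${eventId}`, '1', 'EX', 86_400, 'NX');
  return result === 'OK'; // null means another request claimed it first
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;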

&lt;p&gt;In KeelStack Engine, every webhook handler uses &lt;code&gt;webhookDeduplicationGuard&lt;/code&gt;&lt;br&gt;
middleware which wraps &lt;code&gt;tryClaimKey()&lt;/code&gt; automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/webhooks/stripe&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nf"&gt;webhookDeduplicationGuard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;idempotencyStore&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;stripe&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="nx"&gt;stripeWebhookHandler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pro tip:&lt;/strong&gt; If your backend doesn't use an &lt;code&gt;Idempotency-Key&lt;/code&gt; header for mutating&lt;br&gt;
requests, you are not production-ready.&lt;/p&gt;


&lt;h2&gt;
  
  
  2. Why "Spaghetti Prompts" Break Your Architecture
&lt;/h2&gt;

&lt;p&gt;As your project grows, your AI context window gets cluttered. With a flat file structure,&lt;br&gt;
the AI starts hallucinating. It forgets where your auth logic lives, starts inventing new&lt;br&gt;
ways to call your database, and quietly breaks layer boundaries you thought were stable.&lt;/p&gt;

&lt;p&gt;This isn't a Cursor or Claude problem. It's a &lt;strong&gt;map problem.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AI agents write better code when they have clear, enforced boundaries. Without them,&lt;br&gt;
they wander.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Fix: The 8-Layer "Constitution"
&lt;/h3&gt;

&lt;p&gt;KeelStack Engine uses a strict &lt;strong&gt;Hexagonal (Ports &amp;amp; Adapters) architecture&lt;/strong&gt; across&lt;br&gt;
eight explicit layers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;AI Write?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;01-Core&lt;/td&gt;
&lt;td&gt;Security, errors, middleware, guards&lt;/td&gt;
&lt;td&gt;❌ NO&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;02-Common&lt;/td&gt;
&lt;td&gt;DTOs, types, utilities&lt;/td&gt;
&lt;td&gt;✅ YES&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;03-Policies&lt;/td&gt;
&lt;td&gt;Business rules, billing gates, access guards&lt;/td&gt;
&lt;td&gt;❌ NO&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;04-Modules&lt;/td&gt;
&lt;td&gt;Feature modules: auth, billing, users, tasks&lt;/td&gt;
&lt;td&gt;✅ YES&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;05-Infra&lt;/td&gt;
&lt;td&gt;DB schema, Stripe/Redis/Resend gateways&lt;/td&gt;
&lt;td&gt;❌ NO&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;06-Background&lt;/td&gt;
&lt;td&gt;Worker pool, retry-safe job runner, event bus&lt;/td&gt;
&lt;td&gt;✅ YES&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;07-AI&lt;/td&gt;
&lt;td&gt;LLMClient, cost controls, AI boundary rules&lt;/td&gt;
&lt;td&gt;❌ NO&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;08-Web&lt;/td&gt;
&lt;td&gt;Express routes, OpenAPI spec&lt;/td&gt;
&lt;td&gt;✅ YES&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;.cursorrules&lt;/code&gt; file enforces these boundaries at the Cursor / Claude level:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI can write to &lt;code&gt;02-Common&lt;/code&gt;, &lt;code&gt;04-Modules&lt;/code&gt;, &lt;code&gt;06-Background&lt;/code&gt;, &lt;code&gt;08-Web&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;AI &lt;strong&gt;cannot touch&lt;/strong&gt; &lt;code&gt;01-Core&lt;/code&gt;, &lt;code&gt;03-Policies&lt;/code&gt;, &lt;code&gt;05-Infra/schema.ts&lt;/code&gt;, or &lt;code&gt;07-AI/LLMClient.ts&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result: your AI agent writes architecture-compliant code the first time, without&lt;br&gt;
you needing to explain the layer rules in every prompt.&lt;/p&gt;

&lt;p&gt;This &lt;code&gt;.cursorrules&lt;/code&gt; file is &lt;strong&gt;free and open source&lt;/strong&gt; on GitHub. Drop it in any&lt;br&gt;
Node.js project root and Cursor loads it automatically.&lt;/p&gt;
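
&lt;p&gt;KeelStack's actual rules file is longer, but an illustrative excerpt shows the shape such boundary rules take (this snippet is a sketch, not the real file):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# .cursorrules (illustrative excerpt)
You may create and edit files under src/02-Common, src/04-Modules,
src/06-Background, and src/08-Web.

NEVER modify src/01-Core, src/03-Policies, src/05-Infra/schema.ts,
or src/07-AI/llm/LLMClient.ts. If a change seems to require editing
them, stop and ask instead.

All database access goes through 05-Infra gateways. Routes in 08-Web
call services in 04-Modules; they never import the DB directly.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;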


&lt;h2&gt;
  
  
  3. The $500 AI Loop
&lt;/h2&gt;

&lt;p&gt;You've seen the horror stories. A developer leaves an AI agent running, a loop occurs,&lt;br&gt;
and they wake up to a $500 OpenAI bill. One user finds a way to trigger your AI feature&lt;br&gt;
in a loop, and your margins disappear by end of day.&lt;/p&gt;

&lt;p&gt;If you're building an AI SaaS, &lt;strong&gt;you cannot rely on the AI to behave.&lt;/strong&gt; You need&lt;br&gt;
hard governance at the infrastructure level.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Fix: Centralized LLM Client with Hard Budget Caps
&lt;/h3&gt;

&lt;p&gt;Every LLM call in KeelStack Engine goes through a single &lt;code&gt;llmClient&lt;/code&gt; singleton&lt;br&gt;
in &lt;code&gt;src/07-AI/llm/LLMClient.ts&lt;/code&gt;. No exceptions.&lt;/p&gt;

&lt;p&gt;This client enforces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Per-user token budgets&lt;/strong&gt; — hard caps on what a single user can spend per hour,
per day, or per feature.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost attribution&lt;/strong&gt; — every call includes a &lt;code&gt;feature&lt;/code&gt; field so you know exactly
which part of your product is eating your margin.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic retry on 429/503&lt;/strong&gt; — rate limit errors don't crash your app; they
backoff and retry.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Request timeouts&lt;/strong&gt; — runaway prompts are killed after a configurable threshold.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;llmClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;usr_123&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;feature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;report_gen&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;// cost attribution&lt;/span&gt;
  &lt;span class="na"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;You are...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;userMessage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userInput&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="c1"&gt;// budget, timeout, retry — all enforced automatically&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;One user cannot burn your monthly budget in an afternoon. It's structurally prevented.&lt;/p&gt;
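
&lt;p&gt;A cap like this can be enforced with an atomic counter per user per time window. The sketch below is one plausible mechanism (it is not KeelStack's internal code) and assumes an ioredis client named &lt;code&gt;redis&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Hypothetical enforcement: add this call's tokens to an hourly
// counter; INCRBY is atomic, so concurrent calls can't sneak past.
async function chargeTokens(userId: string, tokens: number, capPerHour: number) {
  const hour = Math.floor(Date.now() / 3_600_000);
  const key = `llm:budget:${userId}:${hour}`;
  const used = await redis.incrby(key, tokens);
  await redis.expire(key, 7_200); // the window cleans itself up
  if (used &amp;gt; capPerHour) {
    throw new Error(`LLM budget exceeded for ${userId}`);
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;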


&lt;h2&gt;
  
  
  4. The Background Job That Vanishes
&lt;/h2&gt;

&lt;p&gt;AI-generated background job implementations typically look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;processReport&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;This is not a background job.&lt;/strong&gt; This is a deferred function call with no retry,&lt;br&gt;
no timeout, no logging, and no recovery.&lt;/p&gt;

&lt;p&gt;If your server restarts, the job disappears. If &lt;code&gt;processReport()&lt;/code&gt; throws, the user&lt;br&gt;
never gets their result and you never find out why.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Fix: Retry-Safe Job Runner with Dead-Letter Logging
&lt;/h3&gt;

&lt;p&gt;KeelStack Engine uses real Node.js &lt;code&gt;worker_threads&lt;/code&gt; — not &lt;code&gt;setTimeout&lt;/code&gt;, not&lt;br&gt;
&lt;code&gt;setImmediate&lt;/code&gt; — with a &lt;code&gt;RetryableJobRunner&lt;/code&gt; that provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Exponential backoff with jitter&lt;/strong&gt; — failed jobs retry at increasing intervals,
not all at once.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-attempt timeouts&lt;/strong&gt; — a stuck job doesn't block the worker thread forever.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dead-letter logging&lt;/strong&gt; — jobs that exhaust retries are logged with full context,
not silently dropped.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;NonRetryableError&lt;/code&gt;&lt;/strong&gt; — for bad-input errors that should fail fast without
burning retry budget.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;runner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RetryableJobRunner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nf"&gt;isValid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;NonRetryableError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Bad payload&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;processReport&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;maxAttempts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;baseDelayMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;timeoutMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The async pattern exposed to clients is &lt;strong&gt;202 + poll&lt;/strong&gt; — the canonical&lt;br&gt;
production pattern for long-running operations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST /api/v1/tasks    → { status: "accepted", jobId: "...", pollUrl: "..." }
GET  /api/v1/tasks/:jobId → { status: "processing" | "done" | "failed", result }
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
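
&lt;p&gt;A minimal Express sketch of the same pattern (the &lt;code&gt;jobQueue&lt;/code&gt; and &lt;code&gt;jobStore&lt;/code&gt; helpers here are hypothetical stand-ins, not KeelStack's API):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import crypto from 'node:crypto';

app.post('/api/v1/tasks', async (req, res) =&amp;gt; {
  const jobId = crypto.randomUUID();
  await jobQueue.enqueue('generate-report', { jobId, input: req.body });
  // 202 Accepted: the work happens in the background
  res.status(202).json({ status: 'accepted', jobId, pollUrl: `/api/v1/tasks/${jobId}` });
});

app.get('/api/v1/tasks/:jobId', async (req, res) =&amp;gt; {
  const job = await jobStore.get(req.params.jobId);
  if (!job) return res.status(404).json({ error: 'unknown job' });
  res.json({ status: job.status, result: job.result ?? null });
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;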






&lt;h2&gt;
  
  
  5. The Auth Bug That Leaks User Data
&lt;/h2&gt;

&lt;p&gt;AI-generated password comparison often looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;storedHash&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;inputHash&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;This is vulnerable to timing attacks.&lt;/strong&gt; A plain &lt;code&gt;===&lt;/code&gt; comparison bails out at the&lt;br&gt;
first mismatched character, so an attacker can measure response times to recover&lt;br&gt;
information about the stored value byte by byte.&lt;/p&gt;

&lt;p&gt;The correct approach is &lt;code&gt;crypto.timingSafeEqual()&lt;/code&gt; — a constant-time comparison&lt;br&gt;
that doesn't leak information through timing.&lt;/p&gt;
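
&lt;p&gt;One wrinkle: &lt;code&gt;crypto.timingSafeEqual()&lt;/code&gt; throws if the two buffers differ in length, so a common pattern is to compare fixed-length digests of both values. A minimal sketch:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import { createHash, timingSafeEqual } from 'node:crypto';

// Constant-time equality check. Hashing both sides first guarantees
// equal-length buffers, which timingSafeEqual requires.
function safeCompare(a: string, b: string): boolean {
  const digestA = createHash('sha256').update(a).digest();
  const digestB = createHash('sha256').update(b).digest();
  return timingSafeEqual(digestA, digestB);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;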

&lt;p&gt;KeelStack Engine uses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Argon2id&lt;/strong&gt; password hashing (OWASP 2023 parameters: 65MB memory, 3 iterations).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;crypto.timingSafeEqual()&lt;/code&gt;&lt;/strong&gt; for all password comparisons.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Brute-force lockout&lt;/strong&gt; per IP on auth endpoints (30 req / 10 min).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Refresh token rotation&lt;/strong&gt; — tokens are single-use and rotated on every refresh.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transparent PBKDF2 → Argon2id migration&lt;/strong&gt; on next login for any legacy hashes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of this is complicated to implement. It's just easy to skip when you're&lt;br&gt;
prompting an AI to "add auth."&lt;/p&gt;




&lt;h2&gt;
  
  
  What 100 Users Actually Reveals
&lt;/h2&gt;

&lt;p&gt;Here's the honest summary of what breaks at 100 users when you build on an&lt;br&gt;
AI-generated flat foundation:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Failure Mode&lt;/th&gt;
&lt;th&gt;Root Cause&lt;/th&gt;
&lt;th&gt;Production Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Duplicate Stripe charges&lt;/td&gt;
&lt;td&gt;No atomic idempotency on webhooks&lt;/td&gt;
&lt;td&gt;Chargebacks, trust loss&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Double-provisioned licenses&lt;/td&gt;
&lt;td&gt;Race condition in check-then-set&lt;/td&gt;
&lt;td&gt;Revenue leak&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Jobs vanishing silently&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;setTimeout&lt;/code&gt; instead of real workers&lt;/td&gt;
&lt;td&gt;User churn, support tickets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;$500 AI bill overnight&lt;/td&gt;
&lt;td&gt;No per-user LLM budget caps&lt;/td&gt;
&lt;td&gt;Direct margin destruction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auth timing leaks&lt;/td&gt;
&lt;td&gt;String comparison instead of &lt;code&gt;timingSafeEqual&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Potential data breach&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Architecture rot&lt;/td&gt;
&lt;td&gt;Flat file structure, no layer boundaries&lt;/td&gt;
&lt;td&gt;Weeks of refactoring debt&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All of these are &lt;strong&gt;structurally preventable.&lt;/strong&gt; None of them require more prompts.&lt;br&gt;
They require a foundation that makes the wrong thing hard to build.&lt;/p&gt;




&lt;h2&gt;
  
  
  Stop Building Prototypes. Start Shipping Engines.
&lt;/h2&gt;

&lt;p&gt;You can spend three weeks debugging AI-generated spaghetti after your first 100 users&lt;br&gt;
expose every race condition and edge case. Or you can start with a foundation that&lt;br&gt;
already handles them.&lt;/p&gt;

&lt;p&gt;KeelStack Engine is not a template. It's a &lt;strong&gt;production-grade Node.js + TypeScript&lt;br&gt;
environment designed specifically for the AI coding era:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;563 unit tests · 37 e2e checks · 91.7% statement coverage, enforced by CI&lt;/li&gt;
&lt;li&gt;Idempotency middleware, webhook deduplication guard, retry-safe job runner&lt;/li&gt;
&lt;li&gt;Per-user LLM token budgets with cost attribution&lt;/li&gt;
&lt;li&gt;Open-source &lt;code&gt;.cursorrules&lt;/code&gt; — AI writes architecture-compliant code the first time&lt;/li&gt;
&lt;li&gt;15 copy-paste prompts for Cursor, Claude, and Copilot&lt;/li&gt;
&lt;li&gt;SaaS blueprints: AI Report Generator, Lead Finder API&lt;/li&gt;
&lt;li&gt;One-time payment. Your source code, your rules.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://keelstack.me" rel="noopener noreferrer"&gt;Explore KeelStack Engine → &lt;/a&gt;&lt;/p&gt;

</description>
      <category>node</category>
      <category>saas</category>
      <category>cursor</category>
      <category>ai</category>
    </item>
    <item>
      <title>The Stripe webhook race condition that silently charged users twice (and the Node.js fix)</title>
      <dc:creator>Siddhant Jain</dc:creator>
      <pubDate>Fri, 20 Mar 2026 13:04:38 +0000</pubDate>
      <link>https://forem.com/siddhant_jain_18/the-stripe-webhook-race-condition-that-silently-charged-users-twice-and-the-nodejs-fix-36k5</link>
      <guid>https://forem.com/siddhant_jain_18/the-stripe-webhook-race-condition-that-silently-charged-users-twice-and-the-nodejs-fix-36k5</guid>
      <description>&lt;p&gt;Indie Hackers researchers traced a recurring support headache back to a single race condition inside Stripe webhook handling: simultaneous retries hit the same business transaction twice, and nobody noticed until customers complained about double charges. The fix looks obvious on paper, yet most teams still treat webhooks like regular requests.&lt;/p&gt;

&lt;h2&gt;
  
  
  What happened in the Indie Hackers post
&lt;/h2&gt;

&lt;p&gt;Two things lined up: a webhook that triggered a downstream billing workflow and Stripe's stubborn automatic retries. When the original delivery times out or fails to return a 2xx, Stripe retries the exact same event with the same &lt;code&gt;id&lt;/code&gt;. If the handler is not guarding against duplicate work, the second invocation commits the same payment record and triggers the customer's card again. By the time the developer examined the logs, support tickets had piled up and a single user had been billed twice for the same plan.&lt;/p&gt;

&lt;p&gt;The key insight: the retries are legitimate, the payload is identical, and Stripe keeps redelivering until your endpoint responds with a &lt;code&gt;2xx&lt;/code&gt;. So the safe answer is to process each event exactly once, even if Stripe delivers it a dozen times.&lt;/p&gt;

&lt;h2&gt;
  
  
  The "obviously wrong" pattern
&lt;/h2&gt;

&lt;p&gt;Here's the simplified handler that almost every starter kit ships:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/api/stripe/webhook&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;stripe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;webhooks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;constructEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;webhookSecret&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;invoice.payment_succeeded&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;invoice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;payments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;invoiceId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;invoice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;invoice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;amount_paid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;customerId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;invoice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;processedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;deliver-license&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;invoiceId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;invoice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No idempotency, no locking, just another async route. If Stripe retries the same event, that &lt;code&gt;insert&lt;/code&gt; runs again and a second charge is written. There's no shared cache or DB row that says "stop, I already handled this event."&lt;/p&gt;
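
&lt;p&gt;The cheapest version of that "stop" signal is a unique constraint. Assuming Postgres with the &lt;code&gt;pg&lt;/code&gt; driver (table and pool names here are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Requires, once: CREATE TABLE webhook_events (stripe_event_id text PRIMARY KEY);
const { rowCount } = await pool.query(
  "INSERT INTO webhook_events (stripe_event_id) VALUES ($1) ON CONFLICT DO NOTHING",
  [event.id]
);
if (rowCount === 0) {
  // Duplicate delivery: acknowledge so Stripe stops retrying, do no work.
  return res.status(200).send();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;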

&lt;h2&gt;
  
  
  KeelStack's atomic idempotency guard
&lt;/h2&gt;

&lt;p&gt;KeelStack ships with a utility that wraps every webhook inside an atomic guard keyed on &lt;code&gt;stripe_event_id&lt;/code&gt; + &lt;code&gt;stripe_event_type&lt;/code&gt;. It touches one durable row in the database before any business work runs. The guard rejects duplicates in the same transaction, so you can safely acknowledge Stripe before any further work executes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/api/stripe/webhook&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;stripe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;webhooks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;constructEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;webhookSecret&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;idempotency&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;guard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;alreadyProcessed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;payments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;stripeId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;amount_paid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;customerId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;processedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;jobQueue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;deliver-license&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;invoiceId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;customerId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The guard exposes &lt;code&gt;ctx.alreadyProcessed&lt;/code&gt;, so duplicate deliveries short-circuit before they mutate the database or change customer state. Even under concurrent retries, the losing handler observes the database conflict and returns a clean &lt;code&gt;200&lt;/code&gt; without touching the rest of the system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters for your SaaS
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Duplicate billing kills trust faster than any other incident.&lt;/li&gt;
&lt;li&gt;Stripe's retries are not bugs — they are your backup plan. Treat them as a feature.&lt;/li&gt;
&lt;li&gt;An idempotency guard like KeelStack's gives you a reproducible, auditable safeguard that you can test locally.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Indie Hackers race condition is still the same bug we see in every project that treats webhooks as fire-and-forget. Wrap your handler in an atomic guard, store the Stripe event ID alongside your payment rows, and your ledger stays clean even when retries are furious.&lt;/p&gt;
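&lt;p&gt;If you want to see the shape of the pattern in isolation, here is a minimal sketch. An in-memory &lt;code&gt;Set&lt;/code&gt; stands in for the database's unique index on the Stripe event ID, and the &lt;code&gt;guard&lt;/code&gt; and &lt;code&gt;alreadyProcessed&lt;/code&gt; names simply mirror the handler above; this is an illustration of the idea, not KeelStack's actual implementation:&lt;/p&gt;

```typescript
// Minimal idempotency guard: an in-memory Set stands in for the database's
// unique index on the Stripe event ID. The first delivery runs the work;
// replays see alreadyProcessed = true and no-op.
const processed = new Set<string>();

async function guard(
  eventId: string,
  work: (ctx: { alreadyProcessed: boolean }) => Promise<void>,
): Promise<void> {
  const ctx = { alreadyProcessed: processed.has(eventId) };
  processed.add(eventId);
  await work(ctx);
}

// Simulate Stripe delivering the same event twice.
let inserts = 0;
const handler = async (ctx: { alreadyProcessed: boolean }) => {
  if (ctx.alreadyProcessed) return; // duplicate delivery short-circuits here
  inserts++; // stands in for db.payments.insert(...)
};

(async () => {
  await guard("evt_123", handler);
  await guard("evt_123", handler);
  console.log(inserts); // 1
})();
```

&lt;p&gt;In production the dedupe state has to live in the database itself (a unique constraint on the event ID, checked in the same transaction as the work), otherwise a restart wipes it and the replays get through.&lt;/p&gt;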

</description>
      <category>node</category>
      <category>typescript</category>
      <category>stripe</category>
      <category>saas</category>
    </item>
    <item>
      <title>Why I built a backend-only SaaS starter kit when everyone else builds full-stack</title>
      <dc:creator>Siddhant Jain</dc:creator>
      <pubDate>Sun, 15 Mar 2026 05:00:41 +0000</pubDate>
      <link>https://forem.com/siddhant_jain_18/why-i-built-a-backend-only-saas-starter-kit-when-everyone-else-builds-full-stack-52n5</link>
      <guid>https://forem.com/siddhant_jain_18/why-i-built-a-backend-only-saas-starter-kit-when-everyone-else-builds-full-stack-52n5</guid>
      <description>&lt;p&gt;Every SaaS starter kit I looked at came with a frontend attached.&lt;/p&gt;

&lt;p&gt;ShipFast. MakerKit. SupaStarter. All of them assume you want &lt;br&gt;
Next.js. All of them bundle a UI you may not need, a framework &lt;br&gt;
you may not want, and opinions about your frontend you never asked for.&lt;/p&gt;

&lt;p&gt;I didn't plan to build a backend-only kit. It just ended up that way.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it started
&lt;/h2&gt;

&lt;p&gt;I was starting a new project and needed a backend foundation.&lt;/p&gt;

&lt;p&gt;Auth. Billing. Webhooks. Email. Database setup. &lt;br&gt;
The same things every SaaS needs.&lt;/p&gt;

&lt;p&gt;I knew what I wanted it to look like: clean architecture, &lt;br&gt;
proper tests, everything wired. So I built it myself, then &lt;br&gt;
packaged it so others could use it too.&lt;/p&gt;

&lt;p&gt;When I looked at what I'd built, it was pure backend. No pages. &lt;br&gt;
No components. No frontend framework opinions. Just a clean &lt;br&gt;
Node.js + TypeScript foundation with everything wired.&lt;/p&gt;

&lt;p&gt;I didn't make it backend-only as a strategic decision.&lt;br&gt;
It just was.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I realized after
&lt;/h2&gt;

&lt;p&gt;Once it was done, I started looking at who actually needs this.&lt;/p&gt;

&lt;p&gt;Developers using Firebase or Supabase aren't the target. Those &lt;br&gt;
tools are genuinely good for certain use cases. If you want &lt;br&gt;
managed auth and don't care about owning your stack, use them.&lt;/p&gt;

&lt;p&gt;But there's a specific type of developer who hits a wall with BaaS:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're building something where you need full control over 
your auth logic&lt;/li&gt;
&lt;li&gt;You already have a frontend — React, Vue, Svelte, React Native — 
and you don't want to rebuild it around a new framework&lt;/li&gt;
&lt;li&gt;You want to own your database and deploy anywhere&lt;/li&gt;
&lt;li&gt;You want Stripe webhooks handled properly, not worked around&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That developer can't use ShipFast. It's Next.js-first.&lt;br&gt;
That developer can't use MakerKit. Also Next.js-first.&lt;/p&gt;

&lt;p&gt;They can use KeelStack.&lt;/p&gt;

&lt;h2&gt;
  
  
  What backend-only actually means
&lt;/h2&gt;

&lt;p&gt;It means your frontend is your problem. KeelStack doesn't care &lt;br&gt;
what you use. React, Vue, Svelte, React Native, a mobile app, &lt;br&gt;
no frontend at all — it doesn't matter. You hit REST endpoints &lt;br&gt;
over HTTP. That's it.&lt;/p&gt;

&lt;p&gt;It also means the backend is done properly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hexagonal architecture — swap Stripe, swap your database, 
swap email providers without touching business logic&lt;/li&gt;
&lt;li&gt;159 unit tests, 36 end-to-end checks, 95% statement coverage&lt;/li&gt;
&lt;li&gt;Idempotent webhook handling — no duplicate processing&lt;/li&gt;
&lt;li&gt;In-memory fallback — works without a database on first run&lt;/li&gt;
&lt;li&gt;Health endpoints with per-service diagnostics&lt;/li&gt;
&lt;/ul&gt;
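&lt;p&gt;To make the first bullet concrete, here is a sketch of what that hexagonal seam can look like. &lt;code&gt;EmailPort&lt;/code&gt;, the adapter, and &lt;code&gt;SignupService&lt;/code&gt; are names invented for illustration, not KeelStack's real API:&lt;/p&gt;

```typescript
// Illustrative port: business logic depends only on this interface.
interface EmailPort {
  send(to: string, subject: string, body: string): Promise<void>;
}

// One adapter per provider; swapping providers never touches callers.
// This one just records sends, which also makes testing trivial.
class RecordingEmailAdapter implements EmailPort {
  sent: string[] = [];
  async send(to: string, subject: string, _body: string): Promise<void> {
    this.sent.push(`${to}|${subject}`);
  }
}

// Business logic receives the port via constructor injection.
class SignupService {
  constructor(private email: EmailPort) {}
  async register(userEmail: string): Promise<void> {
    // ...create the user record here...
    await this.email.send(userEmail, "Welcome!", "Thanks for signing up.");
  }
}

const mail = new RecordingEmailAdapter();
void new SignupService(mail).register("dev@example.com");
console.log(mail.sent[0]); // "dev@example.com|Welcome!"
```

&lt;p&gt;A real provider adapter (say, one wrapping Resend) would implement the same interface, so switching providers means changing one file, not the signup flow.&lt;/p&gt;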

&lt;h2&gt;
  
  
  The uncomfortable truth about full-stack kits
&lt;/h2&gt;

&lt;p&gt;When you buy a full-stack kit, you're buying someone else's &lt;br&gt;
frontend opinions bundled with a backend you actually needed.&lt;/p&gt;

&lt;p&gt;If those opinions match yours — great. You save time.&lt;/p&gt;

&lt;p&gt;If they don't — you spend hours stripping out a frontend &lt;br&gt;
you never wanted, or worse, you bend your product to fit &lt;br&gt;
the kit's architecture.&lt;/p&gt;

&lt;p&gt;Backend-only sidesteps this entirely.&lt;br&gt;
You bring your frontend. I bring the backend.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who this is for
&lt;/h2&gt;

&lt;p&gt;Developers who already have a frontend and need a solid backend.&lt;br&gt;
Freelancers building APIs for clients.&lt;br&gt;
Founders launching SaaS MVPs who want to own their stack.&lt;br&gt;
Developers switching away from Firebase who want full control.&lt;/p&gt;

&lt;p&gt;Not for complete beginners. Not for people who want managed hosting.&lt;br&gt;
Not for teams expecting enterprise SLAs.&lt;/p&gt;

&lt;h2&gt;
  
  
  One more thing
&lt;/h2&gt;

&lt;p&gt;I'm 17. This is my first product.&lt;/p&gt;

&lt;p&gt;I built it because what I needed didn't exist at a price &lt;br&gt;
that made sense — and because building it taught me more &lt;br&gt;
than any tutorial ever would.&lt;/p&gt;

&lt;p&gt;It's $49. You own the code. No subscription.&lt;/p&gt;

&lt;p&gt;If you're building a SaaS backend and you're tired of &lt;br&gt;
framework lock-in — it might save you a few weeks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://keelstack.me" rel="noopener noreferrer"&gt;keelstack.me&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you found this useful or have thoughts on the backend-only &lt;br&gt;
approach — drop a comment. I'm genuinely curious whether other &lt;br&gt;
developers hit the same wall.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>node</category>
      <category>typescript</category>
      <category>webdev</category>
      <category>beginners</category>
    </item>
    <item>
      <title>How I Structured a Production-Ready Node.js Backend for SaaS</title>
      <dc:creator>Siddhant Jain</dc:creator>
      <pubDate>Fri, 13 Mar 2026 13:54:39 +0000</pubDate>
      <link>https://forem.com/siddhant_jain_18/how-i-structured-a-production-ready-nodejs-backend-for-saas-248f</link>
      <guid>https://forem.com/siddhant_jain_18/how-i-structured-a-production-ready-nodejs-backend-for-saas-248f</guid>
      <description>&lt;p&gt;Most SaaS projects start the same way.&lt;/p&gt;

&lt;p&gt;You scaffold a Node.js backend, then gradually add authentication, billing, database models, email notifications, background jobs, and API documentation.&lt;/p&gt;

&lt;p&gt;After doing this repeatedly, I wanted a cleaner starting point for new projects. So I structured a backend foundation with the most common SaaS components already wired together.&lt;/p&gt;

&lt;p&gt;This is the architecture I ended up with.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tech Stack
&lt;/h3&gt;

&lt;p&gt;The backend is built with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Node.js&lt;/li&gt;
&lt;li&gt;TypeScript&lt;/li&gt;
&lt;li&gt;Express&lt;/li&gt;
&lt;li&gt;PostgreSQL&lt;/li&gt;
&lt;li&gt;Drizzle ORM&lt;/li&gt;
&lt;li&gt;Stripe (billing)&lt;/li&gt;
&lt;li&gt;Resend (transactional email)&lt;/li&gt;
&lt;li&gt;Zod (validation)&lt;/li&gt;
&lt;li&gt;OpenAPI / Swagger (API docs)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal wasn't just to include these tools, but to organize them in a way that keeps business logic separate from infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Folder Structure
&lt;/h2&gt;

&lt;p&gt;This is the high-level folder structure used in the project:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgy9g05diveytpnpi5b5j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgy9g05diveytpnpi5b5j.png" alt="High-level folder structure of the backend" width="725" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The backend is organized into layers so business logic, infrastructure integrations, and the HTTP layer remain isolated.&lt;/p&gt;

&lt;h2&gt;
  
  
  API Documentation
&lt;/h2&gt;

&lt;p&gt;The API is documented using OpenAPI, making it easy to explore and test endpoints during development.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzonau7zgt9p7cb2f9z9n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzonau7zgt9p7cb2f9z9n.png" alt="Swagger UI listing the available API endpoints" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Swagger UI exposes the available endpoints and request schemas for quick testing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example Request Flow
&lt;/h2&gt;

&lt;p&gt;A typical API request flows like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Express route receives the request&lt;/li&gt;
&lt;li&gt;Middleware applies rate limiting and authentication&lt;/li&gt;
&lt;li&gt;Controller calls the relevant module&lt;/li&gt;
&lt;li&gt;Module applies policy rules&lt;/li&gt;
&lt;li&gt;Infrastructure adapters interact with the database or external APIs&lt;/li&gt;
&lt;li&gt;Response is returned to the client&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Keeping integrations behind adapters makes them easier to replace later.&lt;/p&gt;
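&lt;p&gt;That flow can be sketched as plain function composition. The stage names and the &lt;code&gt;Ctx&lt;/code&gt; shape here are invented for illustration; the real project uses Express middleware, but the layering is the same idea:&lt;/p&gt;

```typescript
// Illustrative request pipeline: each stage is a plain function, mirroring
// route -> middleware -> controller -> module -> adapter.
type Ctx = { authToken?: string; userId?: string; result?: string };

const rateLimit = (ctx: Ctx): Ctx => ctx; // would consult a counter here
const requireAuth = (ctx: Ctx): Ctx => {
  if (!ctx.authToken) throw new Error("401");
  return { ...ctx, userId: "user_1" }; // stands in for token verification
};

// Infrastructure adapter: the only layer that touches the "database".
const dbAdapter = { findPlan: (userId: string) => `pro-plan for ${userId}` };

// Module: business rules, reaching infrastructure only via the adapter.
const billingModule = (ctx: Ctx): Ctx => ({
  ...ctx,
  result: dbAdapter.findPlan(ctx.userId!),
});

// Controller composes the stages, as an Express route handler would.
const handleGetPlan = (ctx: Ctx): Ctx =>
  billingModule(requireAuth(rateLimit(ctx)));

console.log(handleGetPlan({ authToken: "tok_abc" }).result);
// "pro-plan for user_1"
```

&lt;p&gt;Because &lt;code&gt;billingModule&lt;/code&gt; only knows the adapter's interface, replacing the database (or mocking it in tests) is a one-line change.&lt;/p&gt;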

&lt;h2&gt;
  
  
  Built-In SaaS Components
&lt;/h2&gt;

&lt;p&gt;The backend includes several common pieces needed for SaaS applications:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Authentication and user management&lt;/li&gt;
&lt;li&gt;Stripe billing with webhook handling&lt;/li&gt;
&lt;li&gt;PostgreSQL database setup&lt;/li&gt;
&lt;li&gt;Transactional email support&lt;/li&gt;
&lt;li&gt;API documentation via OpenAPI&lt;/li&gt;
&lt;li&gt;Structured error handling&lt;/li&gt;
&lt;li&gt;Rate limiting and security middleware&lt;/li&gt;
&lt;li&gt;End-to-end tests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are things many SaaS projects end up implementing anyway.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;I recorded a short demo showing the server startup and readiness checks.&lt;/p&gt;

&lt;p&gt;You can see it here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://keelstack.me" rel="noopener noreferrer"&gt;KeelStack.me&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;The goal was to create a backend structure that is easy to extend without mixing business logic with infrastructure code.&lt;/p&gt;

&lt;p&gt;I'm curious how others structure Node.js backends for SaaS products.&lt;br&gt;
Do you prefer layered architectures like this, or something simpler?&lt;/p&gt;

</description>
      <category>node</category>
      <category>typescript</category>
      <category>backend</category>
      <category>saas</category>
    </item>
  </channel>
</rss>
