<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Siddhant Jain</title>
    <description>The latest articles on Forem by Siddhant Jain (@siddhant_jain_18).</description>
    <link>https://forem.com/siddhant_jain_18</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3791700%2Ff357e54e-3f91-4160-9e0c-9428b560cd0c.png</url>
      <title>Forem: Siddhant Jain</title>
      <link>https://forem.com/siddhant_jain_18</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/siddhant_jain_18"/>
    <language>en</language>
    <item>
      <title>The “Token Bleed”: How to Operate LLMs Without Bankrupting Yourself</title>
      <dc:creator>Siddhant Jain</dc:creator>
      <pubDate>Fri, 03 Apr 2026 18:06:11 +0000</pubDate>
      <link>https://forem.com/siddhant_jain_18/the-token-bleed-how-to-operate-llms-without-bankrupting-yourself-2p1o</link>
      <guid>https://forem.com/siddhant_jain_18/the-token-bleed-how-to-operate-llms-without-bankrupting-yourself-2p1o</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Experts across infra, SRE, and product‑engineering circles don’t have one single “rulebook,” but the consensus from real‑world write‑ups and discussions is clear: &lt;strong&gt;if you’re building an “AI wrapper” or LLM‑based product, the way you succeed (and avoid backlash) is by focusing on the &lt;em&gt;hard infrastructure and reliability problems&lt;/em&gt;, not just the UI or “vibe.”&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We learned this the hard way. In one project we ran, we watched a single runaway agent hit six figures in tokens before the dashboard even refreshed. Another time, we tried in‑memory counters for budgets – after a restart, everyone’s limit was reset and we started overbilling users. Oops.&lt;/p&gt;

&lt;p&gt;A single bug or malicious user can still drain $1,000 of OpenAI credits in an hour. But the fix isn’t a “better wrapper” – it’s &lt;strong&gt;LLM operations&lt;/strong&gt;: treating the model like any other expensive, unreliable external service (Stripe, S3, Kafka). Let’s walk through the patterns that protect your wallet, then see one concrete implementation.&lt;/p&gt;




&lt;h2&gt;1. The principles (code‑agnostic, works for any LLM)&lt;/h2&gt;

&lt;p&gt;Before we touch code, internalise these four guardrails. They apply whether you’re using OpenAI, Anthropic, Llama, or a mix.&lt;/p&gt;

&lt;h3&gt;① Per‑user / per‑org token budgets (with rolling windows)&lt;/h3&gt;

&lt;p&gt;Every token‑consuming request should be associated with a budget context. We found it safer to enforce &lt;strong&gt;hourly or daily limits&lt;/strong&gt; that persist across restarts – an in‑memory counter that resets when your process dies is useless. (Yes, we learned that one the expensive way.)&lt;/p&gt;

&lt;h3&gt;② Per‑job circuit breakers&lt;/h3&gt;

&lt;p&gt;Long‑running AI tasks (summaries, batch inference) can loop or stall. You need a way to kill a job &lt;strong&gt;mid‑stream&lt;/strong&gt; when it exceeds a cost or time threshold. That requires persistent job state: the worker must periodically check if it’s still allowed to continue.&lt;/p&gt;

&lt;h3&gt;③ Idempotency for every mutating request&lt;/h3&gt;

&lt;p&gt;Retries, webhooks, and double‑clicks are silent budget killers. Every request that calls an LLM should carry an idempotency key. The first request processes; duplicates receive a cached response – no extra tokens.&lt;/p&gt;

&lt;h3&gt;④ Crash‑recoverable job queues&lt;/h3&gt;

&lt;p&gt;If a worker dies while an LLM call is streaming, you risk orphaned billing. Jobs must be stored in a durable queue (Redis, Postgres) with atomic claiming and a recovery mechanism for “processing” jobs that exceed a timeout.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What this costs in real life&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
A single runaway loop generating 10M tokens at GPT‑4o rates (~$2.50/1M input, $10/1M output) can burn &lt;strong&gt;$100+ in minutes&lt;/strong&gt;. Without these four patterns, you’re exposed. (And yes, that’s a cheap model – imagine Anthropic Claude.)&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;2. Turning patterns into code (using KeelStack as one example)&lt;/h2&gt;

&lt;p&gt;Wrappers are easy; guardrails are hard.&lt;/p&gt;

&lt;p&gt;The code below comes from &lt;strong&gt;KeelStack&lt;/strong&gt; – an open‑source framework that ships with a budget‑aware LLM gateway out of the box. But the &lt;em&gt;patterns&lt;/em&gt; are universal; you could implement them yourself (if you enjoy debugging distributed state at 2am).&lt;/p&gt;

&lt;h3&gt;Pattern ① → &lt;code&gt;TokenBudgetTracker&lt;/code&gt;&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Per-user hourly token budget.&lt;/span&gt;
&lt;span class="c1"&gt;// Tracked in DB — survives restarts, enforced globally.&lt;/span&gt;
&lt;span class="c1"&gt;// Configurable per user tier or plan.&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TokenBudgetTracker&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;usage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;windowStart&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;windowMs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="nx"&gt;_600_000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// 1 hour&lt;/span&gt;

  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;budgetPerWindow&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

  &lt;span class="nf"&gt;canSpend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;estimatedTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;record&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;record&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;windowStart&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;windowMs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;estimatedTokens&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;budgetPerWindow&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tokensUsed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;existing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;existing&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;windowStart&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;windowMs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;tokensUsed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;windowStart&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;tokensUsed&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before calling the LLM, we call &lt;code&gt;canSpend()&lt;/code&gt;. If it returns false, &lt;strong&gt;we reject the request immediately&lt;/strong&gt; – no API call, no bill. We initially tried just logging a warning and letting it through. Bad idea.&lt;/p&gt;
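&lt;p&gt;To make the flow concrete, here’s a sketch of the call site – check first, call second, record actuals last. Everything here is illustrative: &lt;code&gt;callLLM&lt;/code&gt; is a stand‑in for your provider client, and the tracker is the class above trimmed for brevity.&lt;/p&gt;

```typescript
// Sketch of the gating flow: canSpend() before the API call,
// record() with actual usage after. All names are illustrative;
// callLLM stands in for your provider client.
class Tracker {
  private usage = new Map();
  constructor(private budget: number, private windowMs = 3_600_000) {}

  canSpend(userId: string, estimated: number): boolean {
    const rec = this.usage.get(userId);
    if (!rec || Date.now() - rec.windowStart > this.windowMs) {
      return this.budget >= estimated; // fresh window still caps one request
    }
    return this.budget >= rec.tokens + estimated;
  }

  record(userId: string, used: number): void {
    const rec = this.usage.get(userId);
    if (!rec || Date.now() - rec.windowStart > this.windowMs) {
      this.usage.set(userId, { tokens: used, windowStart: Date.now() });
      return;
    }
    rec.tokens += used;
  }
}

const tracker = new Tracker(50_000); // e.g. 50k tokens per user per hour

async function callLLM(prompt: string) {
  return { text: `stub for: ${prompt}`, totalTokens: 1_200 }; // fake client
}

async function guardedCompletion(userId: string, prompt: string) {
  const estimated = Math.ceil(prompt.length / 4); // crude ~4 chars/token
  if (!tracker.canSpend(userId, estimated)) {
    throw new Error(`Token budget exceeded for ${userId}`); // no API call, no bill
  }
  const result = await callLLM(prompt);
  tracker.record(userId, result.totalTokens); // actual usage, not the estimate
  return result;
}
```

&lt;p&gt;Note we record the &lt;em&gt;actual&lt;/em&gt; token count from the response, not the estimate – estimates only gate the request.&lt;/p&gt;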

&lt;h3&gt;Pattern ② + ④ → Persistent job store with atomic claiming&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;PersistentJobStore&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Omit&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;PersistedJob&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;state&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;attempts&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;createdAt&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nf"&gt;claim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;PersistedJob&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nf"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nf"&gt;fail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nf"&gt;recoverOrphans&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;timeoutMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;PersistedJob&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RedisPersistentJobStore&lt;/span&gt; &lt;span class="k"&gt;implements&lt;/span&gt; &lt;span class="nx"&gt;PersistentJobStore&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;claim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;PersistedJob&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;luaScript&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
      local data = redis.call('GET', KEYS[1])
      if not data then return nil end
      local job = cjson.decode(data)
      if job.state ~= 'pending' then return nil end
      job.state = 'processing'
      job.claimedAt = ARGV[1]
      redis.call('SET', KEYS[1], cjson.encode(job), 'EX', ARGV[2])
      return cjson.encode(job)
    `&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="c1"&gt;// ... (full implementation in KeelStack)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Lua script ensures &lt;strong&gt;only one worker&lt;/strong&gt; claims a given job – no double‑processing. The worker also periodically checks a circuit breaker; if the token budget for that &lt;em&gt;job&lt;/em&gt; is exceeded, it calls &lt;code&gt;fail()&lt;/code&gt; and cancels the LLM stream. We learned to add that after a stuck job ran for four hours.&lt;/p&gt;
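&lt;p&gt;The worker‑side breaker can be sketched like this (illustrative names only – &lt;code&gt;JobState&lt;/code&gt;, &lt;code&gt;fakeStream&lt;/code&gt; and the 4‑chars‑per‑token estimate are ours, not KeelStack’s API):&lt;/p&gt;

```typescript
// Mid-stream circuit breaker sketch: after every chunk, add the tokens
// just consumed and re-check the job's cap. Stop consuming = stop paying.
interface JobState {
  id: string;
  tokensUsed: number;
  tokenCap: number;
  state: 'processing' | 'failed' | 'completed';
}

function roughTokens(chunk: string): number {
  return Math.ceil(chunk.length / 4); // crude ~4-chars-per-token estimate
}

async function* fakeStream() {
  // Stand-in for an LLM streaming response.
  for (const chunk of ['alpha ', 'beta ', 'gamma ', 'delta ']) yield chunk;
}

async function runStreamingJob(job: JobState) {
  let output = '';
  for await (const chunk of fakeStream()) {
    job.tokensUsed += roughTokens(chunk);
    if (job.tokensUsed > job.tokenCap) {
      job.state = 'failed'; // real code: persist via fail(), cancel the stream
      return output;
    }
    output += chunk;
  }
  job.state = 'completed';
  return output;
}
```

&lt;p&gt;In production the check would also consult the persisted job record, so an operator (or the budget tracker) can kill a job from outside the worker.&lt;/p&gt;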

&lt;h3&gt;Pattern ③ → Idempotency middleware&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;idempotencyMiddleware&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;namespace&lt;/span&gt; &lt;span class="p"&gt;}:&lt;/span&gt; &lt;span class="nx"&gt;IdempotencyMiddlewareOptions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;function &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;MUTATING_METHODS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;method&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rawKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;IDEMPOTENCY_HEADER&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;rawKey&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;rawKey&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;storeKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;rawKey&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;claimed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tryClaimKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;storeKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;IDEMPOTENCY_TTL_SECONDS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;processedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
      &lt;span class="na"&gt;requestId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;x-request-id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;claimed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;record&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getRecord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;storeKey&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;idempotent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;processedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;record&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;processedAt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Request already processed. This is a replayed response.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;releaseKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;storeKey&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now retried webhooks or duplicate UI clicks won’t trigger a second LLM call – they receive the cached response. Honestly, we should have added this on day one; it’s embarrassing how many duplicate charges we ate before we wised up.&lt;/p&gt;
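&lt;p&gt;The middleware above needs a store exposing &lt;code&gt;tryClaimKey&lt;/code&gt; / &lt;code&gt;getRecord&lt;/code&gt; / &lt;code&gt;releaseKey&lt;/code&gt;. Here’s a minimal in‑memory version – fine for local tests, but (per everything above) production needs Redis so every replica sees the same claims:&lt;/p&gt;

```typescript
// Minimal in-memory idempotency store implementing the interface the
// middleware uses. TTL eviction is elided; in Redis, SET with EX handles
// expiry. Dev/test only: this state dies with the process.
interface IdempotencyRecord {
  processedAt: string;
  [extra: string]: unknown;
}

class InMemoryIdempotencyStore {
  private records = new Map();

  async tryClaimKey(key: string, ttlSeconds: number, record: IdempotencyRecord) {
    if (this.records.has(key)) return false; // duplicate: claim refused
    this.records.set(key, record);           // first request wins
    return true;
  }

  async getRecord(key: string) {
    return this.records.get(key) ?? null;    // replay payload for duplicates
  }

  async releaseKey(key: string) {
    this.records.delete(key);                // failed request may retry
  }
}
```

&lt;p&gt;The &lt;code&gt;releaseKey&lt;/code&gt; path matters: if processing throws, the key must be freed so a legitimate retry isn’t treated as a duplicate forever.&lt;/p&gt;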




&lt;h2&gt;3. Acknowledge the “wrapper” skepticism (and why guardrails matter)&lt;/h2&gt;

&lt;p&gt;Let’s be honest: the market is flooded with “AI wrappers.” Many are thin UI layers over an OpenAI key. That’s why experts roll their eyes.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;This post is not about the wrapper. It’s about the guardrails.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Yes, this is more complex than a toy wrapper. That’s the point.&lt;/p&gt;

&lt;p&gt;The complexity lives in the infrastructure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Distributed job claiming via Lua scripts (because a race condition on &lt;code&gt;pending&lt;/code&gt; jobs = double billing)&lt;/li&gt;
&lt;li&gt;Persistence across restarts (lose your in‑memory budget counter? Congratulations, you just reset everyone’s limit)&lt;/li&gt;
&lt;li&gt;Idempotency handling across retries, webhooks, and partial failures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are &lt;strong&gt;hard problems&lt;/strong&gt;. KeelStack solves them so you don’t have to – but the patterns themselves are what protect your bottom line.&lt;/p&gt;




&lt;h2&gt;4. DIY proxy vs. dedicated gateway – a risk‑appetite discussion&lt;/h2&gt;

&lt;p&gt;You &lt;em&gt;could&lt;/em&gt; build all this yourself. Grab a Redis client, write a few middleware functions, glue them together. But consider the edge cases we ran into:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;DIY challenge&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Why it’s painful (we found out the hard way)&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Atomic job claiming across 10 replicas&lt;/td&gt;
&lt;td&gt;You’ll end up writing Lua anyway – or introducing race conditions. We had two workers process the same job once. Fun times.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Budget tracker surviving restarts&lt;/td&gt;
&lt;td&gt;You need a persistent store (Redis/Postgres) and atomic increments. Our in‑memory version lost state on every deploy.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Circuit breakers for streaming responses&lt;/td&gt;
&lt;td&gt;You have to count tokens mid‑stream while the job may crash at any point. We gave up and used a gateway.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Idempotency with variable TTLs&lt;/td&gt;
&lt;td&gt;What if a request takes longer than your key TTL? Now duplicates leak through. We learned to set TTLs generously.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
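&lt;p&gt;For the last row, one mitigation we’ve seen work is a &lt;em&gt;heartbeat&lt;/em&gt;: keep extending the key’s TTL while the request is still running, so a slow request can’t outlive its claim. A sketch – &lt;code&gt;extendTtl&lt;/code&gt; is a hypothetical store method; in Redis it would wrap &lt;code&gt;EXPIRE&lt;/code&gt;:&lt;/p&gt;

```typescript
// Heartbeat sketch: refresh the idempotency key's TTL at half its
// lifetime while the wrapped work runs. extendTtl is hypothetical,
// e.g. a thin wrapper over redis.expire(key, seconds).
async function withKeyHeartbeat(
  extendTtl: Function,
  key: string,
  work: Function,
  ttlSeconds = 60,
) {
  const timer = setInterval(() => extendTtl(key, ttlSeconds), (ttlSeconds * 1000) / 2);
  try {
    return await work(); // slow work keeps its claim alive
  } finally {
    clearInterval(timer); // stop refreshing once work settles
  }
}
```

&lt;p&gt;Generous base TTLs are still worth keeping – the heartbeat just closes the window where a long request and an expiring key overlap.&lt;/p&gt;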

&lt;p&gt;A dedicated gateway (like KeelStack’s LLMClient) bakes in these solutions. It’s not about avoiding work – it’s about &lt;strong&gt;avoiding the $1,000 mistake&lt;/strong&gt; while you focus on your actual product.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Ready to stop bleeding tokens?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Explore the patterns, steal the code, or just grab the framework. But whatever you do, &lt;strong&gt;don’t ship another AI wrapper without circuit breakers.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://keelstack.me" rel="noopener noreferrer"&gt;&lt;strong&gt;KeelStack – Budget‑aware LLM gateway&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>nodejs</category>
      <category>sre</category>
    </item>
    <item>
      <title>Why Your SaaS Node Backend Will Fail at 10k Requests/Minute (and How to Stress‑Proof It Without Rewriting)</title>
      <dc:creator>Siddhant Jain</dc:creator>
      <pubDate>Sat, 28 Mar 2026 17:03:13 +0000</pubDate>
      <link>https://forem.com/siddhant_jain_18/why-your-saas-node-backend-will-fail-at-10k-requestsminute-and-how-to-stress-proof-it-without-2bfg</link>
      <guid>https://forem.com/siddhant_jain_18/why-your-saas-node-backend-will-fail-at-10k-requestsminute-and-how-to-stress-proof-it-without-2bfg</guid>
      <description>&lt;p&gt;At 1k active users, your Node backend feels like a rock.&lt;br&gt;&lt;br&gt;
At 3k–5k users, Stripe webhooks start retrying, background jobs pile up, and you notice the first “duplicate charge” ticket.&lt;br&gt;&lt;br&gt;
At 8k–10k requests per minute, you’re in a live incident: jobs vanish on deploy, webhook duplicates double‑bill customers, and MFA state drifts, leaving users locked out.&lt;/p&gt;

&lt;p&gt;Node is great—but naïve implementations won’t survive SaaS‑scale traffic.&lt;br&gt;&lt;br&gt;
Here’s exactly what breaks and how to stress‑proof it &lt;strong&gt;without a full rewrite&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you’re:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;building a Node.js + TypeScript SaaS backend,
&lt;/li&gt;
&lt;li&gt;handling Stripe webhooks, background jobs, and auth,
&lt;/li&gt;
&lt;li&gt;and worried that your current architecture will fall apart at 3k–10k requests per minute,&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;then this post is for you.&lt;/p&gt;


&lt;h2&gt;What Actually Breaks at 10k RPM in Node&lt;/h2&gt;
&lt;h3&gt;1. Silent Job Loss &amp;amp; Race Conditions&lt;/h3&gt;

&lt;p&gt;If your background jobs rely on &lt;code&gt;setTimeout&lt;/code&gt; or an in‑memory array, a simple &lt;code&gt;git push&lt;/code&gt; – and the process restart that follows – will wipe them out.&lt;br&gt;&lt;br&gt;
But the real pain starts when workers &lt;strong&gt;race&lt;/strong&gt; for the same job.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;: A Stripe &lt;code&gt;checkout.session.completed&lt;/code&gt; event triggers a job to deliver a license.&lt;br&gt;&lt;br&gt;
Two workers both see the job as “pending” → both claim it → customer receives two licenses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern that fails&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Naive in‑memory queue&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;jobs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;

&lt;span class="nf"&gt;setInterval&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;shift&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What survives&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Persistent queue&lt;/strong&gt; (Redis, RabbitMQ, Postgres with &lt;code&gt;SKIP LOCKED&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Atomic claim&lt;/strong&gt;: the first worker to “lock” the job wins; others skip it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Crash recovery&lt;/strong&gt;: jobs are &lt;strong&gt;persisted before execution&lt;/strong&gt;, so a worker crash doesn’t lose them.&lt;/li&gt;
&lt;/ul&gt;
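&lt;p&gt;Here is a minimal sketch of the atomic‑claim idea. It is in‑memory only, and the names (&lt;code&gt;claimJob&lt;/code&gt;, &lt;code&gt;Job&lt;/code&gt;) are made up for illustration; in production the same check‑and‑set would be delegated to Postgres &lt;code&gt;SKIP LOCKED&lt;/code&gt; or a Redis Lua script:&lt;/p&gt;

```typescript
// In-memory sketch of an atomic job claim. Illustrative only:
// real systems delegate this check-and-set to the datastore.
type Job = { id: string; state: 'pending' | 'processing'; payload: unknown };

const jobs = new Map<string, Job>();

function claimJob(jobId: string): Job | null {
  const job = jobs.get(jobId);
  if (!job || job.state !== 'pending') return null;
  // Single-threaded JS makes this read-modify-write atomic within
  // one process; across workers the DB must provide the guarantee.
  job.state = 'processing';
  return job;
}

jobs.set('job-1', { id: 'job-1', state: 'pending', payload: {} });
// Two workers race for the same job: exactly one wins, the other
// gets null and skips it.
const winners = [claimJob('job-1'), claimJob('job-1')].filter(Boolean);
```

&lt;p&gt;Single‑threaded JavaScript makes the read‑modify‑write atomic within one process; across multiple workers you need the datastore to provide the same guarantee.&lt;/p&gt;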

&lt;h3&gt;
  
  
  2. Stripe Webhook Race Conditions
&lt;/h3&gt;

&lt;p&gt;Stripe retries webhooks that fail or respond too slowly. If your handler is not idempotent, each retry creates a new charge, subscription, or email.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fragile handler&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/stripe-webhook&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;invoices&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;stripeId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sendReceiptEmail&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sendStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If two identical events arrive concurrently, both will insert duplicate rows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Idempotency fix&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use a &lt;strong&gt;unique constraint&lt;/strong&gt; on &lt;code&gt;(stripe_event_id, event_type)&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Or wrap the handler in an &lt;strong&gt;atomic guard&lt;/strong&gt; that checks a “processed” flag before doing work.&lt;/li&gt;
&lt;/ul&gt;
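&lt;p&gt;A sketch of that guard, with an in‑memory &lt;code&gt;Set&lt;/code&gt; standing in for the database table (the &lt;code&gt;handleOnce&lt;/code&gt; name is illustrative, not a library API):&lt;/p&gt;

```typescript
// Idempotency guard sketch. The Set stands in for a DB table with
// a unique constraint on (stripe_event_id, event_type).
const processed = new Set<string>();

function handleOnce(eventId: string, eventType: string, work: () => void): boolean {
  // Composite key mirrors the unique constraint
  const key = `${eventId}:${eventType}`;
  if (processed.has(key)) return false; // duplicate: acknowledge, do nothing
  processed.add(key); // record BEFORE doing the work, like the guarded insert
  work();
  return true;
}

let emailsSent = 0;
const first = handleOnce('evt_1', 'checkout.session.completed', () => { emailsSent++; });
const retry = handleOnce('evt_1', 'checkout.session.completed', () => { emailsSent++; });
// The retry is acknowledged but sends no second email.
```

&lt;p&gt;With a real database, the &lt;code&gt;has()&lt;/code&gt;/&lt;code&gt;add()&lt;/code&gt; pair becomes a single &lt;code&gt;INSERT&lt;/code&gt; guarded by the unique constraint, so the check‑and‑set stays atomic across processes.&lt;/p&gt;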

&lt;h3&gt;
  
  
  3. Auth &amp;amp; MFA State Drift
&lt;/h3&gt;

&lt;p&gt;When your authentication relies on in‑memory sessions or local cookies without server‑side validation, you risk:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Attackers bypassing MFA entirely once they steal a session token.&lt;/li&gt;
&lt;li&gt;“MFA required” being enforced only in the UI, not on the API.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;: A user enables MFA, but the API still allows them to change their billing email without a second factor. An attacker with a stolen session can compromise the account.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What’s needed&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Stateless tokens&lt;/strong&gt; (JWT) with explicit permissions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per‑action MFA enforcement&lt;/strong&gt; on sensitive routes (e.g., &lt;code&gt;POST /api/billing/change-email&lt;/code&gt;), not just a flag in the UI.&lt;/li&gt;
&lt;/ul&gt;
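&lt;p&gt;A hypothetical sketch of what per‑action enforcement can look like in Express‑style middleware shape. The &lt;code&gt;mfaVerifiedAt&lt;/code&gt; session field and the five‑minute freshness window are assumptions, not a real library API; adapt them to your session store and threat model:&lt;/p&gt;

```typescript
// Hypothetical per-action MFA guard in Express-style middleware shape.
// The mfaVerifiedAt session field and the freshness window are
// assumptions; adapt to your session store.
type Req = { session?: { mfaVerifiedAt?: number } };
type Res = { status: (code: number) => { json: (body: object) => void } };

const MFA_MAX_AGE_MS = 5 * 60 * 1000; // re-challenge after 5 minutes

function requireRecentMfa(req: Req, res: Res, next: () => void): void {
  const verifiedAt = req.session?.mfaVerifiedAt;
  if (verifiedAt !== undefined && Date.now() - verifiedAt < MFA_MAX_AGE_MS) {
    next(); // second factor is fresh: allow the sensitive action
    return;
  }
  res.status(403).json({ error: 'mfa_required' }); // enforced on the API, not the UI
}

// Exercise both paths with mock request/response objects.
let fresh = '';
requireRecentMfa(
  { session: { mfaVerifiedAt: Date.now() } },
  { status: (code) => ({ json: () => { fresh = `denied:${code}`; } }) },
  () => { fresh = 'allowed'; },
);

let stale = '';
requireRecentMfa(
  { session: {} },
  { status: (code) => ({ json: () => { stale = `denied:${code}`; } }) },
  () => { stale = 'allowed'; },
);
```

&lt;p&gt;Wired into a real app, the guard sits on the sensitive route itself, e.g. &lt;code&gt;app.post('/api/billing/change-email', requireRecentMfa, handler)&lt;/code&gt;.&lt;/p&gt;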




&lt;h2&gt;
  
  
  How to Stress‑Test Your SaaS Node Backend
&lt;/h2&gt;

&lt;p&gt;Before you hit 10k RPM, &lt;strong&gt;know where you’ll break&lt;/strong&gt;. Here’s a simple stress‑test recipe you can run today:&lt;/p&gt;

&lt;h3&gt;
  
  
  Tools
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/mcollina/autocannon" rel="noopener noreferrer"&gt;&lt;code&gt;autocannon&lt;/code&gt;&lt;/a&gt; or &lt;a href="https://github.com/rakyll/hey" rel="noopener noreferrer"&gt;&lt;code&gt;hey&lt;/code&gt;&lt;/a&gt; for HTTP load.&lt;/li&gt;
&lt;li&gt;Stripe CLI to replay webhooks.&lt;/li&gt;
&lt;li&gt;A script to kill workers randomly.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Tests to Run
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Auth endpoint&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;autocannon -c 100 -p 10 -m POST http://localhost:3000/api/v1/auth/login&lt;/code&gt;&lt;br&gt;
Watch for 5xx errors and 99th‑percentile latency. If you see spikes &amp;gt;1s, your session store might be the bottleneck.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Concurrent Stripe webhooks&lt;/strong&gt; &lt;br&gt;
Use the Stripe CLI to replay the same event many times in parallel (swap a real test‑mode event ID in for the &lt;code&gt;evt_...&lt;/code&gt; placeholder):&lt;br&gt;
&lt;code&gt;for i in $(seq 1 50); do stripe events resend evt_... &amp;amp; done; wait&lt;/code&gt;&lt;br&gt;
Then check your DB for duplicate records. If you see any, your webhook handler isn’t idempotent.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Crash recovery&lt;/strong&gt;&lt;br&gt;
Start a long‑running job (e.g., 10s sleep). &lt;br&gt;
While it’s running, kill the worker process (&lt;code&gt;kill -9&lt;/code&gt;).&lt;br&gt;
Verify the job is retried or resumed, not lost.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  What to Measure
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Error rate (should stay at 0%).&lt;/li&gt;
&lt;li&gt;Job loss count (should be 0).&lt;/li&gt;
&lt;li&gt;Duplicate transaction count (should be 0).&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  How KeelStack Already Hardens This
&lt;/h2&gt;

&lt;p&gt;KeelStack Engine was built to survive exactly these failure modes &lt;strong&gt;on a production‑like SaaS workload&lt;/strong&gt;. It ships with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Atomic job queue&lt;/strong&gt; using Redis‑Lua or PostgreSQL &lt;code&gt;SKIP LOCKED&lt;/code&gt;. Jobs are persisted before execution; if a worker crashes, they’re re‑claimed by another worker with exponential backoff.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Idempotency guard&lt;/strong&gt; for all mutating endpoints. Stripe webhooks are wrapped with a composite key (&lt;code&gt;event_id&lt;/code&gt; + &lt;code&gt;event_type&lt;/code&gt;), and the result is cached. Duplicate events return a 200 without re‑executing business logic. In stress‑tests with KeelStack, we see &amp;lt;1% error rate and zero duplicate transactions even when firing 100 identical Stripe webhooks per second.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per‑action MFA enforcement&lt;/strong&gt; at the API level. The auth module includes a &lt;code&gt;requireMfaFor(route)&lt;/code&gt; helper that validates the MFA token on sensitive operations—not just on login.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren’t marketing claims; they’re the exact patterns you’d need to implement yourself. KeelStack ships them by default so you can focus on your unique product logic.&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical Checklist: Hardening Your Node SaaS Before 10k RPM
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Use persistent queues&lt;/strong&gt; – Redis, RabbitMQ, or Postgres with &lt;code&gt;SKIP LOCKED&lt;/code&gt;. Never rely on in‑memory arrays or &lt;code&gt;setTimeout&lt;/code&gt; for jobs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Idempotency keys on all webhooks and billing actions&lt;/strong&gt; – store the result of every mutating operation keyed by a unique identifier (e.g., Stripe event ID + user ID).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stateless sessions + per‑action MFA enforcement&lt;/strong&gt; – store only a JWT; validate MFA on sensitive API endpoints, not just in the UI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Crash‑safe job runners&lt;/strong&gt; – jobs should be saved to the database &lt;strong&gt;before&lt;/strong&gt; execution starts, and marked as done &lt;strong&gt;after&lt;/strong&gt; success.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stress‑test with 2–3x your expected peak&lt;/strong&gt; – use &lt;code&gt;autocannon&lt;/code&gt; and simulate webhook floods to catch race conditions early.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add structured logging&lt;/strong&gt; – correlate logs with request IDs so you can trace a job from creation to completion across worker restarts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enforce test coverage&lt;/strong&gt; – write integration tests for failure scenarios (e.g., duplicate webhooks, worker crashes). If you can’t reproduce it in CI, it &lt;strong&gt;will&lt;/strong&gt; happen in production.&lt;/li&gt;
&lt;/ol&gt;
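&lt;p&gt;For the structured‑logging item, Node’s built‑in &lt;code&gt;AsyncLocalStorage&lt;/code&gt; gives you request‑ID correlation without threading an ID through every function call. A minimal sketch (the helper names are illustrative):&lt;/p&gt;

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Every log line emitted inside runWithRequestId() automatically
// carries the same requestId, so a job can be traced across async hops.
const requestContext = new AsyncLocalStorage<{ requestId: string }>();

function log(message: string): string {
  const requestId = requestContext.getStore()?.requestId ?? 'no-request';
  const line = JSON.stringify({ requestId, message });
  console.log(line);
  return line;
}

function runWithRequestId<T>(requestId: string, fn: () => T): T {
  return requestContext.run({ requestId }, fn);
}

const traced = runWithRequestId('req-42', () => log('job enqueued'));
const untraced = log('startup'); // outside any request context
```

&lt;p&gt;Call the wrapper at the top of each request or job, and every log line inside, however deeply nested, carries the same ID.&lt;/p&gt;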

&lt;p&gt;For deep‑dives on each of these topics, check out our previous posts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://keelstack.me/blog/silent-job-loss-nodejs-saas" rel="noopener noreferrer"&gt;The Silent Job Loss: Why Your Node.js SaaS Needs a Persistent Task Queue&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://keelstack.me/blog/vibe-coded-saas-fail-at-100-users" rel="noopener noreferrer"&gt;Why Your "Vibe Coded" SaaS Will Fail at 100 Users (and How to Fix It)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Ship Safe, Not Just Fast
&lt;/h2&gt;

&lt;p&gt;If you’re building a SaaS backend in Node, you don’t have to rediscover these hard‑earned lessons at 3am when your first real‑world traffic spike hits. The patterns above are proven and can be integrated incrementally—or you can start from a foundation that already has them built in.&lt;/p&gt;

&lt;p&gt;KeelStack Engine is a production‑tested Node + TypeScript starter that includes idempotency, persistent job queues, per‑user LLM token budgets, and a full auth/billing stack. It ships as 100% source code under its license terms, so you can read every line and deploy it anywhere.&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://keelstack.me" rel="noopener noreferrer"&gt;Get instant access to KeelStack Engine&lt;/a&gt; – skip the weeks of wiring and jump straight to building features that matter.&lt;/p&gt;

</description>
      <category>node</category>
      <category>saas</category>
      <category>backend</category>
      <category>security</category>
    </item>
    <item>
      <title>The Silent Job Loss: Why Your Node.js SaaS Needs a Persistent Task Queue</title>
      <dc:creator>Siddhant Jain</dc:creator>
      <pubDate>Sun, 22 Mar 2026 19:59:42 +0000</pubDate>
      <link>https://forem.com/siddhant_jain_18/the-silent-job-loss-why-your-nodejs-saas-needs-a-persistent-task-queue-5cih</link>
      <guid>https://forem.com/siddhant_jain_18/the-silent-job-loss-why-your-nodejs-saas-needs-a-persistent-task-queue-5cih</guid>
      <description>&lt;p&gt;&lt;em&gt;597 tests. 93.13% coverage. Here's what they protect.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;A user pays. Your server receives the Stripe webhook. You fire off an async task to generate their report. Thirty seconds later you deploy a hotfix.&lt;/p&gt;

&lt;p&gt;The report is never generated. The user is charged. Nobody gets an error. You find out three days later in a support ticket.&lt;/p&gt;

&lt;p&gt;This is not a theoretical failure mode. It is the default behavior of every Node.js backend that queues work in memory.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 1: Memory Is Volatile
&lt;/h2&gt;

&lt;p&gt;The most common pattern for async work in Node.js looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// User pays → webhook fires → kick off async work&lt;/span&gt;
&lt;span class="nf"&gt;webhookHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Fire and forget&lt;/span&gt;
  &lt;span class="nf"&gt;generateReport&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;reportId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;received&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;generateReport&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;reportId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// This lives entirely in process memory&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetchUserData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;callLLM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;saveReport&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;reportId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;report&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works perfectly in development. It fails silently in production for three reasons:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deployments.&lt;/strong&gt; Every deploy kills the running process. Any in-flight &lt;code&gt;generateReport&lt;/code&gt; call dies mid-execution. No error is thrown anywhere visible. The job is gone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Crashes.&lt;/strong&gt; An unhandled exception or OOM kill takes every in-flight job with it. Same silent outcome.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scaling.&lt;/strong&gt; The moment you run two processes (two dynos, two containers), there is no coordination. A job kicked off in process A can only run in process A. Process B has no knowledge it exists.&lt;/p&gt;

&lt;p&gt;The fix is not complicated in concept: write the job to durable storage &lt;em&gt;before&lt;/em&gt; you start executing it. That way, if the process dies, the job survives. On restart, you find it and finish it.&lt;/p&gt;

&lt;p&gt;The hard part is doing this correctly — specifically, making the claim step atomic so two workers cannot grab the same job at the same time.&lt;/p&gt;
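&lt;p&gt;The first half of that fix (persist, then execute, then recover on restart) can be sketched in a few lines, with a &lt;code&gt;Map&lt;/code&gt; standing in for durable storage:&lt;/p&gt;

```typescript
// Persist-before-execute sketch. The Map stands in for durable
// storage (Redis/Postgres); the names here are illustrative.
type Job = { id: string; state: 'pending' | 'done' };
const store = new Map<string, Job>();

function enqueue(id: string): void {
  // 1. Persist FIRST. If the process dies after this line,
  //    the job still exists and recovery will find it.
  store.set(id, { id, state: 'pending' });
}

function execute(id: string): void {
  const job = store.get(id);
  if (!job || job.state !== 'pending') return;
  // ... do the actual work here ...
  job.state = 'done'; // 2. Mark done only AFTER success
}

// 3. On startup, find anything a previous process never finished.
function recoverPending(): string[] {
  return Array.from(store.values())
    .filter((j) => j.state === 'pending')
    .map((j) => j.id);
}

enqueue('job-1');
enqueue('job-2');
execute('job-1');
// Simulated crash before job-2 ran: the recovery scan still sees it.
const pending = recoverPending();
```

&lt;p&gt;The recovery scan is what turns a deploy from “jobs vanish” into “jobs finish a few seconds late.”&lt;/p&gt;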




&lt;h2&gt;
  
  
  Part 2: The Atomic Claim Problem
&lt;/h2&gt;

&lt;p&gt;The naive approach to claiming a job looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Worker 1 and Worker 2 both run this simultaneously&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;jobs&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'processing'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;claimed_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;jobs&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'pending'&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At low load, this works. Under concurrency it doesn't. Two workers can both execute the subquery and get the same row before either has written &lt;code&gt;processing&lt;/code&gt;. You get double-processing: the same report generated twice, the same email sent twice, the same billing event fired twice.&lt;/p&gt;

&lt;p&gt;The standard fix in Postgres is &lt;code&gt;FOR UPDATE SKIP LOCKED&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;jobs&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'processing'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;claimed_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;jobs&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'pending'&lt;/span&gt;
  &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;ASC&lt;/span&gt;
  &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;SKIP&lt;/span&gt; &lt;span class="n"&gt;LOCKED&lt;/span&gt;
  &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;RETURNING&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;FOR UPDATE&lt;/code&gt; takes a row-level lock. &lt;code&gt;SKIP LOCKED&lt;/code&gt; tells any other worker that hits a locked row to skip it rather than wait. The result: each worker atomically claims a different job. No deadlocks, no double-processing, regardless of how many workers are running.&lt;/p&gt;

&lt;p&gt;KeelStack does not use Postgres for the job store (it runs without a database in zero-config mode). It uses Redis. But the same guarantee is needed, and Redis provides it through Lua scripts.&lt;/p&gt;

&lt;p&gt;Here is the actual &lt;code&gt;claim()&lt;/code&gt; implementation in &lt;code&gt;RedisPersistentJobStore&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;claim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;PersistedJob&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;luaScript&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
    local data = redis.call('GET', KEYS[1])
    if not data then return nil end
    local job = cjson.decode(data)
    if job.state ~= 'pending' then return nil end
    job.state = 'processing'
    job.claimedAt = ARGV[1]
    redis.call('SET', KEYS[1], cjson.encode(job), 'EX', ARGV[2])
    return cjson.encode(job)
  `&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;luaScript&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JOB_TTL&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;PersistedJob&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Lua script runs atomically inside Redis's single-threaded executor. Between the &lt;code&gt;GET&lt;/code&gt; and the &lt;code&gt;SET&lt;/code&gt;, nothing else can run. No other worker can see the job as &lt;code&gt;pending&lt;/code&gt; and claim it. Exactly one caller gets the job back. Everyone else gets &lt;code&gt;null&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The in-memory implementation (used in development and tests) gets the same guarantee for free because JavaScript's event loop is single-threaded:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;claim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;PersistedJob&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pending&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="c1"&gt;// Single-threaded JS: this read-modify-write is atomic within one process&lt;/span&gt;
  &lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;processing&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;claimedAt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The test that verifies this contract fires 20 concurrent claim attempts and asserts exactly one wins:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;claim() concurrency — only one of N concurrent callers wins&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;makeJobInput&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;length&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;claim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;job-001&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;winners&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Boolean&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;winners&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toHaveLength&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Part 3: Exponential Backoff and the Dead-Letter Log
&lt;/h2&gt;

&lt;p&gt;Once a job is claimed, it runs. Sometimes the handler fails. The question is: what do you do next?&lt;/p&gt;

&lt;p&gt;The worst answer is: retry immediately. If an LLM provider is rate-limiting you, hammering it again in the same second makes the situation worse for everyone. If your database just had a connection timeout, you want to give it time to recover. Retrying immediately into a recovering system causes the Thundering Herd problem: every waiting job piles in at once, overloading whatever just came back up.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;RetryableJobRunner&lt;/code&gt; uses exponential backoff with jitter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;exponentialDelay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;baseMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;maxMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Jitter: randomize ±20% to spread retries across instances&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;jitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;delay&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;baseMs&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;jitter&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;maxMs&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With &lt;code&gt;baseDelayMs: 250&lt;/code&gt; and &lt;code&gt;maxDelayMs: 30_000&lt;/code&gt; (attempts are numbered from 1, so the first delay is 250ms × 2 = 500ms), the delays look like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Attempt&lt;/th&gt;
&lt;th&gt;Base delay&lt;/th&gt;
&lt;th&gt;With jitter (approx)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;500ms&lt;/td&gt;
&lt;td&gt;400–600ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;1,000ms&lt;/td&gt;
&lt;td&gt;800ms–1.2s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2,000ms&lt;/td&gt;
&lt;td&gt;1.6–2.4s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;4,000ms&lt;/td&gt;
&lt;td&gt;3.2–4.8s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;8,000ms&lt;/td&gt;
&lt;td&gt;6.4–9.6s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cap&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;30s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The jitter is important. Without it, every worker that got rate-limited at the same moment retries at exactly the same time. With jitter, they spread out across a window, smoothing the load on whatever they are calling.&lt;/p&gt;
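&lt;p&gt;To make the loop concrete, here is a self-contained sketch that wraps the helper above. &lt;code&gt;withBackoff&lt;/code&gt;, its defaults, and &lt;code&gt;sleep&lt;/code&gt; are illustrative names, not the actual &lt;code&gt;RetryableJobRunner&lt;/code&gt; API:&lt;/p&gt;

```typescript
// exponentialDelay is copied from above: base * 2^attempt, jittered ±20%, capped at maxMs.
function exponentialDelay(attempt: number, baseMs: number, maxMs: number): number {
  const jitter = 1 + (Math.random() * 0.4 - 0.2);
  const delay = baseMs * Math.pow(2, attempt) * jitter;
  return Math.min(delay, maxMs);
}

async function sleep(ms: number) {
  await new Promise((resolve) => setTimeout(resolve, ms));
}

// Illustrative retry loop (not the engine's API): run fn, and on failure wait
// an exponentially growing, jittered delay before the next attempt.
async function withBackoff(
  fn: () => Promise<string>,
  maxAttempts = 5,
  baseMs = 250,
  maxMs = 30_000,
) {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        await sleep(exponentialDelay(attempt, baseMs, maxMs));
      }
    }
  }
  throw lastError; // caller decides what "exhausted" means (dead-letter, alert, ...)
}
```

&lt;p&gt;Wrapping a flaky provider call in &lt;code&gt;withBackoff&lt;/code&gt; gives you roughly the 500ms / 1s / 2s / 4s schedule from the table before the error finally surfaces.&lt;/p&gt;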

&lt;p&gt;&lt;strong&gt;The non-retryable escape hatch.&lt;/strong&gt; Not all errors deserve retries. If a user submits malformed data and your handler throws a validation error, retrying five times wastes four attempts and delays the dead-letter signal by minutes. The &lt;code&gt;NonRetryableError&lt;/code&gt; class handles this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;NonRetryableError&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;NonRetryableError&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// In your handler:&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nf"&gt;isValidPayload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;NonRetryableError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Malformed report payload — check input schema&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the runner catches a &lt;code&gt;NonRetryableError&lt;/code&gt;, it skips the remaining attempts and goes straight to the dead-letter log:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="k"&gt;instanceof&lt;/span&gt; &lt;span class="nx"&gt;NonRetryableError&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;NonRetryableError&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;logDeadLetter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;non_retryable&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When &lt;code&gt;maxAttempts&lt;/code&gt; is exhausted through normal retries, the same dead-letter path fires:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// All attempts exhausted — emit dead-letter signal&lt;/span&gt;
&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;logDeadLetter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxAttempts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;lastError&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;max_attempts_exceeded&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The dead-letter log output is structured JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"event"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"job.dead_letter"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"jobId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"report-abc-123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"jobName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"generate-report"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"attempt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"max_attempts_exceeded"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"LLM provider timeout after 30000ms"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Filter on &lt;code&gt;event = 'job.dead_letter'&lt;/code&gt; in Datadog, CloudWatch, or any structured log sink to get immediate alerts when jobs exhaust their retries. This is how you find out about silent failures before users report them.&lt;/p&gt;
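&lt;p&gt;For illustration, a minimal logger that emits exactly this shape could look like the sketch below. The &lt;code&gt;Job&lt;/code&gt; type and the &lt;code&gt;logDeadLetter&lt;/code&gt; signature here are stand-ins, not KeelStack internals:&lt;/p&gt;

```typescript
interface Job {
  id: string;
  name: string;
}

// Illustrative dead-letter logger: one structured JSON line per exhausted job,
// so a log sink can alert on event = 'job.dead_letter'.
function logDeadLetter(job: Job, attempt: number, error: Error, reason: string): string {
  const line = JSON.stringify({
    level: 'error',
    event: 'job.dead_letter',
    jobId: job.id,
    jobName: job.name,
    attempt,
    reason,
    error: error.message,
  });
  console.error(line); // stderr is where most collectors expect error-level lines
  return line;
}
```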




&lt;h2&gt;
  
  
  Part 4: The Crash Test
&lt;/h2&gt;

&lt;p&gt;The full lifecycle claim → execute → crash → recover is tested in &lt;code&gt;worker.crash.test.ts&lt;/code&gt;. Here is the core scenario:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Orphaned job (stuck in processing) — recoverOrphans should reset to pending&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;jobId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`crash_job_&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;36&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// 1. Enqueue the job&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;billing-sync&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;billing.subscription.created&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;tenantId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;tenant_crash_1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;maxAttempts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// 2. Worker claims it — job is now in 'processing' state&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;claimedJob&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;claim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;claimedJob&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;processing&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// 3. Simulate the crash: worker never calls complete() or fail()&lt;/span&gt;
  &lt;span class="c1"&gt;//    Backdate claimedAt to make it look like the crash happened 61 seconds ago&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;internalStore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;j&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;internalStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;jobs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;j&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;claimedAt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;61&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="c1"&gt;// 4. Recovery scan runs (as it would on the next server boot)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;recovered&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recoverOrphans&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// 5. Job is back in 'pending' — available to be claimed and finished&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;recovered&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;recovered&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pending&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// 6. A new worker can now claim and complete it&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;reClaimed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;claim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;reClaimed&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;not&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toBeNull&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The recovery mechanism is straightforward: on server startup (and optionally on a periodic tick), &lt;code&gt;recoverOrphans(timeoutMs)&lt;/code&gt; scans for jobs that have been in &lt;code&gt;processing&lt;/code&gt; state longer than &lt;code&gt;timeoutMs&lt;/code&gt;. Any job older than that threshold is assumed to belong to a dead worker and is reset to &lt;code&gt;pending&lt;/code&gt;, preserving the attempt count.&lt;/p&gt;
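&lt;p&gt;As a sketch of that scan, here is an in-memory version, assuming stored jobs carry &lt;code&gt;state&lt;/code&gt;, &lt;code&gt;claimedAt&lt;/code&gt;, and attempt-count fields (the &lt;code&gt;StoredJob&lt;/code&gt; shape is illustrative, not the engine's schema):&lt;/p&gt;

```typescript
interface StoredJob {
  id: string;
  state: 'pending' | 'processing' | 'completed' | 'failed';
  claimedAt?: string; // ISO timestamp set when a worker claims the job
  attempt: number;
  maxAttempts: number;
}

// Illustrative orphan scan: any 'processing' job claimed longer than timeoutMs
// ago is assumed to belong to a dead worker. Jobs with retries left go back to
// 'pending'; jobs already at maxAttempts go to 'failed' instead of looping forever.
function recoverOrphans(jobs: StoredJob[], timeoutMs: number, now = Date.now()): StoredJob[] {
  const recovered: StoredJob[] = [];
  for (const job of jobs) {
    if (job.state !== 'processing' || !job.claimedAt) continue;
    if (now - Date.parse(job.claimedAt) <= timeoutMs) continue;
    if (job.attempt >= job.maxAttempts) {
      job.state = 'failed'; // exhausted: dead-letter territory, not a re-queue
    } else {
      job.state = 'pending'; // attempt count is preserved; claimable again
      recovered.push(job);
    }
  }
  return recovered;
}
```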

&lt;p&gt;A separate test covers the edge case where a crashed job has already exhausted its retries. This one is important — without it, you could end up endlessly re-queuing jobs that will never succeed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Orphaned job at max attempts — must go to failed (not pending) after recovery&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Simulate 3 failed attempts&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Attempt 1 failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Attempt 2 failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Attempt 3 failed — maxAttempts reached&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Even if it ends up orphaned in 'processing' state...&lt;/span&gt;
  &lt;span class="nx"&gt;j&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;processing&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;j&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;claimedAt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;61&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;recovered&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recoverOrphans&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// ...recovery must NOT re-queue it. It should stay 'failed'.&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;recovered&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;finalState&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The full runner is also tested at the integration level with a simulated crash mid-execution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;RetryableJobRunner: crash on attempt 1, recover on attempt 2&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;attempts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;vi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;attempts&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;attempts&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Simulated worker crash on attempt &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;attempts&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;runner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RetryableJobRunner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;maxAttempts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;baseDelayMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;maxDelayMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nx"&gt;resolves&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toBeUndefined&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;attempts&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// 1 crash + 1 success&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What This Protects In Practice
&lt;/h2&gt;

&lt;p&gt;The silent job loss scenario from the top of this post is exactly what these components prevent:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;User pays&lt;/strong&gt; → webhook fires → &lt;code&gt;generateReport&lt;/code&gt; is enqueued to &lt;code&gt;PersistentJobStore&lt;/code&gt; &lt;em&gt;before&lt;/em&gt; any async work starts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Job is persisted&lt;/strong&gt; in Redis (or in-memory in development) with state &lt;code&gt;pending&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Worker claims it&lt;/strong&gt; atomically — Lua script ensures only one worker gets it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy happens mid-execution&lt;/strong&gt; → process dies → job stays in &lt;code&gt;processing&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;New process starts&lt;/strong&gt; → &lt;code&gt;recoverOrphans&lt;/code&gt; runs → job is reset to &lt;code&gt;pending&lt;/code&gt; with attempt count intact&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Worker claims it again&lt;/strong&gt; → report is generated → job moves to &lt;code&gt;completed&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;
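&lt;p&gt;Step 1 is the part most handlers get wrong, so here is a sketch of the enqueue-first shape. &lt;code&gt;onPaymentSucceeded&lt;/code&gt; and the payload fields are illustrative, not the engine's API:&lt;/p&gt;

```typescript
interface JobStoreLike {
  enqueue(job: object): Promise<void>;
}

// Illustrative webhook handler: the enqueue is the first durable action, before
// any report generation or LLM call, so a crash after this point cannot lose
// the job. `store` stands in for PersistentJobStore.
async function onPaymentSucceeded(store: JobStoreLike, eventId: string, tenantId: string) {
  await store.enqueue({
    id: `report_${eventId}`, // deterministic id derived from the webhook event
    name: 'generate-report',
    event: 'payment.succeeded',
    payload: { tenantId },
    maxAttempts: 5,
  });
  // Acknowledge the webhook immediately; a worker picks the job up from here.
}
```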

&lt;p&gt;The user gets their report. You never know there was a crash. That is the point.&lt;/p&gt;




&lt;p&gt;KeelStack Engine ships &lt;code&gt;RetryableJobRunner&lt;/code&gt;, &lt;code&gt;PersistentJobStore&lt;/code&gt;, and the crash recovery mechanism as part of the Layer 06 background job system. Zero configuration required — it runs with in-memory fallbacks locally and switches to Redis automatically when &lt;code&gt;REDIS_URL&lt;/code&gt; is set.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://keelstack.me/" rel="noopener noreferrer"&gt;&lt;strong&gt;Get KeelStack Engine →&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>stripe</category>
      <category>saas</category>
      <category>node</category>
      <category>typescript</category>
    </item>
    <item>
      <title>Why Your "Vibe Coded" SaaS Will Fail at 100 Users (and How to Fix It)</title>
      <dc:creator>Siddhant Jain</dc:creator>
      <pubDate>Sat, 21 Mar 2026 11:03:00 +0000</pubDate>
      <link>https://forem.com/siddhant_jain_18/why-your-vibe-coded-saas-will-fail-at-100-users-and-how-to-fix-it-4cff</link>
      <guid>https://forem.com/siddhant_jain_18/why-your-vibe-coded-saas-will-fail-at-100-users-and-how-to-fix-it-4cff</guid>
      <description>&lt;p&gt;It's 2026. You just built a functional SaaS MVP in four hours using Cursor and Claude.&lt;br&gt;
It looks great, the happy path works, and you're ready to tweet your launch.&lt;/p&gt;

&lt;p&gt;But there's a hidden tax on AI-generated code: &lt;strong&gt;Architectural Debt.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you vibe-code without a strict foundation, the LLM takes the path of least&lt;br&gt;
resistance. It puts database logic in your routes, skips error handling, and ignores&lt;br&gt;
race conditions. It builds a prototype, not a product.&lt;/p&gt;

&lt;p&gt;This isn't a skill problem. It's a &lt;strong&gt;structural problem.&lt;/strong&gt; And it only shows up at scale.&lt;/p&gt;


&lt;h2&gt;
  
  
  The "Vibe Coding" Trap
&lt;/h2&gt;

&lt;p&gt;Most developers hit their first wall not at launch, but at &lt;strong&gt;100 users&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That's when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Two users double-click "Subscribe" at the same time.&lt;/li&gt;
&lt;li&gt;Stripe retries a slow webhook and hits your server twice.&lt;/li&gt;
&lt;li&gt;A background job fails silently, and the user never gets their report.&lt;/li&gt;
&lt;li&gt;One user with an AI feature loops a prompt and burns $200 of your OpenAI credits in 20 minutes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these show up in development. None of them show up in your happy-path tests.&lt;br&gt;
They show up in production, at 2am, when you're not watching.&lt;/p&gt;

&lt;p&gt;The fix isn't "write better prompts." The fix is &lt;strong&gt;building on a foundation that makes&lt;br&gt;
these failure modes structurally impossible.&lt;/strong&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  1. The Race Condition That Kills Conversions
&lt;/h2&gt;

&lt;p&gt;Most AI-generated Stripe integrations look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Receive webhook.
2. Check if processed = true in DB.
3. If not, provision the license.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;This is broken.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Stripe retries webhooks. If two requests hit your server at the same millisecond —&lt;br&gt;
which happens regularly under real load — both will see &lt;code&gt;processed = false&lt;/code&gt;, and&lt;br&gt;
you'll double-provision (or double-charge) the user.&lt;/p&gt;

&lt;p&gt;This isn't hypothetical. Stripe's documentation explicitly warns that webhook endpoints&lt;br&gt;
may occasionally receive the same event more than once, so duplicate deliveries are a&lt;br&gt;
matter of when, not if.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Fix: Atomic Idempotency
&lt;/h3&gt;

&lt;p&gt;The correct approach is &lt;strong&gt;not "check then set."&lt;/strong&gt; It's &lt;strong&gt;atomic SET NX&lt;/strong&gt; (Set if Not Exists).&lt;/p&gt;

&lt;p&gt;In Redis, this means:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// WRONG — race condition between check and set&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isProcessed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isProcessed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;eventId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;isProcessed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;markProcessed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;eventId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;provisionLicense&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// CORRECT — atomic, no race condition&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;claimed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tryClaimKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;eventId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;claimed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;provisionLicense&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The difference: &lt;code&gt;tryClaimKey()&lt;/code&gt; is a single atomic Redis &lt;code&gt;SET NX&lt;/code&gt; operation.&lt;br&gt;
Either you claim it or you don't. There is no window between the check and the claim.&lt;/p&gt;
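
&lt;p&gt;What &lt;code&gt;tryClaimKey()&lt;/code&gt; does can be sketched in a few lines. This is an illustrative version using ioredis (the key prefix and TTL are assumptions, not KeelStack's actual values):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL);

// Atomic claim: "NX" sets the key only if it does not already exist,
// "EX" expires the claim after 24h so old event ids don't pile up.
async function tryClaimKey(eventId: string): Promise&amp;lt;boolean&amp;gt; {
  const result = await redis.set(`webhook:${eventId}`, '1', 'EX', 86_400, 'NX');
  return result === 'OK'; // null means another request claimed it first
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;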

&lt;p&gt;In KeelStack Engine, every webhook handler uses &lt;code&gt;webhookDeduplicationGuard&lt;/code&gt;&lt;br&gt;
middleware which wraps &lt;code&gt;tryClaimKey()&lt;/code&gt; automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/webhooks/stripe&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nf"&gt;webhookDeduplicationGuard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;idempotencyStore&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;stripe&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="nx"&gt;stripeWebhookHandler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pro tip:&lt;/strong&gt; If your backend doesn't use an &lt;code&gt;Idempotency-Key&lt;/code&gt; header for mutating&lt;br&gt;
requests, you are not production-ready.&lt;/p&gt;


&lt;h2&gt;
  
  
  2. Why "Spaghetti Prompts" Break Your Architecture
&lt;/h2&gt;

&lt;p&gt;As your project grows, your AI context window gets cluttered. With a flat file structure,&lt;br&gt;
the AI starts hallucinating. It forgets where your auth logic lives, starts inventing new&lt;br&gt;
ways to call your database, and quietly breaks layer boundaries you thought were stable.&lt;/p&gt;

&lt;p&gt;This isn't a Cursor or Claude problem. It's a &lt;strong&gt;map problem.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AI agents write better code when they have clear, enforced boundaries. Without them,&lt;br&gt;
they wander.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Fix: The 8-Layer "Constitution"
&lt;/h3&gt;

&lt;p&gt;KeelStack Engine uses a strict &lt;strong&gt;Hexagonal (Ports &amp;amp; Adapters) architecture&lt;/strong&gt; across&lt;br&gt;
eight explicit layers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;AI Write?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;01-Core&lt;/td&gt;
&lt;td&gt;Security, errors, middleware, guards&lt;/td&gt;
&lt;td&gt;❌ NO&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;02-Common&lt;/td&gt;
&lt;td&gt;DTOs, types, utilities&lt;/td&gt;
&lt;td&gt;✅ YES&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;03-Policies&lt;/td&gt;
&lt;td&gt;Business rules, billing gates, access guards&lt;/td&gt;
&lt;td&gt;❌ NO&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;04-Modules&lt;/td&gt;
&lt;td&gt;Feature modules: auth, billing, users, tasks&lt;/td&gt;
&lt;td&gt;✅ YES&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;05-Infra&lt;/td&gt;
&lt;td&gt;DB schema, Stripe/Redis/Resend gateways&lt;/td&gt;
&lt;td&gt;❌ NO&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;06-Background&lt;/td&gt;
&lt;td&gt;Worker pool, retry-safe job runner, event bus&lt;/td&gt;
&lt;td&gt;✅ YES&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;07-AI&lt;/td&gt;
&lt;td&gt;LLMClient, cost controls, AI boundary rules&lt;/td&gt;
&lt;td&gt;❌ NO&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;08-Web&lt;/td&gt;
&lt;td&gt;Express routes, OpenAPI spec&lt;/td&gt;
&lt;td&gt;✅ YES&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;.cursorrules&lt;/code&gt; file enforces these boundaries at the Cursor / Claude level:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI can write to &lt;code&gt;02-Common&lt;/code&gt;, &lt;code&gt;04-Modules&lt;/code&gt;, &lt;code&gt;06-Background&lt;/code&gt;, &lt;code&gt;08-Web&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;AI &lt;strong&gt;cannot touch&lt;/strong&gt; &lt;code&gt;01-Core&lt;/code&gt;, &lt;code&gt;03-Policies&lt;/code&gt;, &lt;code&gt;05-Infra/schema.ts&lt;/code&gt;, or &lt;code&gt;07-AI/LLMClient.ts&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result: your AI agent writes architecture-compliant code the first time, without&lt;br&gt;
you needing to explain the layer rules in every prompt.&lt;/p&gt;

&lt;p&gt;This &lt;code&gt;.cursorrules&lt;/code&gt; file is &lt;strong&gt;free and open source&lt;/strong&gt; on GitHub. Drop it in any&lt;br&gt;
Node.js project root and Cursor loads it automatically.&lt;/p&gt;
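
&lt;p&gt;KeelStack's actual rules file is longer, but an illustrative excerpt shows the shape such boundary rules take (this snippet is a sketch, not the real file):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# .cursorrules (illustrative excerpt)
You may create and edit files under src/02-Common, src/04-Modules,
src/06-Background, and src/08-Web.

NEVER modify src/01-Core, src/03-Policies, src/05-Infra/schema.ts,
or src/07-AI/llm/LLMClient.ts. If a change seems to require editing
them, stop and ask instead.

All database access goes through 05-Infra gateways. Routes in 08-Web
call services in 04-Modules; they never import the DB directly.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;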


&lt;h2&gt;
  
  
  3. The $500 AI Loop
&lt;/h2&gt;

&lt;p&gt;You've seen the horror stories. A developer leaves an AI agent running, a loop occurs,&lt;br&gt;
and they wake up to a $500 OpenAI bill. One user finds a way to trigger your AI feature&lt;br&gt;
in a loop, and your margins disappear by end of day.&lt;/p&gt;

&lt;p&gt;If you're building an AI SaaS, &lt;strong&gt;you cannot rely on the AI to behave.&lt;/strong&gt; You need&lt;br&gt;
hard governance at the infrastructure level.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Fix: Centralized LLM Client with Hard Budget Caps
&lt;/h3&gt;

&lt;p&gt;Every LLM call in KeelStack Engine goes through a single &lt;code&gt;llmClient&lt;/code&gt; singleton&lt;br&gt;
in &lt;code&gt;src/07-AI/llm/LLMClient.ts&lt;/code&gt;. No exceptions.&lt;/p&gt;

&lt;p&gt;This client enforces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Per-user token budgets&lt;/strong&gt; — hard caps on what a single user can spend per hour,
per day, or per feature.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost attribution&lt;/strong&gt; — every call includes a &lt;code&gt;feature&lt;/code&gt; field so you know exactly
which part of your product is eating your margin.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic retry on 429/503&lt;/strong&gt; — rate limit errors don't crash your app; they
backoff and retry.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Request timeouts&lt;/strong&gt; — runaway prompts are killed after a configurable threshold.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;llmClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;usr_123&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;feature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;report_gen&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;// cost attribution&lt;/span&gt;
  &lt;span class="na"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;You are...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;userMessage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userInput&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="c1"&gt;// budget, timeout, retry — all enforced automatically&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;One user cannot burn your monthly budget in an afternoon. It's structurally prevented.&lt;/p&gt;
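
&lt;p&gt;A cap like this can be enforced with an atomic counter per user per time window. The sketch below is one plausible mechanism (it is not KeelStack's internal code) and assumes an ioredis client named &lt;code&gt;redis&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Hypothetical enforcement: add this call's tokens to an hourly
// counter; INCRBY is atomic, so concurrent calls can't sneak past.
async function chargeTokens(userId: string, tokens: number, capPerHour: number) {
  const hour = Math.floor(Date.now() / 3_600_000);
  const key = `llm:budget:${userId}:${hour}`;
  const used = await redis.incrby(key, tokens);
  await redis.expire(key, 7_200); // the window cleans itself up
  if (used &amp;gt; capPerHour) {
    throw new Error(`LLM budget exceeded for ${userId}`);
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;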


&lt;h2&gt;
  
  
  4. The Background Job That Vanishes
&lt;/h2&gt;

&lt;p&gt;AI-generated background job implementations typically look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;processReport&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;This is not a background job.&lt;/strong&gt; This is a deferred function call with no retry,&lt;br&gt;
no timeout, no logging, and no recovery.&lt;/p&gt;

&lt;p&gt;If your server restarts, the job disappears. If &lt;code&gt;processReport()&lt;/code&gt; throws, the user&lt;br&gt;
never gets their result and you never find out why.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Fix: Retry-Safe Job Runner with Dead-Letter Logging
&lt;/h3&gt;

&lt;p&gt;KeelStack Engine uses real Node.js &lt;code&gt;worker_threads&lt;/code&gt; — not &lt;code&gt;setTimeout&lt;/code&gt;, not&lt;br&gt;
&lt;code&gt;setImmediate&lt;/code&gt; — with a &lt;code&gt;RetryableJobRunner&lt;/code&gt; that provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Exponential backoff with jitter&lt;/strong&gt; — failed jobs retry at increasing intervals,
not all at once.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-attempt timeouts&lt;/strong&gt; — a stuck job doesn't block the worker thread forever.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dead-letter logging&lt;/strong&gt; — jobs that exhaust retries are logged with full context,
not silently dropped.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;NonRetryableError&lt;/code&gt;&lt;/strong&gt; — for bad-input errors that should fail fast without
burning retry budget.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;runner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RetryableJobRunner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nf"&gt;isValid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;NonRetryableError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Bad payload&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;processReport&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;maxAttempts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;baseDelayMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;timeoutMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The async pattern exposed to clients is &lt;strong&gt;202 + poll&lt;/strong&gt; — the canonical&lt;br&gt;
production pattern for long-running operations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST /api/v1/tasks    → { status: "accepted", jobId: "...", pollUrl: "..." }
GET  /api/v1/tasks/:jobId → { status: "processing" | "done" | "failed", result }
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
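
&lt;p&gt;A minimal Express sketch of the same pattern (the &lt;code&gt;jobQueue&lt;/code&gt; and &lt;code&gt;jobStore&lt;/code&gt; helpers here are hypothetical stand-ins, not KeelStack's API):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import crypto from 'node:crypto';

app.post('/api/v1/tasks', async (req, res) =&amp;gt; {
  const jobId = crypto.randomUUID();
  await jobQueue.enqueue('generate-report', { jobId, input: req.body });
  // 202 Accepted: the work happens in the background
  res.status(202).json({ status: 'accepted', jobId, pollUrl: `/api/v1/tasks/${jobId}` });
});

app.get('/api/v1/tasks/:jobId', async (req, res) =&amp;gt; {
  const job = await jobStore.get(req.params.jobId);
  if (!job) return res.status(404).json({ error: 'unknown job' });
  res.json({ status: job.status, result: job.result ?? null });
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;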






&lt;h2&gt;
  
  
  5. The Auth Bug That Leaks User Data
&lt;/h2&gt;

&lt;p&gt;AI-generated password comparison often looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;storedHash&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;inputHash&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;This is vulnerable to timing attacks.&lt;/strong&gt; A plain &lt;code&gt;===&lt;/code&gt; comparison bails out at the&lt;br&gt;
first mismatched character, so an attacker can measure response times to recover&lt;br&gt;
information about the stored value byte by byte.&lt;/p&gt;

&lt;p&gt;The correct approach is &lt;code&gt;crypto.timingSafeEqual()&lt;/code&gt; — a constant-time comparison&lt;br&gt;
that doesn't leak information through timing.&lt;/p&gt;
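
&lt;p&gt;One wrinkle: &lt;code&gt;crypto.timingSafeEqual()&lt;/code&gt; throws if the two buffers differ in length, so a common pattern is to compare fixed-length digests of both values. A minimal sketch:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import { createHash, timingSafeEqual } from 'node:crypto';

// Constant-time equality check. Hashing both sides first guarantees
// equal-length buffers, which timingSafeEqual requires.
function safeCompare(a: string, b: string): boolean {
  const digestA = createHash('sha256').update(a).digest();
  const digestB = createHash('sha256').update(b).digest();
  return timingSafeEqual(digestA, digestB);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;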

&lt;p&gt;KeelStack Engine uses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Argon2id&lt;/strong&gt; password hashing (OWASP 2023 parameters: 65MB memory, 3 iterations).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;crypto.timingSafeEqual()&lt;/code&gt;&lt;/strong&gt; for all password comparisons.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Brute-force lockout&lt;/strong&gt; per IP on auth endpoints (30 req / 10 min).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Refresh token rotation&lt;/strong&gt; — tokens are single-use and rotated on every refresh.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transparent PBKDF2 → Argon2id migration&lt;/strong&gt; on next login for any legacy hashes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of this is complicated to implement. It's just easy to skip when you're&lt;br&gt;
prompting an AI to "add auth."&lt;/p&gt;




&lt;h2&gt;
  
  
  What 100 Users Actually Reveals
&lt;/h2&gt;

&lt;p&gt;Here's the honest summary of what breaks at 100 users when you build on an&lt;br&gt;
AI-generated flat foundation:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Failure Mode&lt;/th&gt;
&lt;th&gt;Root Cause&lt;/th&gt;
&lt;th&gt;Production Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Duplicate Stripe charges&lt;/td&gt;
&lt;td&gt;No atomic idempotency on webhooks&lt;/td&gt;
&lt;td&gt;Chargebacks, trust loss&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Double-provisioned licenses&lt;/td&gt;
&lt;td&gt;Race condition in check-then-set&lt;/td&gt;
&lt;td&gt;Revenue leak&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Jobs vanishing silently&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;setTimeout&lt;/code&gt; instead of real workers&lt;/td&gt;
&lt;td&gt;User churn, support tickets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;$500 AI bill overnight&lt;/td&gt;
&lt;td&gt;No per-user LLM budget caps&lt;/td&gt;
&lt;td&gt;Direct margin destruction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auth timing leaks&lt;/td&gt;
&lt;td&gt;String comparison instead of &lt;code&gt;timingSafeEqual&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Potential data breach&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Architecture rot&lt;/td&gt;
&lt;td&gt;Flat file structure, no layer boundaries&lt;/td&gt;
&lt;td&gt;Weeks of refactoring debt&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All of these are &lt;strong&gt;structurally preventable.&lt;/strong&gt; None of them require more prompts.&lt;br&gt;
They require a foundation that makes the wrong thing hard to build.&lt;/p&gt;




&lt;h2&gt;
  
  
  Stop Building Prototypes. Start Shipping Engines.
&lt;/h2&gt;

&lt;p&gt;You can spend three weeks debugging AI-generated spaghetti after your first 100 users&lt;br&gt;
expose every race condition and edge case. Or you can start with a foundation that&lt;br&gt;
already handles them.&lt;/p&gt;

&lt;p&gt;KeelStack Engine is not a template. It's a &lt;strong&gt;production-grade Node.js + TypeScript&lt;br&gt;
environment designed specifically for the AI coding era:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;563 unit tests · 37 e2e checks · 91.7% statement coverage, enforced by CI&lt;/li&gt;
&lt;li&gt;Idempotency middleware, webhook deduplication guard, retry-safe job runner&lt;/li&gt;
&lt;li&gt;Per-user LLM token budgets with cost attribution&lt;/li&gt;
&lt;li&gt;Open-source &lt;code&gt;.cursorrules&lt;/code&gt; — AI writes architecture-compliant code the first time&lt;/li&gt;
&lt;li&gt;15 copy-paste prompts for Cursor, Claude, and Copilot&lt;/li&gt;
&lt;li&gt;SaaS blueprints: AI Report Generator, Lead Finder API&lt;/li&gt;
&lt;li&gt;One-time payment. Your source code, your rules.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://keelstack.me" rel="noopener noreferrer"&gt;Explore KeelStack Engine → &lt;/a&gt;&lt;/p&gt;

</description>
      <category>node</category>
      <category>saas</category>
      <category>cursor</category>
      <category>ai</category>
    </item>
    <item>
      <title>The Stripe webhook race condition that silently charged users twice (and the Node.js fix)</title>
      <dc:creator>Siddhant Jain</dc:creator>
      <pubDate>Fri, 20 Mar 2026 13:04:38 +0000</pubDate>
      <link>https://forem.com/siddhant_jain_18/the-stripe-webhook-race-condition-that-silently-charged-users-twice-and-the-nodejs-fix-36k5</link>
      <guid>https://forem.com/siddhant_jain_18/the-stripe-webhook-race-condition-that-silently-charged-users-twice-and-the-nodejs-fix-36k5</guid>
      <description>&lt;p&gt;Indie Hackers researchers traced a recurring support headache back to a single race condition inside Stripe webhook handling: simultaneous retries hit the same business transaction twice, and nobody noticed until customers complained about double charges. The fix looks obvious on paper, yet most teams still treat webhooks like regular requests.&lt;/p&gt;

&lt;h2&gt;
  
  
  What happened in the Indie Hackers post
&lt;/h2&gt;

&lt;p&gt;Two things lined up: a webhook that triggered a downstream billing workflow and Stripe's stubborn automatic retries. When the original delivery times out or fails to return a 2xx, Stripe retries the exact same event with the same &lt;code&gt;id&lt;/code&gt;. If the handler is not guarding against duplicate work, the second invocation commits the same payment record and triggers the customer's card again. By the time the developer examined the logs, support tickets had piled up and a single user had been billed twice for the same plan.&lt;/p&gt;

&lt;p&gt;The key insight: the retries are legitimate, the payload is identical, and Stripe keeps redelivering until your endpoint responds with a &lt;code&gt;2xx&lt;/code&gt;. So the safe answer is to process each event exactly once, even if Stripe delivers it a dozen times.&lt;/p&gt;

&lt;h2&gt;
  
  
  The "obviously wrong" pattern
&lt;/h2&gt;

&lt;p&gt;Here's the simplified handler that almost every starter kit ships:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/api/stripe/webhook&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;stripe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;webhooks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;constructEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;webhookSecret&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;invoice.payment_succeeded&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;invoice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;payments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;invoiceId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;invoice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;invoice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;amount_paid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;customerId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;invoice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;processedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;deliver-license&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;invoiceId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;invoice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No idempotency, no locking, just another async route. If Stripe retries the same event, that &lt;code&gt;insert&lt;/code&gt; runs again and a second charge is written. There's no shared cache or DB row that says "stop, I already handled this event."&lt;/p&gt;
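
&lt;p&gt;The cheapest version of that "stop" signal is a unique constraint. Assuming Postgres with the &lt;code&gt;pg&lt;/code&gt; driver (table and pool names here are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Requires, once: CREATE TABLE webhook_events (stripe_event_id text PRIMARY KEY);
const { rowCount } = await pool.query(
  "INSERT INTO webhook_events (stripe_event_id) VALUES ($1) ON CONFLICT DO NOTHING",
  [event.id]
);
if (rowCount === 0) {
  // Duplicate delivery: acknowledge so Stripe stops retrying, do no work.
  return res.status(200).send();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;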

&lt;h2&gt;
  
  
  KeelStack's atomic idempotency guard
&lt;/h2&gt;

&lt;p&gt;KeelStack ships with a utility that wraps every webhook inside an atomic guard keyed on &lt;code&gt;stripe_event_id&lt;/code&gt; + &lt;code&gt;stripe_event_type&lt;/code&gt;. It touches one durable row in the database before any business work runs. The guard rejects duplicates in the same transaction, so you can safely acknowledge Stripe before any further work executes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/api/stripe/webhook&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;stripe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;webhooks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;constructEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;webhookSecret&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;idempotency&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;guard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;alreadyProcessed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;payments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;stripeId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;amount_paid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;customerId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;processedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;jobQueue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;deliver-license&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;invoiceId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;customerId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The guard exposes &lt;code&gt;ctx.alreadyProcessed&lt;/code&gt;, so duplicate deliveries short-circuit before they mutate the database or change customer state. Even under concurrent retries, the losing handler observes the database conflict and returns a clean &lt;code&gt;200&lt;/code&gt; without touching the rest of the system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters for your SaaS
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Duplicate billing kills trust faster than any other incident.&lt;/li&gt;
&lt;li&gt;Stripe's retries are not bugs — they are your backup plan. Treat them as a feature.&lt;/li&gt;
&lt;li&gt;An idempotency guard like KeelStack's gives you a reproducible, auditable safeguard that you can test locally.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Indie Hackers race condition is still the same bug we see in every project that treats webhooks as fire-and-forget. Wrap your handler in an atomic guard, store the Stripe event ID alongside your payment rows, and your ledger stays clean even when retries are furious.&lt;/p&gt;
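&lt;p&gt;If you want to see the shape of the pattern in isolation, here is a minimal sketch. An in-memory &lt;code&gt;Set&lt;/code&gt; stands in for the database's unique index on the Stripe event ID, and the &lt;code&gt;guard&lt;/code&gt; and &lt;code&gt;alreadyProcessed&lt;/code&gt; names simply mirror the handler above; this is an illustration of the idea, not KeelStack's actual implementation:&lt;/p&gt;

```typescript
// Minimal idempotency guard: an in-memory Set stands in for the database's
// unique index on the Stripe event ID. The first delivery runs the work;
// replays see alreadyProcessed = true and no-op.
const processed = new Set<string>();

async function guard(
  eventId: string,
  work: (ctx: { alreadyProcessed: boolean }) => Promise<void>,
): Promise<void> {
  const ctx = { alreadyProcessed: processed.has(eventId) };
  processed.add(eventId);
  await work(ctx);
}

// Simulate Stripe delivering the same event twice.
let inserts = 0;
const handler = async (ctx: { alreadyProcessed: boolean }) => {
  if (ctx.alreadyProcessed) return; // duplicate delivery short-circuits here
  inserts++; // stands in for db.payments.insert(...)
};

(async () => {
  await guard("evt_123", handler);
  await guard("evt_123", handler);
  console.log(inserts); // 1
})();
```

&lt;p&gt;In production the dedupe state has to live in the database itself (a unique constraint on the event ID, checked in the same transaction as the work), otherwise a restart wipes it and the replays get through.&lt;/p&gt;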

</description>
      <category>node</category>
      <category>typescript</category>
      <category>stripe</category>
      <category>saas</category>
    </item>
    <item>
      <title>Why I built a backend-only SaaS starter kit when everyone else builds full-stack</title>
      <dc:creator>Siddhant Jain</dc:creator>
      <pubDate>Sun, 15 Mar 2026 05:00:41 +0000</pubDate>
      <link>https://forem.com/siddhant_jain_18/why-i-built-a-backend-only-saas-starter-kit-when-everyone-else-builds-full-stack-52n5</link>
      <guid>https://forem.com/siddhant_jain_18/why-i-built-a-backend-only-saas-starter-kit-when-everyone-else-builds-full-stack-52n5</guid>
      <description>&lt;p&gt;Every SaaS starter kit I looked at came with a frontend attached.&lt;/p&gt;

&lt;p&gt;ShipFast. MakerKit. SupaStarter. All of them assume you want &lt;br&gt;
Next.js. All of them bundle a UI you may not need, a framework &lt;br&gt;
you may not want, and opinions about your frontend you never asked for.&lt;/p&gt;

&lt;p&gt;I didn't plan to build a backend-only kit. It just ended up that way.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it started
&lt;/h2&gt;

&lt;p&gt;I was starting a new project and needed a backend foundation.&lt;/p&gt;

&lt;p&gt;Auth. Billing. Webhooks. Email. Database setup. &lt;br&gt;
The same things every SaaS needs.&lt;/p&gt;

&lt;p&gt;I knew what I wanted it to look like: clean architecture, &lt;br&gt;
proper tests, everything wired. So I built it myself, then &lt;br&gt;
packaged it so others could use it too.&lt;/p&gt;

&lt;p&gt;When I looked at what I'd built, it was pure backend. No pages. &lt;br&gt;
No components. No frontend framework opinions. Just a clean &lt;br&gt;
Node.js + TypeScript foundation with everything wired.&lt;/p&gt;

&lt;p&gt;I didn't make it backend-only as a strategic decision.&lt;br&gt;
It just was.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I realized after
&lt;/h2&gt;

&lt;p&gt;Once it was done, I started looking at who actually needs this.&lt;/p&gt;

&lt;p&gt;Developers using Firebase or Supabase aren't the target. Those &lt;br&gt;
tools are genuinely good for certain use cases. If you want &lt;br&gt;
managed auth and don't care about owning your stack, use them.&lt;/p&gt;

&lt;p&gt;But there's a specific type of developer who hits a wall with BaaS:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're building something where you need full control over 
your auth logic&lt;/li&gt;
&lt;li&gt;You already have a frontend — React, Vue, Svelte, React Native — 
and you don't want to rebuild it around a new framework&lt;/li&gt;
&lt;li&gt;You want to own your database and deploy anywhere&lt;/li&gt;
&lt;li&gt;You want Stripe webhooks handled properly, not worked around&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That developer can't use ShipFast. It's Next.js-first.&lt;br&gt;
That developer can't use MakerKit. Also Next.js-first.&lt;/p&gt;

&lt;p&gt;They can use KeelStack.&lt;/p&gt;

&lt;h2&gt;
  
  
  What backend-only actually means
&lt;/h2&gt;

&lt;p&gt;It means your frontend is your problem. KeelStack doesn't care &lt;br&gt;
what you use. React, Vue, Svelte, React Native, a mobile app, &lt;br&gt;
no frontend at all — it doesn't matter. You hit REST endpoints &lt;br&gt;
over HTTP. That's it.&lt;/p&gt;

&lt;p&gt;It also means the backend is done properly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hexagonal architecture — swap Stripe, swap your database, 
swap email providers without touching business logic&lt;/li&gt;
&lt;li&gt;159 unit tests, 36 end-to-end checks, 95% statement coverage&lt;/li&gt;
&lt;li&gt;Idempotent webhook handling — no duplicate processing&lt;/li&gt;
&lt;li&gt;In-memory fallback — works without a database on first run&lt;/li&gt;
&lt;li&gt;Health endpoints with per-service diagnostics&lt;/li&gt;
&lt;/ul&gt;
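&lt;p&gt;To make the first bullet concrete, here is a sketch of what that hexagonal seam can look like. &lt;code&gt;EmailPort&lt;/code&gt;, the adapter, and &lt;code&gt;SignupService&lt;/code&gt; are names invented for illustration, not KeelStack's real API:&lt;/p&gt;

```typescript
// Illustrative port: business logic depends only on this interface.
interface EmailPort {
  send(to: string, subject: string, body: string): Promise<void>;
}

// One adapter per provider; swapping providers never touches callers.
// This one just records sends, which also makes testing trivial.
class RecordingEmailAdapter implements EmailPort {
  sent: string[] = [];
  async send(to: string, subject: string, _body: string): Promise<void> {
    this.sent.push(`${to}|${subject}`);
  }
}

// Business logic receives the port via constructor injection.
class SignupService {
  constructor(private email: EmailPort) {}
  async register(userEmail: string): Promise<void> {
    // ...create the user record here...
    await this.email.send(userEmail, "Welcome!", "Thanks for signing up.");
  }
}

const mail = new RecordingEmailAdapter();
void new SignupService(mail).register("dev@example.com");
console.log(mail.sent[0]); // "dev@example.com|Welcome!"
```

&lt;p&gt;A real provider adapter (say, one wrapping Resend) would implement the same interface, so switching providers means changing one file, not the signup flow.&lt;/p&gt;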

&lt;h2&gt;
  
  
  The uncomfortable truth about full-stack kits
&lt;/h2&gt;

&lt;p&gt;When you buy a full-stack kit, you're buying someone else's &lt;br&gt;
frontend opinions bundled with a backend you actually needed.&lt;/p&gt;

&lt;p&gt;If those opinions match yours — great. You save time.&lt;/p&gt;

&lt;p&gt;If they don't — you spend hours stripping out a frontend &lt;br&gt;
you never wanted, or worse, you bend your product to fit &lt;br&gt;
the kit's architecture.&lt;/p&gt;

&lt;p&gt;Backend-only sidesteps this entirely.&lt;br&gt;
You bring your frontend. I bring the backend.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who this is for
&lt;/h2&gt;

&lt;p&gt;Developers who already have a frontend and need a solid backend.&lt;br&gt;
Freelancers building APIs for clients.&lt;br&gt;
Founders launching SaaS MVPs who want to own their stack.&lt;br&gt;
Developers switching away from Firebase who want full control.&lt;/p&gt;

&lt;p&gt;Not for complete beginners. Not for people who want managed hosting.&lt;br&gt;
Not for teams expecting enterprise SLAs.&lt;/p&gt;

&lt;h2&gt;
  
  
  One more thing
&lt;/h2&gt;

&lt;p&gt;I'm 17. This is my first product.&lt;/p&gt;

&lt;p&gt;I built it because what I needed didn't exist at a price &lt;br&gt;
that made sense — and because building it taught me more &lt;br&gt;
than any tutorial ever would.&lt;/p&gt;

&lt;p&gt;It's $49. You own the code. No subscription.&lt;/p&gt;

&lt;p&gt;If you're building a SaaS backend and you're tired of &lt;br&gt;
framework lock-in — it might save you a few weeks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://keelstack.me" rel="noopener noreferrer"&gt;keelstack.me&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you found this useful or have thoughts on the backend-only &lt;br&gt;
approach — drop a comment. I'm genuinely curious whether other &lt;br&gt;
developers hit the same wall.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>node</category>
      <category>typescript</category>
      <category>webdev</category>
      <category>beginners</category>
    </item>
    <item>
      <title>How I Structured a Production-Ready Node.js Backend for SaaS</title>
      <dc:creator>Siddhant Jain</dc:creator>
      <pubDate>Fri, 13 Mar 2026 13:54:39 +0000</pubDate>
      <link>https://forem.com/siddhant_jain_18/how-i-structured-a-production-ready-nodejs-backend-for-saas-248f</link>
      <guid>https://forem.com/siddhant_jain_18/how-i-structured-a-production-ready-nodejs-backend-for-saas-248f</guid>
      <description>&lt;p&gt;Most SaaS projects start the same way.&lt;/p&gt;

&lt;p&gt;You scaffold a Node.js backend, then gradually add authentication, billing, database models, email notifications, background jobs, and API documentation.&lt;/p&gt;

&lt;p&gt;After doing this repeatedly, I wanted a cleaner starting point for new projects. So I structured a backend foundation with the most common SaaS components already wired together.&lt;/p&gt;

&lt;p&gt;This is the architecture I ended up with.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tech Stack
&lt;/h3&gt;

&lt;p&gt;The backend is built with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Node.js&lt;/li&gt;
&lt;li&gt;TypeScript&lt;/li&gt;
&lt;li&gt;Express&lt;/li&gt;
&lt;li&gt;PostgreSQL&lt;/li&gt;
&lt;li&gt;Drizzle ORM&lt;/li&gt;
&lt;li&gt;Stripe (billing)&lt;/li&gt;
&lt;li&gt;Resend (transactional email)&lt;/li&gt;
&lt;li&gt;Zod (validation)&lt;/li&gt;
&lt;li&gt;OpenAPI / Swagger (API docs)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal wasn't just to include these tools, but to organize them in a way that keeps business logic separate from infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Folder Structure
&lt;/h2&gt;

&lt;p&gt;This is the high-level folder structure used in the project:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgy9g05diveytpnpi5b5j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgy9g05diveytpnpi5b5j.png" alt="High-level folder structure of the backend" width="725" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The backend is organized into layers so business logic, infrastructure integrations, and the HTTP layer remain isolated.&lt;/p&gt;

&lt;h2&gt;
  
  
  API Documentation
&lt;/h2&gt;

&lt;p&gt;The API is documented using OpenAPI, making it easy to explore and test endpoints during development.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzonau7zgt9p7cb2f9z9n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzonau7zgt9p7cb2f9z9n.png" alt="Swagger UI listing the available API endpoints" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Swagger UI exposes the available endpoints and request schemas for quick testing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example Request Flow
&lt;/h2&gt;

&lt;p&gt;A typical API request flows like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Express route receives the request&lt;/li&gt;
&lt;li&gt;Middleware applies rate limiting and authentication&lt;/li&gt;
&lt;li&gt;Controller calls the relevant module&lt;/li&gt;
&lt;li&gt;Module applies policy rules&lt;/li&gt;
&lt;li&gt;Infrastructure adapters interact with the database or external APIs&lt;/li&gt;
&lt;li&gt;Response is returned to the client&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Keeping integrations behind adapters makes them easier to replace later.&lt;/p&gt;
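&lt;p&gt;That flow can be sketched as plain function composition. The stage names and the &lt;code&gt;Ctx&lt;/code&gt; shape here are invented for illustration; the real project uses Express middleware, but the layering is the same idea:&lt;/p&gt;

```typescript
// Illustrative request pipeline: each stage is a plain function, mirroring
// route -> middleware -> controller -> module -> adapter.
type Ctx = { authToken?: string; userId?: string; result?: string };

const rateLimit = (ctx: Ctx): Ctx => ctx; // would consult a counter here
const requireAuth = (ctx: Ctx): Ctx => {
  if (!ctx.authToken) throw new Error("401");
  return { ...ctx, userId: "user_1" }; // stands in for token verification
};

// Infrastructure adapter: the only layer that touches the "database".
const dbAdapter = { findPlan: (userId: string) => `pro-plan for ${userId}` };

// Module: business rules, reaching infrastructure only via the adapter.
const billingModule = (ctx: Ctx): Ctx => ({
  ...ctx,
  result: dbAdapter.findPlan(ctx.userId!),
});

// Controller composes the stages, as an Express route handler would.
const handleGetPlan = (ctx: Ctx): Ctx =>
  billingModule(requireAuth(rateLimit(ctx)));

console.log(handleGetPlan({ authToken: "tok_abc" }).result);
// "pro-plan for user_1"
```

&lt;p&gt;Because &lt;code&gt;billingModule&lt;/code&gt; only knows the adapter's interface, replacing the database (or mocking it in tests) is a one-line change.&lt;/p&gt;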

&lt;h2&gt;
  
  
  Built-In SaaS Components
&lt;/h2&gt;

&lt;p&gt;The backend includes several common pieces needed for SaaS applications:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Authentication and user management&lt;/li&gt;
&lt;li&gt;Stripe billing with webhook handling&lt;/li&gt;
&lt;li&gt;PostgreSQL database setup&lt;/li&gt;
&lt;li&gt;Transactional email support&lt;/li&gt;
&lt;li&gt;API documentation via OpenAPI&lt;/li&gt;
&lt;li&gt;Structured error handling&lt;/li&gt;
&lt;li&gt;Rate limiting and security middleware&lt;/li&gt;
&lt;li&gt;End-to-end tests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are things many SaaS projects end up implementing anyway.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;I recorded a short demo showing the server startup and readiness checks.&lt;/p&gt;

&lt;p&gt;You can see it here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://keelstack.me" rel="noopener noreferrer"&gt;KeelStack.me&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;The goal was to create a backend structure that is easy to extend without mixing business logic with infrastructure code.&lt;/p&gt;

&lt;p&gt;I'm curious how others structure Node.js backends for SaaS products.&lt;br&gt;
Do you prefer layered architectures like this, or something simpler?&lt;/p&gt;

</description>
      <category>node</category>
      <category>typescript</category>
      <category>backend</category>
      <category>saas</category>
    </item>
  </channel>
</rss>
