<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Wences Martinez</title>
    <description>The latest articles on Forem by Wences Martinez (@wenchodev).</description>
    <link>https://forem.com/wenchodev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2480192%2F435df44d-b5fe-420c-b95d-38181fc0e28f.jpg</url>
      <title>Forem: Wences Martinez</title>
      <link>https://forem.com/wenchodev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/wenchodev"/>
    <language>en</language>
    <item>
      <title>NestJS in production: patterns worth stealing</title>
      <dc:creator>Wences Martinez</dc:creator>
      <pubDate>Sun, 17 May 2026 16:11:55 +0000</pubDate>
      <link>https://forem.com/wenchodev/nestjs-in-production-patterns-worth-stealing-1516</link>
      <guid>https://forem.com/wenchodev/nestjs-in-production-patterns-worth-stealing-1516</guid>
      <description>&lt;p&gt;Most NestJS tutorials stop right where things get interesting.&lt;/p&gt;

&lt;p&gt;You learn about modules, decorators, dependency injection, build a CRUD app, and then &lt;strong&gt;you ship something real&lt;/strong&gt; and realize nobody told you how to handle a webhook event without breaking your JSON parser, or why a switch statement for event types doesn't scale.&lt;/p&gt;

&lt;p&gt;This year I built two NestJS backend services at &lt;a href="https://resiz.es/" rel="noopener noreferrer"&gt;Resizes&lt;/a&gt;: an &lt;strong&gt;entitlements service&lt;/strong&gt; that enforces usage limits across plans, and a &lt;strong&gt;billing service&lt;/strong&gt; that handles the full Stripe subscription lifecycle. Both power the AI platform we built internally.&lt;/p&gt;

&lt;p&gt;After a few months running them &lt;strong&gt;in production&lt;/strong&gt;, here's what I actually learned. Not "what is dependency injection"; you already know that (I hope).&lt;/p&gt;

&lt;p&gt;These are the patterns I'd copy directly into any new NestJS project.&lt;/p&gt;




&lt;h3&gt;
  
  
  Summary
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;ol&gt;
&lt;li&gt;How to handle webhook events without breaking your JSON endpoints.&lt;/li&gt;
&lt;li&gt;Why a factory beats a switch when you have many event types.&lt;/li&gt;
&lt;li&gt;Why a repository layer on top of Prisma ORM makes testing easier.&lt;/li&gt;
&lt;li&gt;The ValidationPipe options most tutorials skip.&lt;/li&gt;
&lt;li&gt;Named exceptions that make your codebase greppable.&lt;/li&gt;
&lt;li&gt;Why validating env vars at startup saves you a 3am incident.&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The module is your unit of ownership
&lt;/h2&gt;

&lt;p&gt;A module in NestJS is not just a folder convention, &lt;strong&gt;it's a boundary&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In other words, a module should be something you could delete without the rest of the app noticing, until another module actually needs what it exports. &lt;strong&gt;What happens inside stays inside&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here's what that looks like in practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;EntitlementsModule&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;BillingModule&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;TasksModule&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;HealthModule&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;PrismaModule&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each one is a &lt;strong&gt;self-contained folder with its controller, service, repository, interfaces, DTOs and tests&lt;/strong&gt;. Nothing leaks out except what's explicitly exported, and the layer separation is strict: controllers route and validate, services hold business logic, repositories are the only place that touches Prisma.&lt;/p&gt;

&lt;p&gt;One thing to watch out for: &lt;strong&gt;module dependencies must be uni-directional&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;BillingModule&lt;/code&gt; imports from &lt;code&gt;EntitlementsModule&lt;/code&gt;, that's fine. But if &lt;code&gt;EntitlementsModule&lt;/code&gt; then also imports from &lt;code&gt;BillingModule&lt;/code&gt;, you've created a &lt;strong&gt;circular dependency&lt;/strong&gt; and NestJS will tell you at startup with a &lt;strong&gt;cryptic error&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Froc24q3hh6cvrvd6c7j1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Froc24q3hh6cvrvd6c7j1.png" alt="NestJS modules circular dependency error" width="800" height="395"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The fix is usually introducing a third, higher-level module that orchestrates both, or reconsidering whether that cross-dependency is pointing at a domain boundary problem.&lt;/p&gt;

&lt;p&gt;NestJS does have a &lt;code&gt;forwardRef()&lt;/code&gt; escape hatch for exactly this situation, but it's a patch, not a solution. Every time I've reached for it, the real answer was that I'd mixed two domains into what should have been separate modules.&lt;/p&gt;

&lt;p&gt;For &lt;code&gt;PrismaModule&lt;/code&gt;, I went with &lt;code&gt;@Global()&lt;/code&gt; so &lt;code&gt;PrismaService&lt;/code&gt; can be injected anywhere without re-importing the module:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// prisma/prisma.module.ts&lt;/span&gt;
&lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;Global&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;providers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;PrismaService&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;exports&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;PrismaService&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PrismaModule&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the service handles its own lifecycle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// prisma/prisma.service.ts&lt;/span&gt;
&lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;Injectable&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PrismaService&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;PrismaClient&lt;/span&gt;
  &lt;span class="k"&gt;implements&lt;/span&gt; &lt;span class="nx"&gt;OnModuleInit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;OnModuleDestroy&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Logger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;PrismaService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;onModuleInit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;$connect&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Database connected&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;onModuleDestroy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;$disconnect&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Database disconnected&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One instance, clean connect/disconnect. I've seen more than one codebase with connection leaks because nobody hooked into the module lifecycle.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Stripe webhook pattern you should steal
&lt;/h2&gt;

&lt;p&gt;This is the most concrete thing I can give you.&lt;/p&gt;

&lt;p&gt;Webhooks from Stripe (or any payment provider) have a problem: to &lt;strong&gt;validate the signature&lt;/strong&gt; you need the raw body as a buffer, but NestJS's global JSON parser has already consumed it by the time your controller runs.&lt;/p&gt;

&lt;p&gt;The naive fix is applying a raw body parser globally. &lt;strong&gt;Don't do that&lt;/strong&gt;. Every other endpoint would then receive a buffer instead of parsed JSON, which breaks your regular JSON endpoints.&lt;/p&gt;

&lt;p&gt;The correct fix is three pieces that work together.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Apply the raw body parser only on the webhook route, before NestJS starts:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// main.ts&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/stripe/webhooks&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This runs before the global JSON middleware. Only that route gets a buffer body. Everything else keeps working normally.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Validate the signature in a middleware:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// stripe-webhook.middleware.ts&lt;/span&gt;
&lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;Injectable&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
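&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;StripeWebhookMiddleware&lt;/span&gt; &lt;span class="k"&gt;implements&lt;/span&gt; &lt;span class="nx"&gt;NestMiddleware&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Logger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;StripeWebhookMiddleware&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="c1"&gt;// Assumption: a Stripe client and ConfigService are registered as providers&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;stripe&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Stripe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;configService&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ConfigService&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
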
  &lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;NextFunction&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;stripe-signature&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;stripeEvent&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stripe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;webhooks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;constructEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;sig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;configService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;STRIPE_WEBHOOK_SECRET&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`[StripeWebhook] Invalid signature: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;BadRequestException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Invalid webhook signature&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The middleware decorates the request with the parsed &lt;code&gt;stripeEvent&lt;/code&gt;. The controller never touches raw bytes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. In the controller, process and respond. Keep handlers fast:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// stripe.controller.ts&lt;/span&gt;
&lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;Post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;webhooks&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;handleWebhook&lt;/span&gt;&lt;span class="p"&gt;(@&lt;/span&gt;&lt;span class="nd"&gt;Req&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;stripeEvent&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;Stripe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Event&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stripeService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;handleEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;received&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Stripe gives you up to &lt;strong&gt;30 seconds&lt;/strong&gt; before it marks the delivery as failed and retries. The &lt;code&gt;await&lt;/code&gt; is intentional: each individual handler is fast, just a database write and a status update, nothing close to that limit.&lt;/p&gt;

&lt;p&gt;If your handlers were &lt;strong&gt;doing slower work&lt;/strong&gt; (calling external APIs, sending emails through a third-party service), the right move is to &lt;strong&gt;push the event to a queue&lt;/strong&gt; like &lt;em&gt;BullMQ&lt;/em&gt; or a &lt;em&gt;Postgres&lt;/em&gt; outbox table, and &lt;strong&gt;respond 200 immediately&lt;/strong&gt;. At our scale, staying synchronous is simpler and more predictable.&lt;/p&gt;
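
&lt;p&gt;As a rough sketch of that "acknowledge first, process later" shape, here is an in-memory version. &lt;code&gt;InMemoryEventQueue&lt;/code&gt;, &lt;code&gt;QueuedEvent&lt;/code&gt; and &lt;code&gt;receiveWebhook&lt;/code&gt; are hypothetical names, and the array is only a stand-in for a durable queue: it loses events on restart, which a real BullMQ or outbox setup must not.&lt;/p&gt;

```typescript
// Hypothetical event shape; a real Stripe.Event carries far more fields.
type QueuedEvent = { id: string; type: string };

// In-memory stand-in for a durable queue such as BullMQ or an outbox table.
// Events held here are lost on restart, so this is for illustration only.
class InMemoryEventQueue {
  private readonly pending: QueuedEvent[] = [];
  readonly processed: string[] = [];

  enqueue(event: QueuedEvent): void {
    this.pending.push(event);
  }

  // Drains the queue in arrival order; a worker would call this off the request path.
  async drain(handle: (event: QueuedEvent) => unknown) {
    let next = this.pending.shift();
    while (next !== undefined) {
      await handle(next);
      this.processed.push(next.id);
      next = this.pending.shift();
    }
  }
}

// Webhook endpoint body: store the event and acknowledge without waiting for the work.
function receiveWebhook(queue: InMemoryEventQueue, event: QueuedEvent) {
  queue.enqueue(event);
  return { received: true };
}

const queue = new InMemoryEventQueue();
const ack = receiveWebhook(queue, { id: "evt_1", type: "invoice.paid" });
```

&lt;p&gt;The acknowledgement is returned before any handler runs. The slow work only happens when something calls &lt;code&gt;drain&lt;/code&gt;, so it no longer counts against the webhook timeout.&lt;/p&gt;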

&lt;p&gt;One more thing: since NestJS v8 there's a cleaner alternative to the &lt;code&gt;app.use('/stripe/webhooks', raw(...))&lt;/code&gt; approach. You can pass &lt;code&gt;{ rawBody: true }&lt;/code&gt; to &lt;code&gt;NestFactory.create()&lt;/code&gt; and access &lt;code&gt;req.rawBody&lt;/code&gt; directly in the middleware, which keeps the Express-level setup out of your bootstrap file. Both work!&lt;/p&gt;
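
&lt;p&gt;To make the raw-body requirement concrete, here is a minimal sketch of the signing scheme Stripe documents (HMAC-SHA256 over &lt;code&gt;timestamp.payload&lt;/code&gt;), written against Node's &lt;code&gt;crypto&lt;/code&gt; module rather than the Stripe SDK. The helper names &lt;code&gt;sign&lt;/code&gt; and &lt;code&gt;matches&lt;/code&gt;, the secret and the payload are made up for illustration. In real code &lt;code&gt;constructEvent&lt;/code&gt; does this for you, along with parsing the signature header and checking the timestamp tolerance.&lt;/p&gt;

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: HMAC-SHA256 over "timestamp.payload", hex encoded.
function sign(secret: string, timestamp: number, payload: Buffer): string {
  return createHmac("sha256", secret)
    .update(`${timestamp}.`)
    .update(payload)
    .digest("hex");
}

// Hypothetical helper: constant-time comparison of two hex signatures.
function matches(expected: string, received: string): boolean {
  const a = Buffer.from(expected, "hex");
  const b = Buffer.from(received, "hex");
  return a.length === b.length ? timingSafeEqual(a, b) : false;
}

const secret = "whsec_example"; // placeholder, not a real secret
const timestamp = 1700000000;

// The exact bytes the sender transmitted, including its whitespace.
const rawBody = Buffer.from('{"id": "evt_1", "type": "invoice.paid"}');
const header = sign(secret, timestamp, rawBody);

// What is left once a JSON parser has consumed the body and you serialize it again.
const reserialized = Buffer.from(JSON.stringify(JSON.parse(rawBody.toString("utf8"))));

const rawIsValid = matches(sign(secret, timestamp, rawBody), header);
const reserializedIsValid = matches(sign(secret, timestamp, reserialized), header);
```

&lt;p&gt;The parsed and reserialized body carries the same data but different bytes, so its signature no longer matches. That is the whole reason the middleware has to see the buffer.&lt;/p&gt;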




&lt;h2&gt;
  
  
  Event handler factory instead of a switch
&lt;/h2&gt;

&lt;p&gt;The billing service receives a dozen different Stripe event types, for example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;customer.subscription.created&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;customer.subscription.updated&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;invoice.paid&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first instinct is a switch statement. &lt;strong&gt;Don't do that&lt;/strong&gt;, there's a better way of handling this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// the antipattern&lt;/span&gt;
&lt;span class="k"&gt;switch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;customer.subscription.created&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;handleSubscriptionCreated&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;break&lt;/span&gt;
  &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;customer.subscription.updated&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;handleSubscriptionUpdated&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;break&lt;/span&gt;
  &lt;span class="c1"&gt;// ... 10 more cases&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a &lt;strong&gt;maintenance trap&lt;/strong&gt;. Every new event type means opening this file and adding a case. The class grows without bound and testing becomes painful.&lt;/p&gt;

&lt;p&gt;Instead, I built a &lt;strong&gt;factory&lt;/strong&gt; that resolves handlers by event type, all via &lt;em&gt;Nest's DI&lt;/em&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// stripe-event-handler.factory.ts&lt;/span&gt;
&lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;Injectable&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;StripeEventHandlerFactory&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;eventHandlers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;StripeEventHandler&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;

  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;subscriptionUpdatedHandler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;CustomerSubscriptionUpdatedEventHandler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;subscriptionDeletedHandler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;CustomerSubscriptionDeletedEventHandler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;invoicePaidHandler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;InvoicePaidEventHandler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;// ...&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;eventHandlers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;customer.subscription.updated&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;subscriptionUpdatedHandler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;customer.subscription.deleted&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;subscriptionDeletedHandler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;invoice.paid&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;invoicePaidHandler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;eventType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;StripeEventHandler&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;eventHandlers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;eventType&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;NotFoundException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`No handler for event: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;eventType&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each handler is its own class with a single &lt;code&gt;handle&lt;/code&gt; method. Adding a new event type is just a new class plus one constructor parameter and one line in the map. No existing handler changes. &lt;strong&gt;Open/Closed principle&lt;/strong&gt; in practice.&lt;/p&gt;
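
&lt;p&gt;Stripped of Nest's DI, the idea fits in a few lines. This is a simplified sketch, not the production code: &lt;code&gt;BillingEvent&lt;/code&gt; is a hypothetical stand-in for &lt;code&gt;Stripe.Event&lt;/code&gt;, and the handlers just record customer ids in memory where real ones would write to the database.&lt;/p&gt;

```typescript
// Hypothetical minimal event shape standing in for Stripe.Event.
type BillingEvent = { type: string; data: { customerId: string } };

// Every handler exposes the same single method.
interface BillingEventHandler {
  handle(event: BillingEvent): unknown;
}

class InvoicePaidHandler implements BillingEventHandler {
  readonly paidCustomers: string[] = [];

  async handle(event: BillingEvent) {
    this.paidCustomers.push(event.data.customerId);
  }
}

class SubscriptionDeletedHandler implements BillingEventHandler {
  readonly cancelledCustomers: string[] = [];

  async handle(event: BillingEvent) {
    this.cancelledCustomers.push(event.data.customerId);
  }
}

const invoicePaid = new InvoicePaidHandler();
const subscriptionDeleted = new SubscriptionDeletedHandler();

// Supporting a new event type means one new class and one new entry here.
const handlers: { [eventType: string]: BillingEventHandler | undefined } = {
  "invoice.paid": invoicePaid,
  "customer.subscription.deleted": subscriptionDeleted,
};

function resolve(eventType: string): BillingEventHandler {
  const handler = handlers[eventType];
  if (handler === undefined) throw new Error(`No handler for event: ${eventType}`);
  return handler;
}

// Usage: the caller never needs to know which class it got back.
void resolve("invoice.paid").handle({ type: "invoice.paid", data: { customerId: "cus_123" } });
```

&lt;p&gt;Because each handler is a plain class, it can be unit tested on its own, without going through the lookup at all.&lt;/p&gt;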

&lt;p&gt;The service call ends up clean:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stripeEventHandlerFactory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Repository pattern on top of Prisma
&lt;/h2&gt;

&lt;p&gt;I know this sounds like over-engineering when Prisma already gives you a clean API. The point isn't the abstraction itself, &lt;strong&gt;it's the isolation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When one of your services depends directly on Prisma, testing the service means setting up a real database or doing gymnastics with &lt;code&gt;jest.mock('@prisma/client')&lt;/code&gt;. When your service depends on a &lt;strong&gt;repository layer&lt;/strong&gt;, you inject a mock in tests and the whole thing collapses into a flat unit test.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// entitlements.repository.ts&lt;/span&gt;
&lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;Injectable&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;EntitlementsRepository&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;PrismaService&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;incrementUsage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;entitlementId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;entitlementUsage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;where&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;userId_entitlementId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;entitlementId&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;currentUsage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The corresponding service test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// entitlements.service.spec.ts&lt;/span&gt;
&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;EntitlementsService&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;EntitlementsService&lt;/span&gt;

  &lt;span class="nf"&gt;beforeEach&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="kr"&gt;module&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;Test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createTestingModule&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;providers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nx"&gt;EntitlementsService&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;provide&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;EntitlementsRepository&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;useValue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;incrementIfWithinLimit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;jest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;mockResolvedValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
          &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="nx"&gt;service&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kr"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;EntitlementsService&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;

  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;should increment usage when within limit&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;incrementUsage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user-1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;TOKENS&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;// assert on the mock, not the database&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No database. No containers. &lt;strong&gt;Pure logic under test!&lt;/strong&gt;&lt;/p&gt;
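
&lt;p&gt;To make "assert on the mock" concrete without Jest or the Nest testing module, here is a minimal, dependency-free sketch. &lt;code&gt;UsageRepository&lt;/code&gt;, &lt;code&gt;UsageService&lt;/code&gt; and &lt;code&gt;createFakeRepository&lt;/code&gt; are hypothetical stand-ins, not the real classes, and the sketch assumes the service simply delegates to &lt;code&gt;incrementIfWithinLimit&lt;/code&gt; and throws when it returns &lt;code&gt;false&lt;/code&gt;:&lt;/p&gt;

```typescript
// Hypothetical stand-ins for illustration; not the article's real classes.
// Kept synchronous for brevity; the real repository call is async.
type UsageRepository = {
  incrementIfWithinLimit: (userId: string, entitlementId: string) => boolean
}

class UsageService {
  private readonly repository: UsageRepository

  constructor(repository: UsageRepository) {
    this.repository = repository
  }

  incrementUsage(userId: string, entitlementId: string): void {
    const withinLimit = this.repository.incrementIfWithinLimit(userId, entitlementId)
    if (!withinLimit) {
      throw new Error(`Limit reached for ${entitlementId}`)
    }
  }
}

// A hand-rolled fake: records every call and returns a canned answer.
function createFakeRepository(answer: boolean) {
  const calls: string[][] = []
  const repository: UsageRepository = {
    incrementIfWithinLimit: (userId, entitlementId) => {
      calls.push([userId, entitlementId])
      return answer
    },
  }
  return { repository, calls }
}

const fakeRepo = createFakeRepository(true)
new UsageService(fakeRepo.repository).incrementUsage('user-1', 'TOKENS')

// Assert on what the fake recorded, not on a database row.
console.log(fakeRepo.calls)
```

&lt;p&gt;The same shape carries over to the Jest version: keep a reference to the mocked method and check what it was called with.&lt;/p&gt;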




&lt;h2&gt;
  
  
  Configure your ValidationPipe properly
&lt;/h2&gt;

&lt;p&gt;Most tutorials show only the basic &lt;a href="https://docs.nestjs.com/techniques/validation" rel="noopener noreferrer"&gt;ValidationPipe&lt;/a&gt; setup. These three options are the ones that &lt;strong&gt;actually matter in production&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;useGlobalPipes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ValidationPipe&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;whitelist&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;            &lt;span class="c1"&gt;// strips unknown properties before they reach your service&lt;/span&gt;
  &lt;span class="na"&gt;forbidNonWhitelisted&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// throws 400 if the client sends unknown fields&lt;/span&gt;
  &lt;span class="na"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;            &lt;span class="c1"&gt;// auto-transforms payloads to DTO class instances&lt;/span&gt;
&lt;span class="p"&gt;}))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;whitelist&lt;/code&gt; is a silent safety net: if someone sends extra fields they shouldn't, they never reach your business logic. &lt;code&gt;forbidNonWhitelisted&lt;/code&gt; makes it explicit by rejecting the request outright. Both together are the right default for any API that doesn't accept open-ended payloads.&lt;/p&gt;
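
&lt;p&gt;If the difference between the two options is still fuzzy, this small sketch mimics their observable effect on a payload. It is an illustration only, not how NestJS or class-validator implement it; &lt;code&gt;filterPayload&lt;/code&gt; and its parameters are made-up names:&lt;/p&gt;

```typescript
// Illustration only: mimics the observable effect of the two options,
// not how NestJS or class-validator implement them internally.
type Payload = { [key: string]: unknown }

function filterPayload(payload: Payload, allowedKeys: string[], forbidUnknown: boolean): Payload {
  const unknownKeys = Object.keys(payload).filter((key) => allowedKeys.indexOf(key) === -1)

  // forbidNonWhitelisted-style behaviour: reject the whole request.
  if (forbidUnknown) {
    if (unknownKeys.length > 0) {
      throw new Error(`Unexpected properties: ${unknownKeys.join(', ')}`)
    }
  }

  // whitelist-style behaviour: keep only the declared fields.
  const cleaned: Payload = {}
  for (const key of allowedKeys) {
    if (key in payload) {
      cleaned[key] = payload[key]
    }
  }
  return cleaned
}

// Hypothetical request body with one field the DTO does not declare.
const incomingBody: Payload = { email: 'a@example.com', isAdmin: true }

// With stripping only, the extra field quietly disappears.
const strippedBody = filterPayload(incomingBody, ['email'], false)
console.log(strippedBody)
```

&lt;p&gt;Calling the same function with &lt;code&gt;forbidUnknown&lt;/code&gt; set to &lt;code&gt;true&lt;/code&gt; throws instead, which is the behaviour you want clients to notice during integration.&lt;/p&gt;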




&lt;h2&gt;
  
  
  Exceptions that tell you something
&lt;/h2&gt;

&lt;p&gt;NestJS's built-in HTTP exceptions are good, but &lt;strong&gt;extending them is better&lt;/strong&gt;. &lt;br&gt;
Instead of throwing raw &lt;code&gt;HttpException&lt;/code&gt; everywhere, define named exception classes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// exceptions/entitlement-limit-exceeded.exception.ts&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;EntitlementLimitExceededException&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;ForbiddenException&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="s2"&gt;`User &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; has reached the limit for entitlement: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ENTITLEMENT_LIMIT_EXCEEDED&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// exceptions/billing-user-not-found.exception.ts&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;BillingUserNotFoundException&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;NotFoundException&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Billing user not found: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;BILLING_USER_NOT_FOUND&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Usage in the service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;withinLimit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;EntitlementLimitExceededException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



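&lt;p&gt;Two reasons this is worth the extra file. First, the second argument to &lt;code&gt;super()&lt;/code&gt; replaces the default &lt;code&gt;error&lt;/code&gt; field in the response body (&lt;code&gt;"Forbidden"&lt;/code&gt; for a &lt;code&gt;ForbiddenException&lt;/code&gt;) with your own code. The response ends up as:&lt;br&gt;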
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"statusCode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;403&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ENTITLEMENT_LIMIT_EXCEEDED"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"User xyz has reached the limit for entitlement: TOKENS"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your frontend can match on &lt;code&gt;error&lt;/code&gt; without parsing the human-readable message. &lt;br&gt;
Second, when you search the codebase for &lt;code&gt;EntitlementLimitExceededException&lt;/code&gt;, you find every place that throws it. With &lt;code&gt;throw new HttpException('...')&lt;/code&gt; scattered around, you have nothing to grep.&lt;/p&gt;
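
&lt;p&gt;On the client side, matching on &lt;code&gt;error&lt;/code&gt; could look roughly like this. The &lt;code&gt;ApiErrorBody&lt;/code&gt; type mirrors the response shown above; &lt;code&gt;toUserFacingMessage&lt;/code&gt; and the UI copy are hypothetical:&lt;/p&gt;

```typescript
// Assumed shape of the error body shown above.
type ApiErrorBody = { statusCode: number; error: string; message: string }

// Hypothetical client-side mapping from machine-readable codes to UI copy.
function toUserFacingMessage(body: ApiErrorBody): string {
  switch (body.error) {
    case 'ENTITLEMENT_LIMIT_EXCEEDED':
      return 'You have reached your plan limit. Upgrade to continue.'
    case 'BILLING_USER_NOT_FOUND':
      return 'We could not find your billing profile.'
    default:
      // Unknown codes fall back to a generic message.
      return 'Something went wrong. Please try again.'
  }
}

const sampleBody: ApiErrorBody = {
  statusCode: 403,
  error: 'ENTITLEMENT_LIMIT_EXCEEDED',
  message: 'User xyz has reached the limit for entitlement: TOKENS',
}
const uiMessage = toUserFacingMessage(sampleBody)
console.log(uiMessage)
```

&lt;p&gt;The human-readable &lt;code&gt;message&lt;/code&gt; can then change freely on the backend without breaking the frontend.&lt;/p&gt;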


&lt;h2&gt;
  
  
  Validate your config at startup
&lt;/h2&gt;

&lt;p&gt;One of the most underrated NestJS patterns: &lt;strong&gt;validate your environment variables&lt;/strong&gt; when the app boots, not when a request first hits the code path that needs them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app.module.ts&lt;/span&gt;
&lt;span class="nx"&gt;ConfigModule&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forRoot&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;isGlobal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;validationSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Joi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;          &lt;span class="nx"&gt;Joi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;required&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="na"&gt;STRIPE_SECRET_KEY&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;     &lt;span class="nx"&gt;Joi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;required&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="na"&gt;STRIPE_WEBHOOK_SECRET&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Joi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;required&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="na"&gt;DASH_UI_URL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;           &lt;span class="nx"&gt;Joi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;required&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="na"&gt;NODE_ENV&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;              &lt;span class="nx"&gt;Joi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;valid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;development&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;production&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;test&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;development&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If any required variable is missing or has the wrong type, the app throws before it binds to port 3000. You find out in the deploy step, not at 3am when the first request hits that code path.&lt;/p&gt;
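
&lt;p&gt;Stripped of Joi and &lt;code&gt;ConfigModule&lt;/code&gt;, the fail-fast idea boils down to something like the sketch below. &lt;code&gt;assertRequiredEnv&lt;/code&gt; and the sample values are hypothetical, and the check only covers presence, not types:&lt;/p&gt;

```typescript
// Illustration of the fail-fast idea only; the article itself uses Joi via ConfigModule.
type EnvSource = { [key: string]: string | undefined }

function assertRequiredEnv(env: EnvSource, requiredKeys: string[]): void {
  const missing = requiredKeys.filter((key) => {
    const value = env[key]
    return value === undefined || value.trim() === ''
  })
  if (missing.length > 0) {
    // Throwing here, before the server starts listening, aborts the boot.
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`)
  }
}

// Hypothetical environment with one variable missing.
const sampleEnv: EnvSource = {
  DATABASE_URL: 'postgres://localhost:5432/app',
  STRIPE_SECRET_KEY: 'sk_test_placeholder',
}

let bootError = ''
try {
  assertRequiredEnv(sampleEnv, ['DATABASE_URL', 'STRIPE_SECRET_KEY', 'STRIPE_WEBHOOK_SECRET'])
} catch (err) {
  bootError = err instanceof Error ? err.message : String(err)
}
console.log(bootError)
```

&lt;p&gt;The error names every missing variable at once, so a broken deploy takes one fix instead of several round trips.&lt;/p&gt;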

&lt;p&gt;For &lt;em&gt;CORS&lt;/em&gt;, same idea: no wildcards, explicit origins from env:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enableCors&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;origin&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="nx"&gt;configService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;DASH_UI_URL&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nx"&gt;configService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;DASH_API_SERVICE_URL&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;methods&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;GET&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;PUT&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;DELETE&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;PATCH&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



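&lt;p&gt;Reading origins from env means you don't touch code to change them between environments. Every variable read here should also be listed in the validation schema (the schema above covers &lt;code&gt;DASH_UI_URL&lt;/code&gt; but not &lt;code&gt;DASH_API_SERVICE_URL&lt;/code&gt;); otherwise &lt;code&gt;configService.get&lt;/code&gt; can return &lt;code&gt;undefined&lt;/code&gt; and the origin list silently contains a hole.&lt;/p&gt;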




&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;patterns&lt;/strong&gt; above didn't come from reading documentation; they came from running these &lt;strong&gt;services under real load&lt;/strong&gt; and fixing real bugs.&lt;/p&gt;

&lt;p&gt;The Stripe webhook pattern specifically came from a Sunday afternoon of ghost events because I had the response timing wrong. &lt;/p&gt;

&lt;p&gt;The factory pattern for event handlers came from thinking early about how to add new event types without touching existing code.&lt;/p&gt;

&lt;p&gt;If there's one principle behind all of them, it's this: &lt;strong&gt;each piece of code should have exactly one reason to change&lt;/strong&gt;. The controller doesn't know about Prisma. The service doesn't know about HTTP. The repository doesn't know about business rules. When something breaks (and it will), you know exactly where to look.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9t3l75i07wsep3q675oy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9t3l75i07wsep3q675oy.png" alt="NestJS framework layers" width="800" height="492"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;NestJS gives you the tools to do this. Whether you use them is up to you.&lt;/p&gt;

&lt;p&gt;If any of these patterns saved you a Sunday afternoon, I'd love to hear about it in the comments. And if you spot something technically off, even more so.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Thanks for reading!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Wences 👋&lt;/em&gt;&lt;/p&gt;

</description>
      <category>nestjs</category>
      <category>backend</category>
      <category>software</category>
      <category>programming</category>
    </item>
    <item>
      <title>Building a production-ready RAG pipeline</title>
      <dc:creator>Wences Martinez</dc:creator>
      <pubDate>Wed, 13 May 2026 08:51:51 +0000</pubDate>
      <link>https://forem.com/wenchodev/building-a-production-ready-rag-pipeline-2jk6</link>
      <guid>https://forem.com/wenchodev/building-a-production-ready-rag-pipeline-2jk6</guid>
      <description>&lt;p&gt;&lt;strong&gt;L&lt;/strong&gt;arge &lt;strong&gt;L&lt;/strong&gt;anguage &lt;strong&gt;M&lt;/strong&gt;odels (aka &lt;em&gt;LLMs&lt;/em&gt;) have a memory problem: their knowledge stops the day their training data was cut off, they don't know your codebase, they don't know last week's tickets…&lt;/p&gt;

&lt;p&gt;When they're missing context they don't say so… they guess, confidently. The polite term is &lt;em&gt;hallucination&lt;/em&gt;; the less polite one is &lt;em&gt;lying with style&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuofgydhm3z3ggylx26ms.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuofgydhm3z3ggylx26ms.gif" alt="LLMs sometimes hallucinate" width="320" height="240"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;R&lt;/strong&gt;etrieval-&lt;strong&gt;A&lt;/strong&gt;ugmented &lt;strong&gt;G&lt;/strong&gt;eneration (aka &lt;strong&gt;RAG&lt;/strong&gt;) is how you fix that without retraining anything.&lt;/p&gt;

&lt;p&gt;Think of it as turning a closed-book exam into an open-book one. The LLM is still the writer, but now it has a librarian: a system that fetches the right passages from your data and hands them over before the model puts pen to paper.&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://www.keystoneapp.dev/" rel="noopener noreferrer"&gt;Keystone&lt;/a&gt; to learn this end-to-end.&lt;/p&gt;

&lt;p&gt;Keystone does two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Ingest a GitHub repository's activity → every PR, commit, issue, and discussion.&lt;/li&gt;
&lt;li&gt;Answer questions about &lt;strong&gt;why the codebase looks the way it does&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The first prototype didn't have a retrieval system; it had a giant string.&lt;/p&gt;

&lt;p&gt;That worked for tiny repos. On a real one (&lt;em&gt;1,000+ commits&lt;/em&gt;, &lt;em&gt;500+ merged PRs&lt;/em&gt;, tons of issues, plus a tree of &lt;em&gt;~1,200 files&lt;/em&gt;) it broke in four ways at once:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The prompt blew past the &lt;strong&gt;context window&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;The model &lt;strong&gt;lost the thread&lt;/strong&gt; halfway through.&lt;/li&gt;
&lt;li&gt;The latency hit double-digit seconds.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Every answer cost ~$0.15 in tokens&lt;/strong&gt; for a query that should cost a fraction of a cent.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's the moment &lt;strong&gt;RAG stops being optional and becomes required&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Below is what I actually built, lifted from the codebase running today 👇🏻&lt;/p&gt;




&lt;h2&gt;
  
  
  What RAG actually is (and what it isn't)
&lt;/h2&gt;

&lt;p&gt;The clean mental model: RAG is just "&lt;em&gt;search, then prompt.&lt;/em&gt;"&lt;/p&gt;

&lt;p&gt;You convert your data into a &lt;strong&gt;search index ahead of time&lt;/strong&gt;. At query time, you look up the most relevant pieces and paste only those into the prompt. That's it!&lt;/p&gt;

&lt;p&gt;We can define two main stages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval&lt;/strong&gt;: search your data and pull the chunks most relevant to the user's question.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generation&lt;/strong&gt;: send those chunks plus the question to a regular LLM call and let it write the answer.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Everything else (&lt;em&gt;embeddings&lt;/em&gt;, &lt;em&gt;vector databases&lt;/em&gt;, &lt;em&gt;re-ranking&lt;/em&gt;, &lt;em&gt;hybrid search&lt;/em&gt;…) exists because &lt;strong&gt;matching meaning&lt;/strong&gt; is harder than &lt;strong&gt;matching keywords&lt;/strong&gt;.&lt;/p&gt;
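
&lt;p&gt;Here is "search, then prompt" as a toy sketch. The scoring is plain keyword overlap standing in for a real search index, no LLM is called, and the function names and sample chunks are invented for illustration (this is not Keystone's code):&lt;/p&gt;

```typescript
// Toy "search, then prompt" sketch. Scoring is plain keyword overlap,
// a stand-in for a real search index; no LLM is called here.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((word) => word.length > 0)
}

function overlapScore(question: string, chunk: string): number {
  const chunkWords = tokenize(chunk)
  return tokenize(question).filter((word) => chunkWords.indexOf(word) !== -1).length
}

// Stage 1, retrieval: keep the topK chunks that share the most words with the question.
function retrieveChunks(question: string, chunks: string[], topK: number): string[] {
  return chunks
    .map((chunk) => ({ chunk, score: overlapScore(question, chunk) }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((entry) => entry.chunk)
}

// Stage 2, generation: build the prompt that would be sent to the model.
function buildPrompt(question: string, context: string[]): string {
  return `Answer using only this context:\n${context.join('\n')}\n\nQuestion: ${question}`
}

// Hypothetical sample data.
const sampleChunks = [
  'PR 12 replaced the REST client with a queue worker to avoid timeouts',
  'Fix typo in README',
  'Issue 7 reports timeouts when the REST client retries',
]
const sampleQuestion = 'Why was the REST client replaced?'
const retrieved = retrieveChunks(sampleQuestion, sampleChunks, 2)
const promptText = buildPrompt(sampleQuestion, retrieved)
console.log(promptText)
```

&lt;p&gt;Keyword overlap breaks as soon as the question and the chunk use different words for the same idea, which is exactly the gap embeddings are meant to close.&lt;/p&gt;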




&lt;h2&gt;
  
  
  When to use RAG
&lt;/h2&gt;

&lt;p&gt;Use RAG when the answer depends on facts the model doesn't have.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Private docs, your codebase, last week's tickets, anything post-cutoff or non-public.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It's also how you get grounded answers with citations: the model can point at exactly which chunk of which document it used, which is the &lt;strong&gt;difference between a tool you can ship to users and a demo you can't&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  When NOT to use RAG
&lt;/h2&gt;

&lt;p&gt;Skip RAG when you want the model to &lt;strong&gt;behave differently&lt;/strong&gt; (tone, format, reasoning style). That's a &lt;em&gt;fine-tuning&lt;/em&gt; or &lt;em&gt;prompting&lt;/em&gt; problem, not a &lt;strong&gt;retrieval&lt;/strong&gt; problem.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;RAG injects knowledge, it does not change behavior.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The second &lt;strong&gt;mistake&lt;/strong&gt; I see: people reach for RAG when the data is small enough to fit in a prompt. If you have 10,000 tokens of context, just paste it. RAG buys you scale at the cost of an extra layer that will leak relevance bugs into your product.&lt;/p&gt;
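
&lt;p&gt;A quick way to sanity-check that decision is to estimate the size of your corpus before building anything. The sketch below assumes roughly 4 characters per token, which is only a rough rule of thumb that varies by tokenizer and language; &lt;code&gt;needsRetrieval&lt;/code&gt; and the budget value are hypothetical:&lt;/p&gt;

```typescript
// Rough heuristic only: ~4 characters per token is an assumption
// that varies by tokenizer and language.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// Hypothetical decision helper: retrieval is only worth it once the
// corpus outgrows the prompt budget.
function needsRetrieval(documents: string[], promptBudgetTokens: number): boolean {
  let total = 0
  for (const doc of documents) {
    total += estimateTokens(doc)
  }
  return total > promptBudgetTokens
}

const smallCorpus = ['short design note', 'another short note']
const pasteDirectly = !needsRetrieval(smallCorpus, 10000)
console.log(pasteDirectly)
```

&lt;p&gt;For a real decision, count tokens with the tokenizer of the model you actually call.&lt;/p&gt;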




&lt;h2&gt;
  
  
  The four stages every RAG has
&lt;/h2&gt;

&lt;p&gt;Every &lt;strong&gt;RAG in production&lt;/strong&gt; has the same four stages, and each one breaks in its own special way if you do it naively:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ingestion&lt;/strong&gt;: pull data from somewhere and split it into chunks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedding&lt;/strong&gt;: turn each chunk into a vector so similarity becomes math.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval&lt;/strong&gt;: search for the chunks closest to the user's question.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Synthesis&lt;/strong&gt;: hand those chunks to an LLM and let it write the answer.&lt;/li&gt;
&lt;/ul&gt;
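
&lt;p&gt;To see how "similarity becomes math", here is a toy version of the embedding and retrieval stages. The embedding is just word counts over a tiny fixed vocabulary, a deliberate simplification of what a learned embedding model produces; all names and sample texts are invented:&lt;/p&gt;

```typescript
// Toy embedding: counts of words from a tiny fixed vocabulary.
// Real systems use a learned embedding model; this only shows the math.
const toyVocabulary = ['auth', 'token', 'cache', 'redis', 'billing']

function toyEmbed(text: string): number[] {
  const words = text.toLowerCase().split(/[^a-z0-9]+/)
  return toyVocabulary.map((term) => words.filter((word) => word === term).length)
}

// Cosine similarity: 1 means same direction, 0 means nothing in common.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  a.forEach((value, index) => {
    dot += value * b[index]
    normA += value * value
    normB += b[index] * b[index]
  })
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Ingestion + embedding: each chunk is stored next to its vector.
const indexedChunks = ['auth token refresh flow', 'redis cache eviction policy'].map((text) => ({
  text,
  vector: toyEmbed(text),
}))

// Retrieval: rank chunks by similarity to the question vector.
function closestChunk(question: string): string {
  const questionVector = toyEmbed(question)
  const ranked = indexedChunks
    .map((entry) => ({ text: entry.text, score: cosineSimilarity(questionVector, entry.vector) }))
    .sort((a, b) => b.score - a.score)
  return ranked[0].text
}

// Synthesis would paste this chunk into the prompt sent to the LLM.
const bestMatch = closestChunk('how does the cache talk to redis')
console.log(bestMatch)
```

&lt;p&gt;Swap the toy embedding for a real model and the array for a vector database, and the shape of the pipeline stays the same.&lt;/p&gt;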

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnl0x13asg4o1mlm0fal3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnl0x13asg4o1mlm0fal3.png" alt="The four stages of a solid RAG pipeline" width="800" height="342"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The next sections go through each one in order, with what I do in &lt;a href="https://www.keystoneapp.dev/" rel="noopener noreferrer"&gt;Keystone&lt;/a&gt; and what I'd warn you about.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Chunking: where most RAG systems fail
&lt;/h2&gt;

&lt;p&gt;This is the section nobody wants to read because chunking sounds &lt;em&gt;boring&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;It's also the section that decides &lt;strong&gt;whether your RAG actually works&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The naive approach is "&lt;em&gt;split text every 500 tokens&lt;/em&gt;." That dies for two reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A PR, for example, is not just 500 tokens of one thing. It's a &lt;strong&gt;title&lt;/strong&gt;, a &lt;strong&gt;body&lt;/strong&gt;, a &lt;strong&gt;list of files&lt;/strong&gt;, a list of &lt;strong&gt;commit messages&lt;/strong&gt;, sometimes &lt;strong&gt;comments&lt;/strong&gt;, &lt;strong&gt;discussions&lt;/strong&gt;. Embedding them as one blob averages five different topics into one vector. Retrieval returns the wrong PR because the &lt;strong&gt;vector is an average of irrelevant stuff.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not all artifacts are equal&lt;/strong&gt;. A merged PR with five reviews carries more architectural signal than a &lt;em&gt;Fix typo&lt;/em&gt; commit. Treating them with the same chunk size and same metadata throws away the &lt;strong&gt;asymmetry&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I use &lt;strong&gt;&lt;em&gt;typed chunking&lt;/em&gt;&lt;/strong&gt;: different artifact types get different &lt;em&gt;chunkers&lt;/em&gt;, different size &lt;em&gt;budgets&lt;/em&gt;, and different &lt;em&gt;metadata&lt;/em&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;chunkPR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;IngestPullRequest&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;EmbeddingChunk&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;filesStr&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;files&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;, &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;commitsStr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;commits&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt; | &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;commentsStr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;comments&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;
    &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt; | Comments: &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;comments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;author&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt; | &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;reviewsStr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;reviews&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;
    &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt; | Reviews: &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;reviews&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;author&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt; | &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`[PR #&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;] &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; | Files: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;filesStr&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; | Commits: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;commitsStr&lt;/span&gt;&lt;span class="p"&gt;}${&lt;/span&gt;&lt;span class="nx"&gt;commentsStr&lt;/span&gt;&lt;span class="p"&gt;}${&lt;/span&gt;&lt;span class="nx"&gt;reviewsStr&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;sourceId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`pr:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;// &amp;lt;- stable, dedupable&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="nf"&gt;truncate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;       &lt;span class="c1"&gt;// &amp;lt;- PRs get 4000 chars&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pr&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;author&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;author&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;number&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;merged_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;merged_at&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here's what a &lt;strong&gt;real chunk looks like&lt;/strong&gt; coming out of that function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"sourceId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pr:42"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"[PR #42] Replace REST with GraphQL for the data layer: Switched from ..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"metadata"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pr"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"author"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"wencesms92"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"number"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"merged_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2025-11-14T10:22:00Z"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Issues get their own &lt;em&gt;chunker&lt;/em&gt; with the same shape but a smaller budget (&lt;em&gt;1500 chars&lt;/em&gt;) and different metadata. Same pattern, different parameters.&lt;/p&gt;

&lt;h3&gt;
  
  
  Three things to notice:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;The &lt;strong&gt;[PR #N] prefix&lt;/strong&gt; is intentional. Embedding models are sensitive to what's at the front of the text, so putting the artifact type and number first lets the model anchor on it. When I tried without the prefix, the same PR ranked lower for queries like &lt;em&gt;"what did PR 42 change?"&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Each &lt;em&gt;sourceId&lt;/em&gt; is stable and unique within a project (pr:42, issue:7, readme:root, topology:tree). That &lt;strong&gt;key&lt;/strong&gt; is what makes the upsert work, and it's also what lets a webhook re-embed a single PR after a merge without rebuilding the world. Same chunker, same SQL upsert, just one row.&lt;/li&gt;
&lt;li&gt;Commits get &lt;strong&gt;aggregated&lt;/strong&gt;, not chunked individually. This is the least obvious decision in the whole pipeline. If you embed every commit one by one, you drown the index in noise. Instead, I skip commits that already belong to a PR (they're embedded with the PR) and summarize the leftover "orphan" commits into a single chunk:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Orphans = commits not already inside any PR&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prCommitShas&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pullRequests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flatMap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pr&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
  &lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;commits&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sha&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;orphans&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;commits&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;prCommitShas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sha&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
  &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nf"&gt;isNoiseCommit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;orphans&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;sourceId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;commits:orphan-summary&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;truncate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="s2"&gt;`[Commits] &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;orphans&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; standalone commits (not in PRs) | Authors: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;authorsStr&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; | Recent: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;recentMsgs&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="mi"&gt;4000&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;commits&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;orphans&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;orphan&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
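&lt;p&gt;The snippet above uses &lt;code&gt;authorsStr&lt;/code&gt; and &lt;code&gt;recentMsgs&lt;/code&gt; without showing where they come from. Here is a minimal sketch of how they could be built. The &lt;code&gt;Commit&lt;/code&gt; shape, the &lt;code&gt;summarizeOrphans&lt;/code&gt; helper, and the limit of 20 recent messages are assumptions for illustration, not the code Keystone runs:&lt;/p&gt;

```typescript
// Hypothetical commit shape; `date` is assumed to be an ISO 8601 string.
type Commit = { sha: string; message: string; author: string; date: string }

// Builds the two strings interpolated into the orphan-summary chunk.
function summarizeOrphans(orphans: Commit[], recentLimit = 20) {
  // Unique authors, in first-seen order.
  const authorsStr = Array.from(new Set(orphans.map(c => c.author))).join(', ')

  // Newest first; ISO timestamps in the same format sort correctly as strings.
  const recentMsgs = [...orphans]
    .sort((a, b) => b.date.localeCompare(a.date))
    .slice(0, recentLimit)
    .map(c => c.message.split('\n')[0]) // subject line only
    .join(' | ')

  return { authorsStr, recentMsgs }
}
```

&lt;p&gt;Keeping only the subject line of each commit is one way to fit more commits into the same character budget before truncation kicks in.&lt;/p&gt;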



&lt;p&gt;Before the orphans get rolled up, a noise filter strips anything useless:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;NOISE_MSG_PATTERNS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="sr"&gt;/^merge branch/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/^merge pull request/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/^wip$/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/^fix typo/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sr"&gt;/^fixup!/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/^squash!/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/^initial commit$/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/^update &lt;/span&gt;&lt;span class="se"&gt;\S&lt;/span&gt;&lt;span class="sr"&gt;+$/i&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;NOISE_AUTHOR_PATTERNS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\[&lt;/span&gt;&lt;span class="sr"&gt;bot&lt;/span&gt;&lt;span class="se"&gt;\]&lt;/span&gt;&lt;span class="sr"&gt;$/&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/^dependabot/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/^renovate/i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/^github-actions/i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Filtering bot commits and merge noise before they hit the embedding API saves cost, keeps the index dense, and stops "what's the architecture" queries from returning seventeen "dependabot bumped lodash" chunks.&lt;/p&gt;

&lt;p&gt;So… &lt;strong&gt;don't embed garbage&lt;/strong&gt;!&lt;/p&gt;
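&lt;p&gt;For completeness, here is one way the &lt;code&gt;isNoiseCommit&lt;/code&gt; check used earlier could apply those two pattern lists. The patterns are copied from above; the &lt;code&gt;Commit&lt;/code&gt; shape and the choice to test only the first line of the message are assumptions of this sketch:&lt;/p&gt;

```typescript
// Hypothetical commit shape; field names are assumptions for this sketch.
type Commit = { sha: string; message: string; author: string }

const NOISE_MSG_PATTERNS = [
  /^merge branch/i, /^merge pull request/i, /^wip$/i, /^fix typo/i,
  /^fixup!/i, /^squash!/i, /^initial commit$/i, /^update \S+$/i
]
const NOISE_AUTHOR_PATTERNS = [/\[bot\]$/, /^dependabot/i, /^renovate/i, /^github-actions/i]

// A commit is noise if its subject line or its author matches any pattern.
function isNoiseCommit(c: Commit): boolean {
  // Assumption: only the subject line is checked, so a long body
  // cannot hide a "wip" or "update X" subject from the anchored patterns.
  const subject = c.message.split('\n')[0].trim()
  return (
    NOISE_MSG_PATTERNS.some(re => re.test(subject)) ||
    NOISE_AUTHOR_PATTERNS.some(re => re.test(c.author))
  )
}
```

&lt;p&gt;Note that &lt;code&gt;/^update \S+$/i&lt;/code&gt; only matches single-word updates such as "Update README.md", so a descriptive message like "update the billing flow" still gets embedded.&lt;/p&gt;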




&lt;h2&gt;
  
  
  2. Embeddings: picking the right model
&lt;/h2&gt;

&lt;p&gt;The boring truth: most embedding models are &lt;strong&gt;good&lt;/strong&gt; enough. The real trade-off is &lt;em&gt;dimension count × cost × domain fit&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;I went with &lt;a href="https://docs.mistral.ai/models/model-cards/codestral-embed-25-05" rel="noopener noreferrer"&gt;Mistral AI codestral-embed-2505&lt;/a&gt; (&lt;em&gt;1536&lt;/em&gt; dimensions), a code-tuned embedding model that ranks code-adjacent text in a way a general-purpose model does not.&lt;/p&gt;

&lt;p&gt;Two main reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Generous free tier&lt;/strong&gt; → Mistral's free tier covers real embedding workloads without hitting a paywall on day one of a side project. OpenAI's free credits evaporate the moment you embed a real dataset.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain fit&lt;/strong&gt; → My data is &lt;strong&gt;code-adjacent&lt;/strong&gt;: commit messages, file paths, PR titles.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The call itself is unremarkable, which is the point. The work happens in the &lt;strong&gt;chunking&lt;/strong&gt;, not here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// At query time, embed the user's question with the same model&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;embedding&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;mistral&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;textEmbeddingModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;codestral-embed-2505&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  3. Retrieval (that doesn't suck)
&lt;/h2&gt;

&lt;p&gt;Retrieval can be one query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;vectorStr&lt;/span&gt;       &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`[&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;]`&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;projectIdsArray&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`{&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;projectIds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;}`&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;$queryRawUnsafe&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;MatchEmbeddingRow&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="s2"&gt;`SELECT pe.id, pe."projectId" as project_id, p.name as project_name,
          pe."sourceId" as source_id, pe.content, pe.metadata,
          (1 - (pe."embedding" &amp;lt;=&amp;gt; $1::vector(1536)))::float as similarity
   FROM "ProjectEmbedding" pe
   JOIN "Project" p ON p.id = pe."projectId"
   WHERE pe."projectId" = ANY($2::text[])
     AND 1 - (pe."embedding" &amp;lt;=&amp;gt; $1::vector(1536)) &amp;gt; 0.3
   ORDER BY pe."embedding" &amp;lt;=&amp;gt; $1::vector(1536)
   LIMIT 12`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;vectorStr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;projectIdsArray&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Five&lt;/strong&gt; things this is doing on purpose:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;=&amp;gt;&lt;/code&gt; is the &lt;em&gt;pgvector&lt;/em&gt; &lt;strong&gt;cosine-distance operator.&lt;/strong&gt; Combined with the &lt;strong&gt;HNSW index&lt;/strong&gt; built on &lt;strong&gt;vector_cosine_ops&lt;/strong&gt;, this query uses the index instead of a sequential scan.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pre-filter&lt;/strong&gt; by &lt;em&gt;projectId = ANY(...)&lt;/em&gt; before the vector search. Permissioning happens before similarity ranking, so &lt;strong&gt;you never see a chunk from a project you don't have access to&lt;/strong&gt;, and the index narrows the search space.&lt;/li&gt;
&lt;li&gt;Threshold of &lt;strong&gt;&lt;em&gt;0.3&lt;/em&gt; similarity&lt;/strong&gt;. Below that, the chunk is more noise than signal. &lt;strong&gt;Lower threshold → more recall → more garbage&lt;/strong&gt; in the prompt. Tune this on real queries, not synthetic ones.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Top 12 results&lt;/strong&gt;. Enough that 2-3 misses still leave a &lt;strong&gt;usable signal&lt;/strong&gt;; small enough that the &lt;strong&gt;prompt stays cheap&lt;/strong&gt;. I started at 25 and it was overkill. The model latched onto the first 5 anyway and the rest were filler.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;JOIN&lt;/em&gt; the Project name in the &lt;strong&gt;SELECT&lt;/strong&gt;. When the query spans multiple repos, the &lt;strong&gt;model needs to know which repo a chunk came from&lt;/strong&gt;. The repo name shows up in the chunk payload, which is what lets the answer cite &lt;em&gt;[repo-A] vs [repo-B]&lt;/em&gt; accurately.&lt;/li&gt;
&lt;/ol&gt;
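&lt;p&gt;To make the scoring concrete, here is a small in-memory analogue of what the SQL computes: similarity as one minus cosine distance, then the 0.3 threshold, the ordering, and the limit of 12. In production the database and the HNSW index do this work; the &lt;code&gt;Row&lt;/code&gt; type and the &lt;code&gt;match&lt;/code&gt; function are hypothetical names for this sketch only:&lt;/p&gt;

```typescript
// Hypothetical row shape: just enough to illustrate the scoring.
type Row = { sourceId: string; embedding: number[] }

// Cosine similarity = 1 - cosine distance (what the SQL derives from the distance operator).
// Assumes both vectors have the same dimension.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0
  a.forEach((x, i) => {
    const y = b[i] ?? 0
    dot += x * y
    normA += x * x
    normB += y * y
  })
  // Sketch-only choice: treat a zero vector as "no similarity".
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Threshold, sort by similarity descending, keep the top `limit` rows.
function match(query: number[], rows: Row[], threshold = 0.3, limit = 12) {
  return rows
    .map(r => ({ sourceId: r.sourceId, similarity: cosineSimilarity(query, r.embedding) }))
    .filter(r => r.similarity > threshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit)
}
```

&lt;p&gt;Sorting by similarity descending is equivalent to the SQL's ordering by distance ascending, which is the form the index can serve.&lt;/p&gt;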

&lt;p&gt;No &lt;em&gt;re-ranker&lt;/em&gt;. No keyword pre-filter. One stage.&lt;/p&gt;

&lt;p&gt;The chunking does enough work upfront that a &lt;strong&gt;second-stage ranker hasn't been worth its latency&lt;/strong&gt; yet, and that's a real result, not laziness.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Re-rankers&lt;/em&gt; earn their keep when your chunks are big, noisy, and undifferentiated. My chunks are &lt;strong&gt;small, typed, and prefixed&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Context assembly and the LLM call
&lt;/h2&gt;

&lt;p&gt;This is where Keystone diverges from textbook RAG.&lt;/p&gt;

&lt;p&gt;Classic RAG does this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;embed(query) → search → concat(top_k) → prompt → generate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I do this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;prompt(LLM, tools={search, tree, file}) → LLM decides → up to 10 tool calls → final answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The LLM is the &lt;strong&gt;orchestrator&lt;/strong&gt;. It sees a system prompt that explains the two data sources, vectorized memory vs. live code, and the available repos. Then it chooses which tool to call.&lt;/p&gt;

&lt;p&gt;The split looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vectorized memory holds the why&lt;/strong&gt; → PR descriptions, issue threads, commit messages, the artifacts where decisions are explained. Vectors of these stay useful even when the code drifts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live file access holds the what&lt;/strong&gt; → The current &lt;em&gt;package.json&lt;/em&gt;, the current list of plugins, the current value of a constant. Stale vectors of months-old code lie about the present, so for "what" questions I read the file fresh &lt;strong&gt;via the GitHub API&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
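&lt;p&gt;The control flow behind "LLM decides → up to 10 tool calls → final answer" can be sketched without any SDK. Everything here is an assumption for illustration: the &lt;code&gt;decide&lt;/code&gt; function is a stand-in for the model, the tools are synchronous stubs, and the fallback text is made up. The real service calls an LLM and the GitHub API instead:&lt;/p&gt;

```typescript
// Hypothetical types: the "model" is any function that reads the transcript
// so far and either requests a tool call or gives a final answer.
type ToolCall = { tool: 'search' | 'tree' | 'file'; input: string }
type Step = { kind: 'call'; call: ToolCall } | { kind: 'answer'; text: string }
type Decide = (transcript: string[]) => Step
type Tools = { [name: string]: (input: string) => string }

function runAgent(decide: Decide, tools: Tools, maxToolCalls = 10) {
  const transcript: string[] = []
  let calls = 0
  while (maxToolCalls > calls) {
    const step = decide(transcript)
    if (step.kind === 'answer') return { answer: step.text, calls }
    // Run the requested tool and feed its result back to the model.
    const run = tools[step.call.tool]
    const result = run ? run(step.call.input) : 'unknown tool'
    transcript.push(`${step.call.tool}(${step.call.input}) -> ${result}`)
    calls += 1
  }
  // Budget spent without a final answer: stop instead of looping forever.
  return { answer: 'Tool budget exhausted', calls }
}
```

&lt;p&gt;The hard cap is the important part: it bounds cost and latency even when the model keeps asking for more context.&lt;/p&gt;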

&lt;p&gt;Here's what the &lt;em&gt;agentic retrieval&lt;/em&gt; actually looks like in production logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;[&lt;/span&gt;Chat Tool] searchTechnicalMemory: &lt;span class="s2"&gt;"relationship between open-webui, opencode, and openclaw"&lt;/span&gt; across 3 project&lt;span class="o"&gt;(&lt;/span&gt;s&lt;span class="o"&gt;)&lt;/span&gt;

&lt;span class="o"&gt;[&lt;/span&gt;Chat Tool] Found 9 results &lt;span class="o"&gt;[&lt;/span&gt;
  &lt;span class="o"&gt;{&lt;/span&gt; repo: &lt;span class="s1"&gt;'open-webui'&lt;/span&gt;, sourceId: &lt;span class="s1"&gt;'readme:root'&lt;/span&gt;,            similarity: 0.555 &lt;span class="o"&gt;}&lt;/span&gt;,
  &lt;span class="o"&gt;{&lt;/span&gt; repo: &lt;span class="s1"&gt;'openclaw'&lt;/span&gt;,   sourceId: &lt;span class="s1"&gt;'readme:root'&lt;/span&gt;,            similarity: 0.544 &lt;span class="o"&gt;}&lt;/span&gt;,
  &lt;span class="o"&gt;{&lt;/span&gt; repo: &lt;span class="s1"&gt;'opencode'&lt;/span&gt;,   sourceId: &lt;span class="s1"&gt;'readme:root'&lt;/span&gt;,            similarity: 0.503 &lt;span class="o"&gt;}&lt;/span&gt;,
  &lt;span class="o"&gt;{&lt;/span&gt; repo: &lt;span class="s1"&gt;'openclaw'&lt;/span&gt;,   sourceId: &lt;span class="s1"&gt;'topology:tree'&lt;/span&gt;,          similarity: 0.465 &lt;span class="o"&gt;}&lt;/span&gt;,
  &lt;span class="o"&gt;{&lt;/span&gt; repo: &lt;span class="s1"&gt;'open-webui'&lt;/span&gt;, sourceId: &lt;span class="s1"&gt;'topology:tree'&lt;/span&gt;,          similarity: 0.463 &lt;span class="o"&gt;}&lt;/span&gt;,
  &lt;span class="o"&gt;{&lt;/span&gt; repo: &lt;span class="s1"&gt;'opencode'&lt;/span&gt;,   sourceId: &lt;span class="s1"&gt;'topology:tree'&lt;/span&gt;,          similarity: 0.463 &lt;span class="o"&gt;}&lt;/span&gt;,
  &lt;span class="o"&gt;{&lt;/span&gt; repo: &lt;span class="s1"&gt;'opencode'&lt;/span&gt;,   sourceId: &lt;span class="s1"&gt;'commits:orphan-summary'&lt;/span&gt;, similarity: 0.439 &lt;span class="o"&gt;}&lt;/span&gt;,
  &lt;span class="o"&gt;{&lt;/span&gt; repo: &lt;span class="s1"&gt;'openclaw'&lt;/span&gt;,   sourceId: &lt;span class="s1"&gt;'commits:orphan-summary'&lt;/span&gt;, similarity: 0.431 &lt;span class="o"&gt;}&lt;/span&gt;,
  &lt;span class="o"&gt;{&lt;/span&gt; repo: &lt;span class="s1"&gt;'open-webui'&lt;/span&gt;, sourceId: &lt;span class="s1"&gt;'commits:orphan-summary'&lt;/span&gt;, similarity: 0.420 &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model chose to &lt;strong&gt;search across all three repos in a single call&lt;/strong&gt;: it understood the query was cross-project without being told. Reading the ranking from top to bottom:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The &lt;code&gt;readme:root&lt;/code&gt; chunks rank highest (&lt;em&gt;0.555&lt;/em&gt;, &lt;em&gt;0.544&lt;/em&gt;, &lt;em&gt;0.503&lt;/em&gt;) because &lt;em&gt;READMEs&lt;/em&gt; describe what a project &lt;em&gt;is&lt;/em&gt;, and the query asks exactly that.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;topology:tree&lt;/code&gt; chunks rank next: file structure is the second most useful signal for understanding how three repos relate.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;commits:orphan-summary&lt;/code&gt; chunks come in last but still above the &lt;em&gt;0.3&lt;/em&gt; floor, adding commit-level context without the noise of individual commits.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Two practical effects:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The model can iterate&lt;/strong&gt; → It might search memory, realize the answer needs a file, fetch the file, then answer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The prompt stays small&lt;/strong&gt; → Only the chunks the model actually requested make it into the conversation. No &lt;em&gt;"stuff top-25 into system prompt"&lt;/em&gt; bloat.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The synthesis model itself is &lt;a href="https://docs.mistral.ai/models/model-cards/devstral-small-2-25-12" rel="noopener noreferrer"&gt;Mistral AI devstral-small-latest&lt;/a&gt;: &lt;strong&gt;small&lt;/strong&gt;, &lt;strong&gt;cheap&lt;/strong&gt;, &lt;strong&gt;fast&lt;/strong&gt;. With good retrieval you don't need a frontier model for the writing step. The expensive part of "intelligence" is finding the right context. Writing a coherent paragraph from good context is the easy part.&lt;/p&gt;

&lt;p&gt;Every call gets logged with input/output tokens, step count, and finish reason, both to a usage table and to PostHog. That's the observability layer that lets me actually answer "is retrieval getting better or worse this week?" with a graph instead of a vibe.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1yqp3uunlpzd6l37xkdu.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1yqp3uunlpzd6l37xkdu.gif" alt="Keystone input &amp;amp; output tokens usage" width="200" height="147"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;The pipeline above (&lt;em&gt;typed chunking&lt;/em&gt;, &lt;strong&gt;code-tuned embeddings&lt;/strong&gt;, &lt;strong&gt;HNSW + pgvector&lt;/strong&gt;, and an &lt;strong&gt;LLM that knows when to search&lt;/strong&gt;) is what's running inside &lt;a href="https://www.keystoneapp.dev/" rel="noopener noreferrer"&gt;Keystone&lt;/a&gt; today.&lt;/p&gt;

&lt;p&gt;It's small, opinionated, and it works because every stage has one job and respects the constraints of the next.&lt;/p&gt;

&lt;p&gt;If there's one thing to take away: &lt;strong&gt;ignore the model leaderboards for a week and go obsess over your chunking&lt;/strong&gt;. That's where the wins are.&lt;/p&gt;

&lt;p&gt;The fanciest embedding model in the world &lt;strong&gt;can't rescue data that's been concatenated into mush&lt;/strong&gt;, and the cheapest model is plenty good when the chunks coming in are sharp, typed, and free of noise.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;RAG isn't a magic upgrade for LLMs. It's a librarian, and a librarian is only as good as the way you organized the shelves.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://www.keystoneapp.dev/" rel="noopener noreferrer"&gt;Keystone&lt;/a&gt; is the project I'm building to give software teams a living memory of their codebase: every PR, commit, issue, and decision, queryable in natural language.&lt;/p&gt;

&lt;p&gt;If you have any suggestions, I'd love to hear them in the comments section!&lt;/p&gt;




&lt;p&gt;Thanks for reading! 👋&lt;/p&gt;

&lt;p&gt;Wences.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>softwareengineering</category>
      <category>typescript</category>
    </item>
    <item>
      <title>Optimizing Nuxt + Prisma in Docker: How we cut our image size by 84%</title>
      <dc:creator>Wences Martinez</dc:creator>
      <pubDate>Thu, 05 Mar 2026 08:29:59 +0000</pubDate>
      <link>https://forem.com/wenchodev/optimizing-nuxt-prisma-in-docker-how-we-cut-our-image-size-by-84-15lb</link>
      <guid>https://forem.com/wenchodev/optimizing-nuxt-prisma-in-docker-how-we-cut-our-image-size-by-84-15lb</guid>
      <description>&lt;p&gt;By now, &lt;a href="//nuxt.com"&gt;Nuxt&lt;/a&gt; hardly needs an introduction as one of the &lt;strong&gt;most used full-stack frameworks&lt;/strong&gt;. For those new to the Vue ecosystem, it is essentially the counterpart to Next.js in the React world.&lt;/p&gt;

&lt;p&gt;So, Nuxt is not just a frontend framework. With Nitro as the server engine, you get SSR, API routes, server middleware… it’s a &lt;strong&gt;real full-stack framework&lt;/strong&gt;. We’ve been using it at Resizes for a long time to build complex applications, and it works really well!&lt;/p&gt;

&lt;p&gt;That sounds really cool, right? But what is the most &lt;strong&gt;efficient&lt;/strong&gt;, &lt;strong&gt;fastest&lt;/strong&gt;, and most &lt;strong&gt;lightweight&lt;/strong&gt; way to build a Docker image for a Nuxt app?&lt;/p&gt;

&lt;p&gt;Let’s talk about it!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn5x0tiwxw1w61cgup5qd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn5x0tiwxw1w61cgup5qd.png" alt=" " width="800" height="422"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Our Use Case
&lt;/h3&gt;

&lt;p&gt;At &lt;a href="//resiz.es"&gt;Resizes&lt;/a&gt; we have developed several applications with Nuxt4, and one of them has several modules and dependencies; an &lt;strong&gt;auth&lt;/strong&gt; library, an &lt;strong&gt;ORM&lt;/strong&gt; to handle database migrations and even an &lt;strong&gt;embedded database&lt;/strong&gt;, so dockerizing it efficiently was not as straightforward as it seems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fihlxxgx3mmylgtair8sj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fihlxxgx3mmylgtair8sj.png" alt="Our Nuxt stack" width="800" height="215"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this post I’ll share how we do it, with &lt;strong&gt;real numbers&lt;/strong&gt; from our GitHub Actions pipeline and the hidden traps we found along the way.&lt;/p&gt;

&lt;h3&gt;
  
  
  The stack
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Nuxt 4 + &lt;a href="https://nitro.build/" rel="noopener noreferrer"&gt;Nitro&lt;/a&gt; (server engine)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.prisma.io/" rel="noopener noreferrer"&gt;Prisma ORM&lt;/a&gt; + SQLite via &lt;a href="https://www.npmjs.com/package/better-sqlite3" rel="noopener noreferrer"&gt;&lt;em&gt;better-sqlite3&lt;/em&gt;&lt;/a&gt; (native C++ module)&lt;/li&gt;
&lt;li&gt;Docker with multi-stage build&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The following &lt;em&gt;package.json&lt;/em&gt; shows some of the dependencies we use most in a Nuxt 4 application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;package.json&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"scripts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"dev"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nuxt dev"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"build"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nuxt build"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"generate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nuxt generate"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"preview"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nuxt preview"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"postinstall"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nuxt prepare &amp;amp;&amp;amp; prisma generate"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"lint"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"eslint ."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"lint:fix"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"eslint . --fix"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"dependencies"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"better-sqlite3"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~12.0.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"@prisma/client"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~7.2.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"@prisma/adapter-better-sqlite3"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~7.3.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"@anthropic-ai/sdk"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~0.72.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"better-auth"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~1.4.17"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"zod"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~4.3.5"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"prisma"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~7.2.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"dotenv"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~17.2.3"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"devDependencies"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"nuxt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~4.2.2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"@nuxt/ui"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~4.3.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"@nuxt/eslint"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~1.12.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"vue"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~3.5.27"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"@vueuse/nuxt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~14.1.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"typescript"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~5.9.3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"eslint"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~9.39.2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"eslint-config-prettier"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~10.1.8"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"prettier"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~3.8.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tsx"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~4.21.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  The Dockerfile: single vs multi stage
&lt;/h3&gt;

&lt;p&gt;Can you deploy a Nuxt application with a basic Dockerfile? Of course you can! But that &lt;strong&gt;doesn’t mean you should&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;While a single-stage Dockerfile will technically build your app, it drags a &lt;strong&gt;huge amount of baggage&lt;/strong&gt; into production. We’re talking about build tools, dev dependencies, source code, and artifacts.&lt;/p&gt;

&lt;p&gt;You end up shipping the entire kitchen sink instead of just the lean, compiled application.&lt;/p&gt;

&lt;p&gt;An example of a basic single-stage Dockerfile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Single-stage Dockerfile (for comparison only — DO NOT use in production)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; node:24-slim&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; python3 make g++ &lt;span class="nt"&gt;--no-install-recommends&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; /var/lib/apt/lists/&lt;span class="k"&gt;*&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;
&lt;span class="k"&gt;ARG&lt;/span&gt;&lt;span class="s"&gt; DATABASE_URL="file:./dummy.db"&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;npm ci
&lt;span class="k"&gt;RUN &lt;/span&gt;npx prisma generate &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npx nuxt build

&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; NODE_ENV=production&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; NITRO_HOST=0.0.0.0&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; NITRO_PORT=3000&lt;/span&gt;

&lt;span class="c"&gt;# Still need to fix Nitro stubs even in single-stage&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; .output/server/node_modules
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;cp&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; node_modules .output/server/node_modules

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; docker-entrypoint.sh ./docker-entrypoint.sh&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x ./docker-entrypoint.sh

&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 3000&lt;/span&gt;

&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["./docker-entrypoint.sh"]&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["node", ".output/server/index.mjs"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We ran the test: we built our app with the Dockerfile above and with a multi-stage Dockerfile, then compared the size of the two final images.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The results are crazy: &lt;strong&gt;4.05 GB vs 637MB&lt;/strong&gt;. The multi-stage image is more than &lt;strong&gt;6 times smaller&lt;/strong&gt; than the single-stage one 🫢&lt;/p&gt;
&lt;/blockquote&gt;
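
&lt;p&gt;As a quick sanity check on those two numbers, here is a minimal arithmetic sketch in TypeScript. It assumes 1 GB = 1000 MB, and the helper names are hypothetical, chosen only for this example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Image sizes reported above, in MB (assumption: 1 GB = 1000 MB).
const singleStageMb = 4050;
const multiStageMb = 637;

// Percentage of the original size that was removed.
function reductionPercent(before: number, after: number): number {
  return ((before - after) / before) * 100;
}

// How many times smaller the new image is.
function shrinkFactor(before: number, after: number): number {
  return before / after;
}

console.log(reductionPercent(singleStageMb, multiStageMb).toFixed(1)); // "84.3"
console.log(shrinkFactor(singleStageMb, multiStageMb).toFixed(2)); // "6.36"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Using 1024 MB per GB instead nudges the figures to roughly 84.6% and 6.5x, so the headline holds either way.&lt;/p&gt;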

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fud6alp9mz18tsi0bbw40.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fud6alp9mz18tsi0bbw40.png" alt="Single-stage vs multi-stage image" width="800" height="94"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F075gv2p9de0z1m6oaqae.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F075gv2p9de0z1m6oaqae.png" alt="Docker Layer Anatomy: Where Does All That Space Go?" width="800" height="209"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  But… how did we do it?
&lt;/h2&gt;

&lt;p&gt;A multi-stage build separates the process into phases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Build&lt;/strong&gt; stages: have everything needed to compile. They’re temporary images that get &lt;strong&gt;discarded from the final image&lt;/strong&gt;!&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runner&lt;/strong&gt; stage: contains only the minimum code to run the app. It’s the only image pushed to the registry!&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The concrete benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Smaller final image.&lt;/li&gt;
&lt;li&gt;Reduced &lt;strong&gt;attack surface&lt;/strong&gt;: no compilers in production.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Faster&lt;/strong&gt; deploys: less pull time on each Kubernetes node.&lt;/li&gt;
&lt;li&gt;Automatic &lt;strong&gt;parallelism&lt;/strong&gt;: BuildKit can run stages that don’t depend on each other in parallel.&lt;/li&gt;
&lt;li&gt;Granular &lt;strong&gt;caching&lt;/strong&gt;: if you only change code, the dependencies stage is fully cached.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our Dockerfile has 4 stages. Let’s go through each one.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5y3m7sexqhd9du98seti.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5y3m7sexqhd9du98seti.png" alt="Dockerfile multistage steps" width="319" height="858"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Stage 1: Base tools
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# -------- Base --------&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;node:24-slim&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;base&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; python3 make g++ openssl &lt;span class="nt"&gt;--no-install-recommends&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; /var/lib/apt/lists/&lt;span class="k"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You’re probably wondering why the heck we need &lt;em&gt;python3&lt;/em&gt;, &lt;em&gt;make&lt;/em&gt; and &lt;em&gt;g++&lt;/em&gt; if this is a Node.js application.&lt;/p&gt;

&lt;p&gt;Easy: we’re using &lt;em&gt;SQLite&lt;/em&gt; as an embedded database through the super fast &lt;a href="https://www.npmjs.com/package/better-sqlite3" rel="noopener noreferrer"&gt;better-sqlite3&lt;/a&gt; library, and this library needs to &lt;strong&gt;compile native C++ bindings&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;SQLite is a library written in C, and to use it from Node.js, better-sqlite3 includes a “binding” (a bridge between JavaScript and compiled C code).&lt;/p&gt;

&lt;p&gt;This binding is compiled with node-gyp when the package’s install script runs (unless a matching prebuilt binary can be downloaded instead), and that compilation needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;g++&lt;/em&gt;: the C++ compiler&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;make&lt;/em&gt;: orchestrates the compilation&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;python3&lt;/em&gt;: needed by GYP, the Python-based build generator that node-gyp wraps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is a &lt;em&gt;.node&lt;/em&gt; file (binary addon) specific to your architecture and Node version.&lt;/p&gt;
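
&lt;p&gt;To make “specific to your architecture and Node version” a bit more tangible, this small TypeScript sketch prints the three values a compiled addon is typically tied to. The &lt;em&gt;AddonTarget&lt;/em&gt; shape and both function names are made up for the example; &lt;em&gt;process.platform&lt;/em&gt;, &lt;em&gt;process.arch&lt;/em&gt; and &lt;em&gt;process.versions.modules&lt;/em&gt; are standard Node.js properties:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Hypothetical shape describing the runtime a .node binary was built for.
interface AddonTarget {
  platform: string; // operating system, e.g. "linux" or "darwin"
  arch: string; // CPU architecture, e.g. "x64" or "arm64"
  abi: string; // Node's native module ABI version
}

// Read the values for the Node.js process running this script.
function currentAddonTarget(): AddonTarget {
  return {
    platform: process.platform,
    arch: process.arch,
    abi: process.versions.modules,
  };
}

// Build a short label that is easy to compare between two environments.
function describeTarget(target: AddonTarget): string {
  return target.platform + "-" + target.arch + "-abi" + target.abi;
}

console.log(describeTarget(currentAddonTarget()));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If the label printed inside the container differs from the one printed on your laptop, a binary compiled on the laptop will most likely fail to load in the container.&lt;/p&gt;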

&lt;blockquote&gt;
&lt;p&gt;These tools take up ~293MB, but they won’t be in the final image 💪🏼&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Stage 2: Full dependencies
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;
&lt;span class="c"&gt;# -------- Dependencies --------&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;base&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;deps&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; package.json package-lock.json ./&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npm ci &lt;span class="nt"&gt;--ignore-scripts&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the second stage we only copy &lt;em&gt;package.json&lt;/em&gt; and &lt;em&gt;package-lock.json&lt;/em&gt;, then run &lt;em&gt;npm ci&lt;/em&gt; with the &lt;em&gt;--ignore-scripts&lt;/em&gt; flag.&lt;/p&gt;

&lt;p&gt;This flag is really important if you want to &lt;strong&gt;avoid any unnecessary drama&lt;/strong&gt;. Tools like Prisma or Nuxt love to run ‘postinstall’ scripts the second they’re downloaded, but since we haven’t copied the full source code yet, those scripts would just crash.&lt;/p&gt;

&lt;p&gt;By skipping those lifecycle scripts with this flag, we keep the install fast, let Docker cache the layer for as long as the two manifest files don’t change, and save the actual heavy lifting for the Build stage.&lt;/p&gt;
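
&lt;p&gt;To see which of your own scripts are affected, here is a rough TypeScript sketch. The hook list is a non-exhaustive assumption, the function name is hypothetical, and the &lt;em&gt;scripts&lt;/em&gt; object is a trimmed copy of the &lt;em&gt;package.json&lt;/em&gt; shown earlier:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Lifecycle hooks npm runs around an install (assumed, non-exhaustive list).
const INSTALL_HOOKS = ["preinstall", "install", "postinstall", "prepare"];

interface Scripts {
  [name: string]: string;
}

// Return the script names an install with --ignore-scripts would not run.
function skippedByIgnoreScripts(scripts: Scripts): string[] {
  return Object.keys(scripts).filter(function (name) {
    return INSTALL_HOOKS.includes(name);
  });
}

// Trimmed copy of the scripts block from the package.json above.
const scripts: Scripts = {
  dev: "nuxt dev",
  build: "nuxt build",
  postinstall: "nuxt prepare &amp;&amp; prisma generate",
  lint: "eslint .",
};

console.log(skippedByIgnoreScripts(scripts).join(", ")); // "postinstall"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In our case only &lt;em&gt;postinstall&lt;/em&gt; is skipped, which is why the Build stage has to run &lt;em&gt;prisma generate&lt;/em&gt; explicitly.&lt;/p&gt;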

&lt;h2&gt;
  
  
  Stage 3: Building the application
&lt;/h2&gt;

&lt;p&gt;This is where the heavy lifting happens!&lt;/p&gt;

&lt;p&gt;Now that we have our dependencies ready, we perform a &lt;strong&gt;triple threat of critical tasks&lt;/strong&gt;: generating the &lt;strong&gt;Prisma client&lt;/strong&gt;, compiling the &lt;strong&gt;Nuxt bundle&lt;/strong&gt;, and &lt;strong&gt;rebuilding &lt;em&gt;better-sqlite3&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This last step is vital — it ensures our SQLite driver is natively compiled for the Linux environment, avoiding those dreaded “mismatched binary” errors in production.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# -------- Build --------&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;base&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;build&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; NODE_OPTIONS="--max-old-space-size=4096"&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=deps /app/node_modules ./node_modules&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;

&lt;span class="k"&gt;ARG&lt;/span&gt;&lt;span class="s"&gt; DATABASE_URL="file:./dummy.db"&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npx prisma generate &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npx nuxt build &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm rebuild better-sqlite3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;npx prisma generate&lt;/em&gt;&lt;/strong&gt;: Generates the Prisma Client based on your schema.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;npx nuxt build&lt;/em&gt;&lt;/strong&gt;: This command triggers the full production build, bundling the client and server-side code through the Nitro engine into a standalone directory (the .output folder).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;em&gt;npm rebuild better-sqlite3&lt;/em&gt;&lt;/strong&gt;: The “secret sauce” for SQLite. This recompiles the native C++ bindings for our database driver directly inside the Linux container. It ensures the binary perfectly matches our production environment, preventing the dreaded “mismatched architecture” errors.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Stage 4: The production image
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# -------- Runtime --------&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;node:24-slim&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;runner&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; openssl &lt;span class="nt"&gt;--no-install-recommends&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; /var/lib/apt/lists/&lt;span class="k"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; NODE_ENV=production&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; NITRO_HOST=0.0.0.0&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; NITRO_PORT=3000&lt;/span&gt;

&lt;span class="c"&gt;# Minimal Prisma CLI for migrations - must run BEFORE package.json is copied,&lt;/span&gt;
&lt;span class="c"&gt;# otherwise npm sees the app's package.json and installs ALL dependencies.&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; package-lock.json ./&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--ignore-scripts&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="s2"&gt;"prisma@&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;node &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"require('./package-lock.json').packages['node_modules/prisma'].version"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="s2"&gt;"dotenv@&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;node &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"require('./package-lock.json').packages['node_modules/dotenv'].version"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; npm cache clean &lt;span class="nt"&gt;--force&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;rm &lt;/span&gt;package-lock.json

&lt;span class="c"&gt;# Copy compiled app (keeps Nitro's runtime packages in .output/server/node_modules/)&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=build /app/.output ./.output&lt;/span&gt;

&lt;span class="c"&gt;# Replace better-sqlite3 with the Linux binary compiled in the build stage.&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; .output/server/node_modules/better-sqlite3
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=build /app/node_modules/better-sqlite3 ./.output/server/node_modules/better-sqlite3&lt;/span&gt;

&lt;span class="c"&gt;# Copy Prisma config + migrations folder for migrate deploy&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=build /app/prisma ./prisma&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=build /app/prisma.config.ts ./prisma.config.ts&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; docker-entrypoint.sh ./docker-entrypoint.sh&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x ./docker-entrypoint.sh

&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 3000&lt;/span&gt;

&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["./docker-entrypoint.sh"]&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["node", ".output/server/index.mjs"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;The final image uses &lt;em&gt;node:24-slim&lt;/em&gt;; no compilers, no Python, no dev tools. Just Node.js and your app.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Three important details here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We leave .output/server/node_modules intact — except for better-sqlite3, which we delete and replace with a freshly compiled binary (in build stage).&lt;/li&gt;
&lt;li&gt;We install only &lt;strong&gt;prisma&lt;/strong&gt; and &lt;strong&gt;dotenv&lt;/strong&gt; into /app/node_modules, the bare minimum needed to run prisma migrate deploy in the entrypoint script. The rest of the app’s dependencies are already bundled inside .output/server/node_modules by Nitro.&lt;/li&gt;
&lt;li&gt;Automatic Prisma migrations with &lt;em&gt;docker-entrypoint.sh&lt;/em&gt;: this script runs &lt;strong&gt;&lt;em&gt;prisma migrate deploy&lt;/em&gt;&lt;/strong&gt; every time the container starts, to bring your database up to the latest schema before the app boots:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/sh&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Applying database migrations..."&lt;/span&gt;

&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;NODE_PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/app/node_modules"&lt;/span&gt;
/app/node_modules/.bin/prisma migrate deploy

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Starting application..."&lt;/span&gt;
&lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$@&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  GitHub Actions pipeline results
&lt;/h2&gt;

&lt;p&gt;These are real numbers from our GitHub Actions pipeline for AMD64 architecture when building the app’s image:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5o8lwtk9i34zje27ih2n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5o8lwtk9i34zje27ih2n.png" alt="GitHub Actions Runner results" width="800" height="229"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 Multi-arch tip: If you build ARM64 images on AMD64 runners via QEMU, expect 4–7x slower builds. Use native ARM runners if speed matters!&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Hidden Gotchas
&lt;/h2&gt;

&lt;p&gt;Nitro can’t bundle everything… some packages must stay in node_modules!&lt;/p&gt;

&lt;p&gt;Nitro uses Rollup/esbuild to bundle your server code into optimized &lt;em&gt;.mjs&lt;/em&gt; files — but not everything can be bundled.&lt;/p&gt;

&lt;p&gt;Modules like &lt;em&gt;better-sqlite3&lt;/em&gt; contain native C++ bindings (.node files), which are platform-specific binaries that simply &lt;strong&gt;cannot be converted into a .mjs file&lt;/strong&gt;. For those, Nitro falls back to copying them — along with their full dependency tree — into &lt;em&gt;.output/server/node_modules/&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;So, we have to leave &lt;em&gt;.output/server/node_modules&lt;/em&gt; intact, except for &lt;em&gt;better-sqlite3&lt;/em&gt;, which we delete and replace with a freshly compiled binary (previously generated with &lt;em&gt;npm rebuild better-sqlite3&lt;/em&gt;).&lt;/p&gt;
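
&lt;p&gt;If you’re not sure which packages in that folder ship native binaries, a small script can list them. The following is only a sketch: &lt;em&gt;findNativeAddons&lt;/em&gt; is a hypothetical helper, and the demo runs against a throwaway directory with a fake layout rather than a real Nitro build:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import { mkdirSync, mkdtempSync, readdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Recursively collect every compiled native addon (*.node) below a directory.
function findNativeAddons(dir: string): string[] {
  const found: string[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const fullPath = join(dir, entry.name);
    if (entry.isDirectory()) {
      found.push(...findNativeAddons(fullPath));
    } else if (entry.isFile()) {
      if (entry.name.endsWith(".node")) {
        found.push(fullPath);
      }
    }
  }
  return found.sort();
}

// Demo: build a fake node_modules folder in a temp directory and scan it.
const root = mkdtempSync(join(tmpdir(), "native-addons-demo-"));
const releaseDir = join(root, "better-sqlite3", "build", "Release");
mkdirSync(releaseDir, { recursive: true });
writeFileSync(join(releaseDir, "better_sqlite3.node"), "");
writeFileSync(join(root, "better-sqlite3", "package.json"), "{}");

const addons = findNativeAddons(root);
console.log(addons.length); // 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Pointing the same function at &lt;em&gt;.output/server/node_modules&lt;/em&gt; after a build should reveal every package that needs the same treatment as &lt;em&gt;better-sqlite3&lt;/em&gt;.&lt;/p&gt;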

&lt;p&gt;We install only &lt;em&gt;prisma&lt;/em&gt; and &lt;em&gt;dotenv&lt;/em&gt; into the root &lt;em&gt;/app/node_modules&lt;/em&gt;: the bare minimum needed to run &lt;em&gt;prisma migrate deploy&lt;/em&gt; in the entrypoint. The rest of the app’s dependencies are already bundled inside the &lt;em&gt;.output/server&lt;/em&gt; folder by Nitro.&lt;/p&gt;

&lt;p&gt;💡 We explicitly mark &lt;em&gt;better-sqlite3&lt;/em&gt;, among other dependencies, as external in the &lt;em&gt;nuxt.config.ts&lt;/em&gt; file, which tells Nitro to skip bundling them entirely and copy them as-is into &lt;em&gt;.output/server/node_modules&lt;/em&gt; directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// nuxt.config.ts&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nf"&gt;defineNuxtConfig&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;nitro&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;esbuild&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;options&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;es2020&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;externals&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;external&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;better-sqlite3&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="cm"&gt;/* ...other externalized dependencies */&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
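
&lt;p&gt;A missing external would typically only surface when the server starts, so a small post-build check can catch it earlier. This is just a sketch: &lt;em&gt;findMissingExternals&lt;/em&gt; and the injected &lt;em&gt;exists&lt;/em&gt; predicate are names made up for illustration, not part of Nuxt or Nitro, and it assumes every externalized package ships a &lt;em&gt;package.json&lt;/em&gt; at its root.&lt;/p&gt;

```typescript
// Hypothetical post-build check. The function name and the injected `exists`
// predicate are illustrative and not part of Nuxt or Nitro.
type PathExists = (path: string) => boolean;

function findMissingExternals(
  externals: string[],
  exists: PathExists,
  root: string = ".output/server/node_modules",
): string[] {
  // Assumption: a package counts as present when its package.json exists under root.
  return externals.filter((name) => !exists(`${root}/${name}/package.json`));
}

// Demo with an in-memory stand-in for the file system.
const present = new Set([".output/server/node_modules/better-sqlite3/package.json"]);
const missing = findMissingExternals(
  ["better-sqlite3", "some-other-external"], // second name is a placeholder
  (path) => present.has(path),
);
console.log(missing);
```

&lt;p&gt;In a real pipeline you could pass &lt;em&gt;existsSync&lt;/em&gt; from &lt;em&gt;node:fs&lt;/em&gt; as the predicate and fail the build when the returned list is not empty.&lt;/p&gt;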






&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Dockerizing a full-stack Nuxt application isn’t trivial when you have native modules or want to significantly reduce the final image size, but with a solid multi-stage build it’s completely viable.&lt;/p&gt;

&lt;p&gt;Key takeaways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-stage build&lt;/strong&gt; is not optional: it’s the difference between &lt;strong&gt;637 MB and 4.05 GB&lt;/strong&gt;, an &lt;strong&gt;84% image size reduction&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;The numbers speak: &lt;strong&gt;~4 minutes&lt;/strong&gt; for a complete AMD64 build, and a lightweight final image ready for production.&lt;/li&gt;
&lt;li&gt;Don’t delete &lt;em&gt;.output/server/node_modules&lt;/em&gt; — Nitro copies the externalized packages there and needs them at runtime. Only &lt;strong&gt;replace the ones with native binaries&lt;/strong&gt; (like better-sqlite3) that need to be recompiled for Linux.&lt;/li&gt;
&lt;/ul&gt;
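
&lt;p&gt;If you want to double-check the 84% figure, the arithmetic fits in a few lines. This sketch assumes the 4.05 GB image is the one built without multi-stage and that sizes use decimal units (1 GB = 1000 MB); &lt;em&gt;reductionPercent&lt;/em&gt; is a helper name invented for the example.&lt;/p&gt;

```typescript
// Hypothetical helper: percentage saved when going from beforeMb to afterMb.
function reductionPercent(beforeMb: number, afterMb: number): number {
  if (!(beforeMb > 0)) {
    throw new RangeError("beforeMb must be a positive number");
  }
  return (1 - afterMb / beforeMb) * 100;
}

// Assumption: decimal units, so 4.05 GB is treated as 4050 MB.
const singleStageMb = 4.05 * 1000;
const multiStageMb = 637;

console.log(`${reductionPercent(singleStageMb, multiStageMb).toFixed(1)}% smaller`);
```

&lt;p&gt;With the sizes above this comes out at roughly 84%, matching the number in the takeaway.&lt;/p&gt;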

&lt;p&gt;If you’re evaluating Nuxt for your next full-stack project, give it a shot. The Vue ecosystem has a first-class full-stack framework, and with Docker, deploying it is just a matter of minutes!&lt;/p&gt;




&lt;p&gt;This post is based on our real experience deploying Nuxt applications in production at &lt;a href="//resiz.es"&gt;Resizes&lt;/a&gt;. All timings and data are from our GitHub Actions pipeline.&lt;/p&gt;

&lt;p&gt;Feel free to share this post with your community!&lt;/p&gt;

&lt;p&gt;See you! 👋🏻&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>nuxt</category>
      <category>docker</category>
      <category>prisma</category>
    </item>
  </channel>
</rss>
