<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Steven Leggett</title>
    <description>The latest articles on Forem by Steven Leggett (@cdnsteve).</description>
    <link>https://forem.com/cdnsteve</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2175%2Fdea686e0-019d-4600-99b4-cd102c412950.jpg</url>
      <title>Forem: Steven Leggett</title>
      <link>https://forem.com/cdnsteve</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/cdnsteve"/>
    <language>en</language>
    <item>
      <title>Debug Real Production Incidents in Your Browser Before They Happen to You</title>
      <dc:creator>Steven Leggett</dc:creator>
      <pubDate>Tue, 24 Mar 2026 20:12:26 +0000</pubDate>
      <link>https://forem.com/cdnsteve/how-i-turned-my-worst-on-call-nightmare-into-a-browser-game-36fa</link>
      <guid>https://forem.com/cdnsteve/how-i-turned-my-worst-on-call-nightmare-into-a-browser-game-36fa</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4z7rvatwno2dq93tw4zz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4z7rvatwno2dq93tw4zz.png" alt="The Situation Room - 3D war room with live architecture map, alerts, and terminal" width="800" height="552"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The Situation Room - a 3D war room with live architecture, cascading alerts, and a ticking clock&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Idea: A Flight Simulator for On-Call Engineers
&lt;/h2&gt;

&lt;p&gt;Here's the thing about learning incident response: you learn it by breaking things in production, on a clock, while people are angry. There is no training simulator. You get one real scenario at a time, with real consequences, and you either figure it out or you don't.&lt;/p&gt;

&lt;p&gt;Flight simulators exist so pilots can experience a landing gear failure, a stall at 30,000 feet, a hydraulics problem - without dying. Musicians practice scales before performing. Athletes train before competing. Surgeons use simulation before cutting into a real person.&lt;/p&gt;

&lt;p&gt;But SREs? You get dropped into a live incident and told to figure it out. Maybe your company has a "Wheel of Misfortune" session occasionally, but realistically you just wait for the pager to go off and hope the scenario isn't too bad.&lt;/p&gt;

&lt;p&gt;I wanted to build something different. A place where you could practice the muscle memory of incident response without the 3 AM wake-up call. Where you could learn what a connection pool exhaustion actually looks like in the logs before you encounter it at 2 AM with a VP watching over your shoulder.&lt;/p&gt;

&lt;p&gt;The tagline came quickly: &lt;strong&gt;Wordle meets Hack The Box for DevOps.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wordle because it needed to be quick, daily, shareable, and satisfying&lt;/li&gt;
&lt;li&gt;Hack The Box because it needed to be genuinely technical and respect your intelligence&lt;/li&gt;
&lt;li&gt;For DevOps because SREs, platform engineers, and backend developers are the audience&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I wanted the game to feel like you were actually sitting at a terminal in a crisis. Not a quiz. Not a multiple choice test. A real investigation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Building the Thing
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Picking the Stack
&lt;/h3&gt;

&lt;p&gt;I wanted to go from zero to playable as fast as possible, and I wanted the infrastructure costs to scale from "$0 at launch" to "reasonable at 10,000 users." That meant serverless-first.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Next.js 16&lt;/strong&gt; was a given. App Router, React Server Components, edge runtime, first-class Vercel support. I've used it for years and I can move fast in it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Turso&lt;/strong&gt; for the database. This was the interesting choice. Turso is SQLite at the edge - it gives you sub-10ms reads globally, scales to zero when nobody is playing, and the free tier covers a lot of users. I was already using Drizzle ORM and had a schema ready. The alternative was Supabase's Postgres, but for a read-heavy app (leaderboards, profile lookups) Turso made more sense and the economics were better at launch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Supabase Auth&lt;/strong&gt; for authentication. Free up to 50,000 monthly active users. GitHub OAuth and Google OAuth both built in. It handles email verification, password resets, and OAuth flows out of the box. When you're building solo, you don't want to think about auth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tailwind v4&lt;/strong&gt; for styling. The v4 release with its CSS-first configuration cleaned up a lot of the config overhead I used to fight with.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zustand&lt;/strong&gt; for client state. The game state during an active incident - your command history, your elapsed time, your discovered clues, your score - lives in a Zustand store. Simple, no boilerplate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Anime.js v4&lt;/strong&gt; for animations. This was the fun one.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Game Engine
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5l32hsjauj3s1d00drl4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5l32hsjauj3s1d00drl4.png" alt="Clicking a service node reveals alerts and investigation options" width="800" height="462"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Clicking a service node in the architecture map reveals alerts and investigation options&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The core of the whole thing is &lt;code&gt;IncidentEngine.ts&lt;/code&gt; - a TypeScript class that runs each scenario.&lt;/p&gt;

&lt;p&gt;Every incident is defined as a TypeScript object: it has metadata (title, difficulty, time limit), a scenario description, an environment with simulated logs and metrics, a list of commands the player can run, diagnosis options, and a solution with hints.&lt;/p&gt;
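
&lt;p&gt;A minimal sketch of what one of these definitions might look like. The type names, field names, and sample outputs here are illustrative assumptions, not the game's actual schema; only the two clue ids are taken from the disk-full scenario described later.&lt;/p&gt;

```typescript
// Hypothetical shape of an incident definition; every field name is illustrative.
type Difficulty = "beginner" | "intermediate" | "advanced";

interface IncidentCommand {
  pattern: string;       // what the player types, e.g. "df -h"
  output: string;        // simulated terminal output
  revealsClue?: string;  // clue id unlocked by running this command
}

interface Incident {
  title: string;
  difficulty: Difficulty;
  timeLimitSeconds: number;
  description: string;
  environment: { logs: string[]; metrics: { name: string; value: number }[] };
  commands: IncidentCommand[];
  diagnosisOptions: string[];
  solution: { diagnosis: string; hints: string[] };
}

// Sample data is made up for illustration.
const diskFull: Incident = {
  title: "Disk full from debug logs",
  difficulty: "beginner",
  timeLimitSeconds: 300,
  description: "Writes are failing and the API is returning 500s.",
  environment: {
    logs: ["ERROR write failed: no space left on device"],
    metrics: [{ name: "disk_used_percent", value: 100 }],
  },
  commands: [
    { pattern: "df -h", output: "/dev/sda1  50G  50G  0  100% /var", revealsClue: "var-full" },
    { pattern: "du -sh /var/log/*", output: "48G  /var/log/analytics", revealsClue: "analytics-logs" },
  ],
  diagnosisOptions: ["Debug logging left on in production", "Database corruption"],
  solution: {
    diagnosis: "Debug logging left on in production",
    hints: ["Check disk usage first."],
  },
};
```

&lt;p&gt;Leaving out a required field such as &lt;code&gt;solution&lt;/code&gt; would be a compile-time error here, which is the main appeal of keeping scenarios in code.&lt;/p&gt;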

&lt;p&gt;The engine does a few things:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3i4chzq2xuyg8vuw856j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3i4chzq2xuyg8vuw856j.png" alt="YouBrokeProd situtation room 3d experience for SREs and devops" width="800" height="824"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Manages the game timer and emits events at warning thresholds (30% time remaining, 10% time remaining, time expired)&lt;/li&gt;
&lt;li&gt;Processes commands by pattern-matching against the incident's defined commands, then checking shared easter egg commands&lt;/li&gt;
&lt;li&gt;Tracks which clues have been discovered as the player runs investigative commands&lt;/li&gt;
&lt;li&gt;Handles diagnosis submission - correct gets you to the "fixing" phase, incorrect costs you points&lt;/li&gt;
&lt;li&gt;Validates fix commands with flexible matching so you don't have to type a 200-character shell command perfectly&lt;/li&gt;
&lt;li&gt;Calculates a final score based on time taken vs par time, whether you diagnosed correctly on the first try, and how many commands you used vs the optimal path&lt;/li&gt;
&lt;/ol&gt;
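
&lt;p&gt;The scoring step in the last item could be sketched roughly like this. The 40/30/30 weights, the 25% penalty per wrong diagnosis, and the names &lt;code&gt;ScoreInput&lt;/code&gt; and &lt;code&gt;calculateScore&lt;/code&gt; are assumptions for illustration; the only thing taken from the engine is which three inputs matter.&lt;/p&gt;

```typescript
// Hypothetical scoring inputs; names and weights are illustrative assumptions.
interface ScoreInput {
  elapsedSeconds: number;
  parSeconds: number;
  wrongDiagnoses: number;   // 0 means correct on the first try
  commandsUsed: number;
  optimalCommands: number;
}

function calculateScore(input: ScoreInput): number {
  // Time: full marks at or under par, proportionally less beyond it.
  const timeRatio = Math.min(1, input.parSeconds / Math.max(1, input.elapsedSeconds));
  // Accuracy: each wrong diagnosis removes a quarter of this share, floored at zero.
  const accuracy = Math.max(0, 1 - 0.25 * input.wrongDiagnoses);
  // Efficiency: full marks when the player matches the optimal command count.
  const efficiency = Math.min(1, input.optimalCommands / Math.max(1, input.commandsUsed));
  return Math.round(40 * timeRatio + 30 * accuracy + 30 * efficiency);
}

const perfectRun = calculateScore({
  elapsedSeconds: 100,
  parSeconds: 100,
  wrongDiagnoses: 0,
  commandsUsed: 5,
  optimalCommands: 5,
});
```

&lt;p&gt;Capping each component at 1 means finishing under par or using fewer commands than the optimal path never pushes the score above 100.&lt;/p&gt;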

&lt;p&gt;The command system was one of the more interesting design challenges. Real terminal commands have arguments, flags, and variations. I couldn't enumerate every possible way someone might run &lt;code&gt;df -h&lt;/code&gt; vs &lt;code&gt;df -H&lt;/code&gt; vs &lt;code&gt;df --human-readable&lt;/code&gt;. The solution was pattern matching with some tolerance - commands match if the input starts with the pattern, or if it matches a regex, or if it contains enough of the key terms.&lt;/p&gt;
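
&lt;p&gt;A minimal sketch of that tolerant matching, assuming a hypothetical &lt;code&gt;CommandPattern&lt;/code&gt; shape and &lt;code&gt;matchesCommand&lt;/code&gt; helper. The real engine's field names and thresholds may differ.&lt;/p&gt;

```typescript
// Hypothetical pattern shape: any one of the three strategies may be set.
interface CommandPattern {
  prefix?: string;      // match if the input starts with this
  regex?: RegExp;       // or if the input matches this expression
  keyTerms?: string[];  // or if enough of these terms appear in the input
  minTerms?: number;    // how many key terms count as "enough" (defaults to all)
}

function matchesCommand(input: string, pattern: CommandPattern): boolean {
  // Collapse repeated whitespace so "df   -h" and "df -h" behave the same.
  const normalized = input.trim().replace(/\s+/g, " ");
  if (pattern.prefix !== undefined) {
    if (normalized.startsWith(pattern.prefix)) return true;
  }
  if (pattern.regex !== undefined) {
    if (pattern.regex.test(normalized)) return true;
  }
  if (pattern.keyTerms !== undefined) {
    const hits = pattern.keyTerms.filter((term) => normalized.includes(term)).length;
    const needed = Math.max(1, pattern.minTerms ?? pattern.keyTerms.length);
    return hits >= needed;
  }
  return false;
}

// One pattern covering the three df variants mentioned above.
const diskUsage: CommandPattern = {
  prefix: "df -h",
  regex: /^df\s+(-H|--human-readable)\b/,
};

const logSizes: CommandPattern = { keyTerms: ["du", "/var/log"], minTerms: 2 };
```

&lt;p&gt;Trying the cheap prefix check first and falling back to key terms keeps the common case fast while still accepting extra flags or paths the scenario author never listed.&lt;/p&gt;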

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxakzl2h7ivpokie2hcbz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxakzl2h7ivpokie2hcbz.png" alt="YouBrokeProd 3d simulation for SRE Payment orchestrator with detailed service logs" width="800" height="789"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The clue discovery system is what makes the game feel like an investigation rather than a quiz. Running &lt;code&gt;df -h&lt;/code&gt; reveals the clue "var-full." Running &lt;code&gt;du -sh /var/log/*&lt;/code&gt; reveals "analytics-logs." Each clue unlocked drives you deeper into the diagnosis. The engine emits a &lt;code&gt;clue_discovered&lt;/code&gt; event that triggers a satisfying UI animation.&lt;/p&gt;
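
&lt;p&gt;One way to wire this up is a small tracker that records each clue once and notifies listeners. This is a sketch under assumptions: the &lt;code&gt;ClueTracker&lt;/code&gt; class and its method names are hypothetical, and only the &lt;code&gt;clue_discovered&lt;/code&gt; event name and the two clue ids come from the description above.&lt;/p&gt;

```typescript
type ClueListener = (event: { type: "clue_discovered"; clueId: string }) => void;

// Hypothetical helper: remembers discovered clues and emits an event per new clue.
class ClueTracker {
  private discovered: string[] = [];
  private listeners: ClueListener[] = [];

  onClue(listener: ClueListener): void {
    this.listeners.push(listener);
  }

  // Returns true only the first time a clue is found, so the UI animates once.
  discover(clueId: string): boolean {
    if (this.discovered.includes(clueId)) return false;
    this.discovered.push(clueId);
    for (const listener of this.listeners) {
      listener({ type: "clue_discovered", clueId });
    }
    return true;
  }

  count(): number {
    return this.discovered.length;
  }
}

const tracker = new ClueTracker();
const seen: string[] = [];
tracker.onClue((event) => {
  seen.push(event.clueId);
});

tracker.discover("var-full");        // from running df -h
tracker.discover("analytics-logs");  // from running du -sh /var/log/*
tracker.discover("var-full");        // repeat command: no second event
```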




&lt;h2&gt;
  
  
  What I Learned Building It
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The balance between educational and fun is harder than it sounds.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every scenario has to be realistic enough that a working engineer recognizes the pattern from their own experience. But it also has to be solvable in a few minutes by someone who hasn't been doing this for ten years. Beginner scenarios need to feel approachable. Advanced ones need to feel like a genuine challenge.&lt;/p&gt;

&lt;p&gt;I spent more time on scenario design than on any other part of the project. The disk-full scenario alone went through several iterations before it felt right. The key insight was: start with real postmortems. The disk-full debug-logging story is a thing that actually happens. The DB connection pool exhaustion that has a leak in the error handling path - real. The K8s crash loop from a process trying to bind to port 80 as a non-root user - real. The DNS failure that's half because of propagation and half because of TTL misconfiguration - real.&lt;/p&gt;

&lt;p&gt;Scenarios based on real patterns teach real skills. Made-up scenarios feel like trivia.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scoped animations in React are worth the setup cost.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Anime.js v4's scoped animation system was unfamiliar at first. The v3 API was simpler. But the v4 approach where you attach animations to a DOM ref and call &lt;code&gt;scope.revert()&lt;/code&gt; on cleanup eliminated an entire class of bugs I was hitting - animations targeting elements that no longer existed in the DOM, intervals that kept firing after a game ended. The &lt;code&gt;useAnime&lt;/code&gt; hook is 20 lines of code and everything downstream just works.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TypeScript for game data is underrated.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Defining incidents as typed TypeScript objects gives you autocomplete on every field, type errors if you forget a required property, and no JSON parsing issues. The alternative would be a database of scenario definitions or a YAML/JSON format. Those have their place, but for this project where the scenario logic is complex and needs to be right, having the TypeScript compiler check your work is worth it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Write the easter eggs.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I almost cut them because they felt like scope creep. I'm glad I didn't. The game without easter eggs is a quiz. With them, it's something you want to share. The &lt;code&gt;chatgpt&lt;/code&gt; easter egg - where ChatGPT confidently gives you unhelpful advice and then asks if maybe you're using MongoDB - has gotten more reactions in playtesting than any feature I actually planned.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SQLite at the edge is genuinely good now.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Turso was a gamble as the database choice. SQLite has a reputation for being a toy. But Turso's implementation is production-ready. The Drizzle integration is clean. The free tier is generous. For a read-heavy app with global users, edge replication means reads that are actually fast regardless of where you are. I haven't had a single database-related issue since launch.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Situation Room
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmzqwalzd1cje5l1ooguo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmzqwalzd1cje5l1ooguo.png" alt="Full war room with node selected, live metrics, and terminal" width="800" height="500"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The full war room mid-incident - architecture map, live metrics, alert feed, and terminal all running simultaneously&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The Situation Room is a 3D interactive war room. Live architecture visualization with traffic flows and service dependencies rendered in real time. Cascading alerts. Streaming logs. A countdown clock as revenue drops by the second. Countdown music that kicks in at 45 seconds and genuinely stresses me out even though I wrote the thing.&lt;/p&gt;

&lt;p&gt;The first Situation Room scenario is a Cyber Monday DNS failure. Your payment provider becomes unreachable at peak traffic. You have the full service map in front of you - every node clickable, every service inspectable. Four difficulty modes from guided (5 services, extra hints) to Hardcore with permadeath and a 30-service enterprise architecture full of red herrings.&lt;/p&gt;

&lt;p&gt;Nobody has earned the "Nerves of Steel" badge yet. That's the one for beating the Situation Room on Hardcore.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Result
&lt;/h2&gt;

&lt;p&gt;10+ scenarios are live at &lt;a href="https://youbrokeprod.com" rel="noopener noreferrer"&gt;youbrokeprod.com&lt;/a&gt;, free to play, no credit card required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Beginner:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Disk full from debug logs left on in production (the classic 3 AM page)&lt;/li&gt;
&lt;li&gt;Expired SSL certificate (everyone's had this one)&lt;/li&gt;
&lt;li&gt;DB connection pool exhaustion from a connection leak in the error path&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Intermediate:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;K8s CrashLoopBackOff from a port binding permission error&lt;/li&gt;
&lt;li&gt;Mobile API breaking change that someone forgot to version&lt;/li&gt;
&lt;li&gt;DNS failure in the Situation Room - the new 3D war room mode&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Advanced:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Memory leak that only reproduces under production traffic patterns&lt;/li&gt;
&lt;li&gt;Redis thundering herd when all connections in the pool expire simultaneously&lt;/li&gt;
&lt;li&gt;Lambda cold start cascade turning a routine deploy into a 10-minute outage&lt;/li&gt;
&lt;li&gt;The terraform destroy scenario based on the real DataTalksClub incident (685K+ views, 9% win rate)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each scenario has a scoring system (time, accuracy, efficiency), progressive hints, easter eggs, and a result card you can share. The Situation Room adds live architecture visualization, post-game postmortems with real-world incidents (Dyn 2016, Fastly 2021, Cloudflare 2020), and difficulty modes that scale the architecture complexity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8b2cu9fe99uviznmv3zg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8b2cu9fe99uviznmv3zg.png" alt="SRE post mortems" width="800" height="761"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The game is free. Pro unlocks harder difficulty modes and deep-dive postmortems with actual monitoring configs and prevention playbooks you can take back to work. For now I just want engineers to play it, get better at incident response, and maybe think about setting up disk space monitoring before they need it at 3 AM.&lt;/p&gt;




&lt;p&gt;If you've ever sat at a terminal at 2 AM and typed "why is everything broken" into a command prompt as if the computer would answer you directly - this game is for you.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fre7prsct8jd5d0gnyr74.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fre7prsct8jd5d0gnyr74.png" alt="The intro animation with traffic flowing through the architecture" width="800" height="525"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Traffic flowing through the architecture before the incident hits&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Play it at &lt;strong&gt;&lt;a href="https://youbrokeprod.com" rel="noopener noreferrer"&gt;youbrokeprod.com&lt;/a&gt;&lt;/strong&gt;. Try the Situation Room if you want the full 3D war room experience, or start with the disk-full scenario if you want to ease in. Either way, the countdown is ticking.&lt;/p&gt;




</description>
      <category>devops</category>
      <category>nextjs</category>
      <category>gamedev</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I built a training simulator for the dev skills nobody teaches</title>
      <dc:creator>Steven Leggett</dc:creator>
      <pubDate>Tue, 17 Mar 2026 18:21:00 +0000</pubDate>
      <link>https://forem.com/cdnsteve/i-built-a-training-simulator-for-the-dev-skills-nobody-teaches-9jo</link>
      <guid>https://forem.com/cdnsteve/i-built-a-training-simulator-for-the-dev-skills-nobody-teaches-9jo</guid>
      <description>&lt;p&gt;Pop quiz. What's wrong with this code?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/login&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;password&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Login attempt&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;password&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;           &lt;span class="c1"&gt;// LINE 5&lt;/span&gt;
    &lt;span class="na"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ip&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;SELECT * FROM users WHERE email = $1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;u&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

  &lt;span class="c1"&gt;// ... auth check ...&lt;/span&gt;

  &lt;span class="nx"&gt;analytics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;track&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user_login&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;ssn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ssn_last4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;// LINE 22&lt;/span&gt;
    &lt;span class="na"&gt;creditScore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;credit_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;passwordHash&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;password_hash&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// LINE 30&lt;/span&gt;
      &lt;span class="na"&gt;ssn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ssn_last4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;creditScore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;credit_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are 7 issues in there. How many did you spot? Did you catch that line 5 writes plaintext passwords to your log aggregator? That line 22 sends SSN data to a third-party analytics service, potentially violating GDPR Article 28? That line 30 returns the password hash in the API response?&lt;/p&gt;
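
&lt;p&gt;For contrast, here is a sketch of how those three leaks could be closed: allow-list the fields that go to logs, analytics, and the response instead of passing whole objects through. The &lt;code&gt;UserRow&lt;/code&gt; type, the &lt;code&gt;id&lt;/code&gt; column, and the helper names are hypothetical, and this covers only the three issues called out above, not all 7.&lt;/p&gt;

```typescript
// Hypothetical row shape based on the columns used in the handler above.
interface UserRow {
  id: number;               // assumed primary key, not shown in the original snippet
  email: string;
  password_hash: string;
  ssn_last4: string;
  credit_score: number;
}

// Keeps the email and IP the original handler logged; the password never reaches the logger.
function loginLogFields(email: string, ip: string) {
  return { email, ip };
}

// Third-party analytics gets an opaque id only: no SSN, no credit score.
function analyticsFields(user: UserRow) {
  return { userId: user.id };
}

// The API response exposes only what the client needs to render.
function publicUser(user: UserRow) {
  return { email: user.email };
}

// Made-up sample row for illustration.
const row: UserRow = {
  id: 7,
  email: "dev@example.com",
  password_hash: "not-a-real-hash",
  ssn_last4: "1234",
  credit_score: 700,
};

const responseBody = { user: publicUser(row) };
```

&lt;p&gt;Building each payload from an explicit list of fields means a new sensitive column added to the table later does not leak by default.&lt;/p&gt;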

&lt;p&gt;This is a real scenario from &lt;a href="https://learningto.co" rel="noopener noreferrer"&gt;LearningTo.co&lt;/a&gt; - a training platform I built for the dev skills that CS programs don't teach.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;I've been hiring and mentoring developers for years and I keep seeing the same pattern. Junior devs come in knowing algorithms and data structures but have never:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reviewed a pull request and caught a real bug&lt;/li&gt;
&lt;li&gt;Identified PII leaking through logs or API responses&lt;/li&gt;
&lt;li&gt;Recovered from a force-push that wiped a teammate's work&lt;/li&gt;
&lt;li&gt;Looked at AI-generated code and said "this looks right but it's wrong"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't edge cases. This is Tuesday on a real engineering team. And almost nobody teaches them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I built
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://learningto.co" rel="noopener noreferrer"&gt;LearningTo.co&lt;/a&gt; is scenario-based training. You get dropped into a realistic situation - a PR review, a failing pipeline, a suspicious config - and you have to investigate, flag issues, and explain your reasoning.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvsh84vy3xiw224yjxnyn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvsh84vy3xiw224yjxnyn.png" alt="Scenario briefing screen showing a fintech login endpoint with evaluation criteria" width="800" height="758"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each scenario gives you context like a real team would: "You've just joined a fintech startup. Your tech lead drops a PR review in your queue." Then you're evaluated on three things: what you find, how you explain it, and whether you can prioritize severity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Here's what gameplay looks like
&lt;/h2&gt;

&lt;p&gt;You see the code. A timer's running. You click lines to flag them. Then you write your analysis explaining what's wrong and why it matters.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feekeva84ui7zh904ja9e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feekeva84ui7zh904ja9e.png" alt="Code review gameplay - reviewing a login endpoint with syntax highlighting and analysis panel" width="800" height="758"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The key: &lt;strong&gt;reasoning quality is scored, not just finding the line numbers.&lt;/strong&gt; Saying "line 5 has a bug" gets you nothing. Saying "line 5 logs the plaintext password to the log aggregator, which means anyone with log access can see user credentials" gets you points.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9bvnri0zjc8nibbp8wn4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9bvnri0zjc8nibbp8wn4.png" alt="Flagged lines highlighted in the code with 3 issues found" width="800" height="758"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After you submit, you get a full debrief: what you found, what you missed, and why each issue matters in production. A score, a grade, and a ship/block verdict.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Facjcdl84r12dfegt225v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Facjcdl84r12dfegt225v.png" alt="Debrief screen showing 85/100 score, B grade, BLOCK verdict, found and missed issues with explanations" width="800" height="758"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The AI Code Review course (free, 8 chapters)
&lt;/h2&gt;

&lt;p&gt;This is the part I'm most proud of. GitHub teaches you how to use Copilot. Nobody teaches you how to review what it produces.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk4c0a7386ee9jntiu6n7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk4c0a7386ee9jntiu6n7.png" alt="AI Code Review course showing 8 chapters from Copilot CRUD to CI Pipeline" width="800" height="758"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Eight hands-on chapters, each a real scenario:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Copilot's First CRUD&lt;/strong&gt; - it compiles, it runs, but should it ship? (SQL injection, no auth, swallowed errors)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Hallucinated Import&lt;/strong&gt; - half the imports point to packages that don't exist on npm&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The 100% Coverage Lie&lt;/strong&gt; - AI-generated tests show perfect coverage but test nothing meaningful&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Confident Refactor&lt;/strong&gt; - AI "improved readability" but silently removed business rules and compliance checks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit the Agent&lt;/strong&gt; - an AI support agent makes tool calls to external APIs. What could go wrong?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP Server Lockdown&lt;/strong&gt; - an MCP server gives an AI assistant access to your filesystem and database&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Destructive Migration&lt;/strong&gt; - AI-generated schema migration. Your DBA is on vacation. You're the last line of defense.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI Pipeline from Copilot&lt;/strong&gt; - it builds, tests, and deploys. Find the supply chain and injection risks.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw21rllj9xsv8tf7uahtp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw21rllj9xsv8tf7uahtp.png" alt="AI Code Review gameplay showing a React dashboard with hallucinated imports" width="800" height="758"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The screenshot above is Chapter 2 - "The Hallucinated Import." Can you spot which of those imports point to real npm packages and which were completely fabricated by Copilot? (&lt;code&gt;@react-toolkit/data-grid&lt;/code&gt; doesn't exist. Neither does &lt;code&gt;use-smart-fetch&lt;/code&gt;. And &lt;code&gt;next/analytics&lt;/code&gt; is not a thing.)&lt;/p&gt;
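
&lt;p&gt;If you want to automate a first pass at this kind of review, here is a minimal sketch of the idea: pull the module specifiers out of a file, reduce each one to its package root, and ask a lookup function whether that package exists. The names &lt;code&gt;package_root&lt;/code&gt;, &lt;code&gt;suspicious_imports&lt;/code&gt;, and the &lt;code&gt;package_exists&lt;/code&gt; callback are made up for illustration, and the in-memory &lt;code&gt;known&lt;/code&gt; set stands in for a real registry lookup.&lt;/p&gt;

```python
import re

# Matches the module specifier in `import ... from "x"` and bare `import "x"`.
IMPORT_RE = re.compile(r"""(?:from|import)\s+['"]([^'"]+)['"]""")


def package_root(specifier):
    """Return the npm package a specifier belongs to, or None for relative paths."""
    if specifier.startswith((".", "/")):
        return None
    parts = specifier.split("/")
    # Scoped packages keep two segments: @scope/name
    if specifier.startswith("@"):
        return "/".join(parts[:2])
    return parts[0]


def suspicious_imports(source, package_exists):
    """List specifiers whose package root fails the package_exists check."""
    flagged = []
    for specifier in IMPORT_RE.findall(source):
        root = package_root(specifier)
        if root is not None and not package_exists(root):
            flagged.append(specifier)
    return flagged


if __name__ == "__main__":
    sample = '''
import { DataGrid } from "@react-toolkit/data-grid";
import useSmartFetch from "use-smart-fetch";
import { Analytics } from "next/analytics";
import React from "react";
import "./styles.css";
'''
    known = {"react", "next"}  # hypothetical stand-in for a registry lookup
    print(suspicious_imports(sample, lambda name: name in known))
```

&lt;p&gt;Note what this sketch does not catch: &lt;code&gt;next/analytics&lt;/code&gt; passes, because &lt;code&gt;next&lt;/code&gt; itself is a real package. A package-level check only finds fully invented packages. Invented subpaths still need a human, or a check against what the package actually exports.&lt;/p&gt;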

&lt;p&gt;The entire course is free. No paywall, no credit card.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's live
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;4 categories, 17 scenarios:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Scenarios&lt;/th&gt;
&lt;th&gt;What you practice&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;PII &amp;amp; Security&lt;/strong&gt; (&lt;code&gt;/ntrol&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;API key exposure, login endpoint bugs, webhook handling, user profile data leaks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Git &amp;amp; Gitflow&lt;/strong&gt; (&lt;code&gt;/mmit&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Merge conflicts, force-push recovery, conventional commits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Code Reasoning&lt;/strong&gt; (&lt;code&gt;/de&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Auditing AI-generated services, spotting subtle bugs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;AI Code Review&lt;/strong&gt; (&lt;code&gt;/view&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;The full course above&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqkx9j3dr4fiwyxavqu83.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqkx9j3dr4fiwyxavqu83.png" alt="Category page showing PII &amp;amp; Security scenarios with difficulty levels and points" width="800" height="758"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The URL taxonomy is a nerdy Easter egg - every slug completes the phrase "learning to..." (learning to &lt;code&gt;/mmit&lt;/code&gt;, learning to &lt;code&gt;/ntrol&lt;/code&gt;, learning to &lt;code&gt;/de&lt;/code&gt;, learning to &lt;code&gt;/view&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;Seven more categories are coming: Observability, Debugging, Code Review, CI/CD, Performance, Incident Ownership, and Testing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who it's for
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CS students&lt;/strong&gt; about to start their first internship or job&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bootcamp grads&lt;/strong&gt; who can build features but haven't done team-based dev work&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Junior devs&lt;/strong&gt; who want to level up on the skills that get noticed in code reviews&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Senior devs&lt;/strong&gt; - try the AI Code Review course. I guarantee Copilot has snuck something past you.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;Head to &lt;a href="https://learningto.co" rel="noopener noreferrer"&gt;learningto.co&lt;/a&gt; and start with &lt;strong&gt;The Login Endpoint&lt;/strong&gt; - it has 7 PII issues hidden in a fintech auth route. See how many you can find in 10 minutes.&lt;/p&gt;

&lt;p&gt;Or jump straight to the &lt;a href="https://learningto.co/view" rel="noopener noreferrer"&gt;AI Code Review course&lt;/a&gt; if you want to see how good you really are at catching Copilot's mistakes.&lt;/p&gt;

&lt;p&gt;Everything live is free. Sign-up takes 10 seconds with GitHub or Google.&lt;/p&gt;




&lt;p&gt;I'd love to hear: what dev skills do you wish someone had taught you before your first job? Drop a comment - it'll probably become a scenario.&lt;/p&gt;

</description>
      <category>career</category>
      <category>beginners</category>
      <category>webdev</category>
      <category>security</category>
    </item>
    <item>
      <title>Setting Up CocoIndex with Docker and pgvector - A Practical Guide</title>
      <dc:creator>Steven Leggett</dc:creator>
      <pubDate>Tue, 17 Mar 2026 18:20:44 +0000</pubDate>
      <link>https://forem.com/cdnsteve/setting-up-cocoindex-with-docker-and-pgvector-a-practical-guide-3mag</link>
      <guid>https://forem.com/cdnsteve/setting-up-cocoindex-with-docker-and-pgvector-a-practical-guide-3mag</guid>
      <description>&lt;h1&gt;
  
  
  Setting Up CocoIndex with Docker and pgvector - A Practical Guide
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://cocoindex.io/" rel="noopener noreferrer"&gt;CocoIndex&lt;/a&gt; is a data transformation framework for AI that handles indexing with incremental processing. It uses a Rust engine with Python bindings, which means it's fast, but the setup has a few gotchas that aren't obvious from the docs. The project is &lt;a href="https://github.com/cocoindex-io/cocoindex" rel="noopener noreferrer"&gt;open source on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I spent an afternoon getting it running locally and hit every sharp edge so you don't have to. Here's what actually works.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You'll Build
&lt;/h2&gt;

&lt;p&gt;A pipeline that reads markdown files, chunks them, generates vector embeddings using sentence-transformers, and stores them in PostgreSQL with pgvector for semantic similarity search.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.11 to 3.13 (officially supported - 3.14 works but isn't listed yet)&lt;/li&gt;
&lt;li&gt;Docker&lt;/li&gt;
&lt;li&gt;About 10 minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 1: PostgreSQL with pgvector (not plain Postgres)
&lt;/h2&gt;

&lt;p&gt;This is the first thing that will bite you. CocoIndex requires the &lt;code&gt;vector&lt;/code&gt; extension for HNSW indexes. Plain &lt;code&gt;postgres:16&lt;/code&gt; or &lt;code&gt;postgres:17&lt;/code&gt; will fail with &lt;code&gt;extension "vector" is not available&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;CocoIndex provides a docker compose config you can use directly. The &lt;code&gt;&amp;lt;(...)&lt;/code&gt; process substitution needs bash or zsh:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose &lt;span class="nt"&gt;-f&lt;/span&gt; &amp;lt;&lt;span class="o"&gt;(&lt;/span&gt;curl &lt;span class="nt"&gt;-L&lt;/span&gt; https://raw.githubusercontent.com/cocoindex-io/cocoindex/refs/heads/main/dev/postgres.yaml&lt;span class="o"&gt;)&lt;/span&gt; up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or run the container manually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt; cocoindex-postgres &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;POSTGRES_USER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;cocoindex &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;POSTGRES_PASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;cocoindex &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;POSTGRES_DB&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;cocoindex &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 5432:5432 &lt;span class="se"&gt;\&lt;/span&gt;
  pgvector/pgvector:pg17
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If port 5432 is already in use, pick a different host port (e.g., &lt;code&gt;-p 5450:5432&lt;/code&gt;) and adjust your connection string.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Port tip:&lt;/strong&gt; Before picking a port, check nothing else is listening there. SSH tunnels can silently bind to the same port as Docker, causing misleading "password authentication failed" errors even when your credentials are correct. Verify with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;lsof &lt;span class="nt"&gt;-i&lt;/span&gt; :5432
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On macOS you should only see Docker's &lt;code&gt;com.docker&lt;/code&gt; process (on Linux, typically &lt;code&gt;docker-proxy&lt;/code&gt;), not SSH or anything else.&lt;/p&gt;
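
&lt;p&gt;If you'd rather check from Python before picking a host port, a minimal sketch with the standard library looks like this. &lt;code&gt;port_in_use&lt;/code&gt; is a name made up here. It only reports whether something accepts TCP connections on the IPv4 loopback address, not which process owns the port, so &lt;code&gt;lsof&lt;/code&gt; is still the tool for telling Docker and SSH apart.&lt;/p&gt;

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 5432 is the default from this guide, 5450 the alternative host port
    for candidate in (5432, 5450):
        state = "in use" if port_in_use(candidate) else "free"
        print(f"{candidate}: {state}")
```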

&lt;h2&gt;
  
  
  Step 2: Python Environment
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;cocoindex-quickstart &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;cocoindex-quickstart
python3 &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv
&lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-U&lt;/span&gt; &lt;span class="s1"&gt;'cocoindex[embeddings]'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;[embeddings]&lt;/code&gt; extra pulls in sentence-transformers and torch. It's a big download but gives you local embeddings with no API key needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Configure the Database Connection
&lt;/h2&gt;

&lt;p&gt;Create a &lt;code&gt;.env&lt;/code&gt; file in your project root. CocoIndex reads it automatically via python-dotenv:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'COCOINDEX_DATABASE_URL=postgresql://cocoindex:cocoindex@localhost:5432/cocoindex'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Adjust the port if you mapped to something other than 5432 in Step 1.&lt;/p&gt;
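
&lt;p&gt;Hand-editing the URL is fine, but if you script your setup, a small helper can swap the port without touching the rest of the connection string. This is a sketch using only &lt;code&gt;urllib.parse&lt;/code&gt;. &lt;code&gt;with_port&lt;/code&gt; is a hypothetical helper name, not part of CocoIndex.&lt;/p&gt;

```python
from urllib.parse import urlsplit, urlunsplit


def with_port(db_url, port):
    """Return db_url with its port replaced, keeping user, password, host and path."""
    parts = urlsplit(db_url)
    # Everything before the last "@" is the user:password part (may be empty)
    userinfo, _, _ = parts.netloc.rpartition("@")
    host = parts.hostname or "localhost"
    netloc = f"{userinfo}@{host}:{port}" if userinfo else f"{host}:{port}"
    return urlunsplit(parts._replace(netloc=netloc))


if __name__ == "__main__":
    base = "postgresql://cocoindex:cocoindex@localhost:5432/cocoindex"
    # Prints a line you can paste into .env for a container mapped with -p 5450:5432
    print("COCOINDEX_DATABASE_URL=" + with_port(base, 5450))
```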

&lt;h2&gt;
  
  
  Step 4: Write the Pipeline
&lt;/h2&gt;

&lt;p&gt;Create &lt;code&gt;main.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cocoindex&lt;/span&gt;

&lt;span class="nd"&gt;@cocoindex.flow_def&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TextEmbedding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;text_embedding_flow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;flow_builder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FlowBuilder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data_scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataScope&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;data_scope&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;flow_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_source&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sources&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LocalFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;markdown_files&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="n"&gt;doc_embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data_scope&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_collector&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;data_scope&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;row&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SplitRecursively&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;markdown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;row&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embedding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;functions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SentenceTransformerEmbed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentence-transformers/all-MiniLM-L6-v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;doc_embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;collect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filename&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embedding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;doc_embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;export&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;doc_embeddings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;storages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Postgres&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;primary_key_fields&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filename&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;vector_indexes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;VectorIndexDef&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;field_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embedding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;metric&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cocoindex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;VectorSimilarityMetric&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;COSINE_SIMILARITY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 5: Add Some Content
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;markdown_files
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Drop some markdown files in there. For testing, even a couple of files work. The pipeline will chunk them, embed each chunk, and store the vectors.&lt;/p&gt;
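
&lt;p&gt;If you don't have markdown lying around, a throwaway script can create a couple of files to index. The file names and contents below are invented purely as test data, and &lt;code&gt;write_samples&lt;/code&gt; is a made-up helper.&lt;/p&gt;

```python
from pathlib import Path

# Made-up sample documents, just enough text for the pipeline to chunk and embed.
SAMPLES = {
    "incremental.md": "# Incremental processing\n\nOnly new or changed inputs are handled again.\n",
    "embeddings.md": "# Embedding models\n\nall-MiniLM-L6-v2 is a small sentence-transformers model.\n",
}


def write_samples(target_dir="markdown_files"):
    """Create target_dir if needed and write the sample files into it."""
    folder = Path(target_dir)
    folder.mkdir(parents=True, exist_ok=True)
    written = []
    for name, body in SAMPLES.items():
        path = folder / name
        path.write_text(body, encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    for path in write_samples():
        print(f"wrote {path}")
```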

&lt;h2&gt;
  
  
  Step 6: Run the Indexer
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cocoindex update main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It will show you the tables it needs to create and ask for confirmation. Type &lt;code&gt;yes&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You'll see it load the sentence-transformers model (first run downloads it from HuggingFace), create the pgvector extension, build the HNSW index, and process your files. Output looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TextEmbedding.documents (batch update): 2/2 source rows: 2 added
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 7: Query with Semantic Search
&lt;/h2&gt;

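&lt;p&gt;Install psycopg2 and create a simple query script. The script embeds the query with the same model the pipeline used, since vectors from different models aren't comparable:&lt;br&gt;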
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;psycopg2-binary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# query.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sentence_transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SentenceTransformer&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;

&lt;span class="n"&gt;DB_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgresql://cocoindex:cocoindex@localhost:5432/cocoindex&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SentenceTransformer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentence-transformers/all-MiniLM-L6-v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;vec_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DB_URL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cur&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        SELECT filename, left(text, 200),
               1 - (embedding &amp;lt;=&amp;gt; %s::vector) as similarity
        FROM textembedding__doc_embeddings
        ORDER BY embedding &amp;lt;=&amp;gt; %s::vector
        LIMIT %s
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vec_str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vec_str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:])&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is incremental processing?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Query: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Score: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; | &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;150&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python query.py &lt;span class="s2"&gt;"which embedding models are popular?"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
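
&lt;p&gt;A note on the score: pgvector's &lt;code&gt;&amp;lt;=&amp;gt;&lt;/code&gt; operator returns cosine distance, which is why the query orders by it ascending and reports &lt;code&gt;1 - distance&lt;/code&gt; as the similarity. The pure-Python sketch below shows the same arithmetic on toy vectors. The function names are illustrative, and it assumes neither vector is all zeros.&lt;/p&gt;

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Cosine distance, the quantity the SQL query orders by."""
    return 1.0 - cosine_similarity(a, b)


if __name__ == "__main__":
    query = [1.0, 0.0]
    examples = {
        "same direction": [2.0, 0.0],
        "orthogonal": [0.0, 3.0],
        "opposite": [-1.0, 0.0],
    }
    for name, vec in examples.items():
        print(name, round(cosine_similarity(query, vec), 4))
```

&lt;p&gt;Vector length doesn't matter, only direction, so the closest chunks are the ones with the smallest distance and therefore the highest score.&lt;/p&gt;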



&lt;h2&gt;
  
  
  Things That Tripped Me Up
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Use &lt;code&gt;pgvector/pgvector&lt;/code&gt;, not &lt;code&gt;postgres&lt;/code&gt;.&lt;/strong&gt; This is the number one issue. The plain Postgres Docker image doesn't include the vector extension. You need &lt;code&gt;pgvector/pgvector:pg17&lt;/code&gt; (or pg16). CocoIndex will fail at table creation without it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Table naming is lowercase.&lt;/strong&gt; Your flow is named &lt;code&gt;TextEmbedding&lt;/code&gt; but the table is &lt;code&gt;textembedding__doc_embeddings&lt;/code&gt;. CocoIndex lowercases the flow name. Keep this in mind when writing direct SQL queries.&lt;/p&gt;
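
&lt;p&gt;If you build SQL from the flow name, it may help to derive the table name in one place. The rule below (lowercased flow name, two underscores, then the export name) is inferred only from the &lt;code&gt;TextEmbedding&lt;/code&gt; example in this guide, so treat it as an assumption rather than documented CocoIndex behavior. &lt;code&gt;target_table_name&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
def target_table_name(flow_name, target_name):
    """Guess the Postgres table name for a flow's export target.

    Assumption: lowercased flow name, two underscores, then the export name,
    inferred from the single TextEmbedding example rather than from the docs.
    """
    return f"{flow_name.lower()}__{target_name}"


if __name__ == "__main__":
    print(target_table_name("TextEmbedding", "doc_embeddings"))
```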

&lt;p&gt;&lt;strong&gt;The old &lt;code&gt;main_fn&lt;/code&gt; API is gone.&lt;/strong&gt; If you see examples using &lt;code&gt;cocoindex.main_fn()&lt;/code&gt;, that's outdated. The current API (v0.3.36+) uses the &lt;code&gt;cocoindex&lt;/code&gt; CLI command directly.&lt;/p&gt;

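&lt;p&gt;&lt;strong&gt;Docker volume persistence.&lt;/strong&gt; The Postgres image only applies &lt;code&gt;POSTGRES_USER&lt;/code&gt; and &lt;code&gt;POSTGRES_PASSWORD&lt;/code&gt; when it initializes an empty data directory. If you change them but reuse the existing volume, the old credentials persist. Use &lt;code&gt;docker rm -v&lt;/code&gt; to remove the container's anonymous volume when recreating it, or &lt;code&gt;docker compose down -v&lt;/code&gt; if you started it with compose.&lt;/p&gt;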

&lt;p&gt;&lt;strong&gt;The &lt;code&gt;.env&lt;/code&gt; file wins.&lt;/strong&gt; CocoIndex loads &lt;code&gt;.env&lt;/code&gt; automatically via python-dotenv. If you set &lt;code&gt;COCOINDEX_DATABASE_URL&lt;/code&gt; in your shell but have a different value in &lt;code&gt;.env&lt;/code&gt;, the file takes precedence. This caught me when debugging connection issues.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Port conflicts with SSH tunnels.&lt;/strong&gt; If you're forwarding database ports over SSH (common with remote dev setups), an SSH tunnel can bind to the same port as Docker. The connection goes to the wrong Postgres, and you get auth failures that look like a password problem. Always check which process is listening on the port with &lt;code&gt;lsof&lt;/code&gt; (for example, &lt;code&gt;lsof -i :&amp;lt;port&amp;gt;&lt;/code&gt;).&lt;/p&gt;
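&lt;p&gt;A quick way to confirm that something is already listening before you start the container is a connect probe. The sketch below is a minimal, hypothetical helper (&lt;code&gt;port_in_use&lt;/code&gt; is not part of CocoIndex or Docker). It only tells you that a listener exists on the port, not which process owns it, so &lt;code&gt;lsof&lt;/code&gt; is still the tool for identifying the owner.&lt;/p&gt;

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise
        return probe.connect_ex((host, port)) == 0
```

&lt;p&gt;If &lt;code&gt;port_in_use(5432)&lt;/code&gt; returns &lt;code&gt;True&lt;/code&gt; before the Postgres container is up, a tunnel or another local Postgres probably owns the port already (5432 is the Postgres default; substitute whatever port you map).&lt;/p&gt;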

&lt;h2&gt;
  
  
  Using CocoIndex with Claude Code
&lt;/h2&gt;

&lt;p&gt;If you're using Claude Code, there are a couple of integrations worth knowing about.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude Code Skill
&lt;/h3&gt;

&lt;p&gt;CocoIndex provides an official Claude Code skill that gives Claude built-in knowledge about CocoIndex's API, so it can help you write pipelines, create custom functions, and run CLI commands correctly. This would have saved me from hitting the deprecated &lt;code&gt;main_fn&lt;/code&gt; API issue.&lt;/p&gt;

&lt;p&gt;Install it from within Claude Code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/plugin marketplace add cocoindex-io/cocoindex-claude
/plugin &lt;span class="nb"&gt;install &lt;/span&gt;cocoindex-skills@cocoindex
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once installed, Claude Code understands CocoIndex's current API and can generate correct pipeline code without relying on outdated examples.&lt;/p&gt;

&lt;h3&gt;
  
  
  MCP for Code Search
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;cocoindex-code&lt;/code&gt;&lt;/strong&gt; is a lightweight MCP server for semantic code search:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pipx &lt;span class="nb"&gt;install &lt;/span&gt;cocoindex-code
claude mcp add cocoindex-code &lt;span class="nt"&gt;--&lt;/span&gt; ccc mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It uses SQLite locally and runs its own embeddings - no Postgres required. It's a separate tool from the main CocoIndex library, focused on searching your codebase.&lt;/p&gt;

&lt;h3&gt;
  
  
  MCP for Postgres-backed Indexes
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;There is no official MCP server for the Postgres-backed pipeline&lt;/strong&gt; we built in this guide. The main &lt;code&gt;cocoindex&lt;/code&gt; library has a built-in HTTP server (&lt;code&gt;cocoindex server main.py&lt;/code&gt;) that exposes REST APIs, but it speaks a proprietary protocol built for the CocoInsight UI, not the MCP standard.&lt;/p&gt;

&lt;p&gt;If you need MCP access to your pgvector index, you'd need to write a thin wrapper. The &lt;code&gt;query.py&lt;/code&gt; script above is essentially all the logic you need - wrap it in an MCP server and you're there. That's a good project for a follow-up post.&lt;/p&gt;




&lt;p&gt;The full pipeline takes about 10 minutes to set up once you know the gotchas. The incremental processing means subsequent runs only reprocess changed files, which is where CocoIndex really shines over rebuilding indexes from scratch.&lt;/p&gt;

</description>
      <category>python</category>
      <category>docker</category>
      <category>postgres</category>
      <category>ai</category>
    </item>
    <item>
      <title>5 Production Incidents Every DevOps Engineer Should Know How to Debug</title>
      <dc:creator>Steven Leggett</dc:creator>
      <pubDate>Fri, 13 Mar 2026 02:53:21 +0000</pubDate>
      <link>https://forem.com/cdnsteve/5-production-incidents-every-devops-engineer-should-know-how-to-debug-1np2</link>
      <guid>https://forem.com/cdnsteve/5-production-incidents-every-devops-engineer-should-know-how-to-debug-1np2</guid>
      <description>&lt;p&gt;It's 2 AM. Your phone is screaming. The dashboard is red. Users are tweeting.&lt;/p&gt;

&lt;p&gt;You have been on call long enough to know that the gap between "I think I know what's wrong" and "I know exactly what's wrong" can cost your company thousands of dollars per minute. The engineers who close that gap fast are not smarter than everyone else. They have just seen these patterns before.&lt;/p&gt;

&lt;p&gt;Here are 5 production incidents that every DevOps engineer will encounter at some point - what they look like, why they happen, and how to debug them.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. "No Space Left on Device"
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The story
&lt;/h3&gt;

&lt;p&gt;A developer was chasing a gnarly bug in production. To get more visibility, they temporarily cranked the application log level to &lt;code&gt;DEBUG&lt;/code&gt;. They fixed the bug, merged the PR, and completely forgot to revert the log level setting.&lt;/p&gt;

&lt;p&gt;Three weeks later, at 3 AM on a Tuesday, your monitoring fires. Every service on that host is returning 500s. The database is refusing writes. Nothing makes sense until you SSH in and run the one command that tells you everything:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;df&lt;/span&gt; &lt;span class="nt"&gt;-h&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        80G   80G     0 100% /
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Full. The disk is completely full. &lt;code&gt;/var/log&lt;/code&gt; has grown to 64GB of verbose debug output nobody was watching.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why it happens
&lt;/h3&gt;

&lt;p&gt;Debug logging is chatty by design. It logs every function call, every query parameter, every header. In a high-traffic service, debug logs can generate gigabytes per hour. Combine that with a missing or misconfigured log rotation policy and you have a slow-motion disaster playing out in the background while everyone is focused on feature work.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to debug it
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Step 1: Confirm the problem&lt;/span&gt;
&lt;span class="nb"&gt;df&lt;/span&gt; &lt;span class="nt"&gt;-h&lt;/span&gt;

&lt;span class="c"&gt;# Step 2: Find the culprit&lt;/span&gt;
&lt;span class="nb"&gt;du&lt;/span&gt; &lt;span class="nt"&gt;-sh&lt;/span&gt; /var/log/&lt;span class="k"&gt;*&lt;/span&gt;
&lt;span class="nb"&gt;du&lt;/span&gt; &lt;span class="nt"&gt;-sh&lt;/span&gt; /var/log/nginx/&lt;span class="k"&gt;*&lt;/span&gt;

&lt;span class="c"&gt;# Step 3: Immediate relief - clear old compressed logs&lt;/span&gt;
find /var/log &lt;span class="nt"&gt;-name&lt;/span&gt; &lt;span class="s2"&gt;"*.gz"&lt;/span&gt; &lt;span class="nt"&gt;-mtime&lt;/span&gt; +7 &lt;span class="nt"&gt;-delete&lt;/span&gt;

&lt;span class="c"&gt;# Step 4: Truncate (don't delete) the active log file&lt;/span&gt;
&lt;span class="nb"&gt;truncate&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; 0 /var/log/myapp/app.log

&lt;span class="c"&gt;# Step 5: Check logrotate config&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; /etc/logrotate.d/myapp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The fix
&lt;/h3&gt;

&lt;p&gt;Set the log level back to &lt;code&gt;INFO&lt;/code&gt; or &lt;code&gt;WARN&lt;/code&gt;. Fix your logrotate config to enforce retention limits. Then add a disk space alert at 80% - not 95%. By the time you hit 95%, you probably have minutes, not hours.&lt;/p&gt;
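&lt;p&gt;In practice the 80% alert belongs in your monitoring stack, but the check itself is tiny. Here is a minimal sketch using only the Python standard library; the function names and the default path are illustrative assumptions, and the percentage can differ slightly from the &lt;code&gt;Use%&lt;/code&gt; column in &lt;code&gt;df&lt;/code&gt;, which accounts for reserved blocks.&lt;/p&gt;

```python
import shutil

ALERT_THRESHOLD_PERCENT = 80.0  # the threshold suggested above


def disk_used_percent(path="/"):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def should_alert(used_percent, threshold=ALERT_THRESHOLD_PERCENT):
    """True once usage reaches the threshold."""
    return used_percent >= threshold


if __name__ == "__main__":
    used = disk_used_percent("/")
    status = "ALERT" if should_alert(used) else "ok"
    print(f"/ is {used:.1f}% full: {status}")
```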

&lt;h3&gt;
  
  
  Key takeaway
&lt;/h3&gt;

&lt;p&gt;Your logging infrastructure needs to be monitored too. The tool you use to diagnose outages can itself cause outages if you ignore it.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Database Connection Pool Exhaustion
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The story
&lt;/h3&gt;

&lt;p&gt;Traffic is normal. CPU is normal. The database server is completely idle. But your application is throwing errors that look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Error: Cannot acquire connection from pool
TimeoutError: timeout of 5000ms exceeded
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And users are getting 503s.&lt;/p&gt;

&lt;p&gt;This one is maddening the first time you see it because every instinct tells you to look at the database. The database is fine. The problem is how your application is managing its connections to the database.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why it happens
&lt;/h3&gt;

&lt;p&gt;Most database drivers give you a connection pool - a fixed set of reusable connections shared across your application's threads or async workers. When a request needs to run a query, it borrows a connection from the pool. When it's done, it returns the connection.&lt;/p&gt;

&lt;p&gt;The failure mode that is easy to miss: what happens when a request throws an exception before it returns the connection?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// This is a leak&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;acquire&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;SELECT * FROM users WHERE id = $1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
  &lt;span class="c1"&gt;// If the query throws, conn is never released&lt;/span&gt;
  &lt;span class="nx"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;release&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// This is correct&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;acquire&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;SELECT * FROM users WHERE id = $1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;release&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Always runs, even on error&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Under normal traffic, errors are rare, so connections leak slowly and nobody notices. Under higher load, or when errors spike, the pool drains quickly. Then everything queues up waiting for a connection that never comes back.&lt;/p&gt;
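&lt;p&gt;The drain is easy to reproduce in isolation. The sketch below uses a hypothetical in-memory pool (&lt;code&gt;TinyPool&lt;/code&gt; stands in for a real driver's pool and is not any library's API) to show how a failing code path without a guaranteed release empties the pool, while a context manager that releases in &lt;code&gt;finally&lt;/code&gt; keeps it full.&lt;/p&gt;

```python
import queue
from contextlib import contextmanager


class TinyPool:
    """Hypothetical fixed-size pool; a stand-in for a real driver's pool."""

    def __init__(self, size):
        self._free = queue.Queue()
        for n in range(size):
            self._free.put(f"conn-{n}")

    def acquire(self, timeout=0.05):
        # Raises queue.Empty when every connection is checked out
        return self._free.get(timeout=timeout)

    def release(self, conn):
        self._free.put(conn)

    def available(self):
        return self._free.qsize()

    @contextmanager
    def connection(self):
        conn = self.acquire()
        try:
            yield conn
        finally:
            self.release(conn)  # runs on success and on error


def leaky_query(pool, fail):
    conn = pool.acquire()
    if fail:
        raise RuntimeError("query failed")  # conn is never released
    pool.release(conn)
    return "ok"


def safe_query(pool, fail):
    with pool.connection():
        if fail:
            raise RuntimeError("query failed")
        return "ok"
```

&lt;p&gt;With a pool of two connections, two failing calls to &lt;code&gt;leaky_query&lt;/code&gt; leave &lt;code&gt;available()&lt;/code&gt; at zero and the next &lt;code&gt;acquire()&lt;/code&gt; times out. Any number of failing calls to &lt;code&gt;safe_query&lt;/code&gt; leave both connections available.&lt;/p&gt;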

&lt;h3&gt;
  
  
  How to debug it
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- On PostgreSQL: see all active connections&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_stat_activity&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;state&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- See who is holding connections longest&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;pg_stat_activity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query_start&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;state&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_stat_activity&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;state&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s1"&gt;'idle'&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;duration&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Look for connections stuck in &lt;code&gt;idle in transaction&lt;/code&gt; state. That is almost always a leak. The connection was borrowed, a transaction started, and it was never committed or rolled back.&lt;/p&gt;

&lt;h3&gt;
  
  
  The fix
&lt;/h3&gt;

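&lt;p&gt;Audit your error handling paths. Every connection acquire must have a matching release in a &lt;code&gt;finally&lt;/code&gt; block or equivalent. Set an acquire timeout such as &lt;code&gt;connectionTimeoutMillis&lt;/code&gt; on your pool so callers fail fast instead of queueing forever when the pool is empty. That timeout does not reclaim a leaked connection by itself. On the PostgreSQL side, &lt;code&gt;idle_in_transaction_session_timeout&lt;/code&gt; terminates sessions that sit in an open transaction. Add an alert when active connection count exceeds 80% of pool size.&lt;/p&gt;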

&lt;h3&gt;
  
  
  Key takeaway
&lt;/h3&gt;

&lt;p&gt;Low database CPU during an "outage" is a red flag pointing to connection management, not query performance. Always check &lt;code&gt;pg_stat_activity&lt;/code&gt; before assuming the database is healthy.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Kubernetes CrashLoopBackOff - The Missing Secret
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The story
&lt;/h3&gt;

&lt;p&gt;You deploy a new version of your application to Kubernetes. Instead of the pods coming up healthy, you see this in &lt;code&gt;kubectl get pods&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;NAME                          READY   STATUS             RESTARTS   AGE
myapp-7d9f8b-xkj2p            0/1     CrashLoopBackOff   4          3m
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pod starts, crashes almost immediately, Kubernetes restarts it, it crashes again, and the backoff delay doubles each time up to a five-minute cap. Within 10 minutes the pod is waiting 5 minutes between restart attempts.&lt;/p&gt;
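&lt;p&gt;To see how quickly the wait reaches five minutes, here is a rough sketch of the schedule. It assumes the delays Kubernetes documents (starting at 10 seconds, doubling, capped at 300 seconds) and ignores the time the container spends running before each crash; the function name is just for illustration.&lt;/p&gt;

```python
def restart_delays(restarts, base_seconds=10, cap_seconds=300):
    """Delay before each restart: doubles from the base until it hits the cap."""
    return [min(base_seconds * 2 ** n, cap_seconds) for n in range(restarts)]


delays = restart_delays(7)
print(delays)           # [10, 20, 40, 80, 160, 300, 300]
print(sum(delays[:5]))  # 310 seconds of waiting before the delay reaches the cap
```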

&lt;h3&gt;
  
  
  Why it happens
&lt;/h3&gt;

&lt;p&gt;This specific variant is one of the more frustrating ones: the app crashes on startup because it cannot find a required configuration value. It is looking for a secret - maybe a database password, maybe an API key - via an environment variable mounted from a Kubernetes Secret.&lt;/p&gt;

&lt;p&gt;But the Secret does not exist in this namespace.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl logs myapp-7d9f8b-xkj2p
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Error: Required environment variable DATABASE_PASSWORD is not set
Process exited with code 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl describe pod myapp-7d9f8b-xkj2p
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;Events:
  Warning  Failed    2m   kubelet  Error: secret "myapp-credentials" not found
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The deployment referenced a Secret that was never created in the target namespace. It exists in &lt;code&gt;staging&lt;/code&gt;. It does not exist in &lt;code&gt;production&lt;/code&gt;. The deployment YAML was copy-pasted and nobody noticed.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to debug it
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Step 1: Get the actual error&lt;/span&gt;
kubectl logs &amp;lt;pod-name&amp;gt;
kubectl logs &amp;lt;pod-name&amp;gt; &lt;span class="nt"&gt;--previous&lt;/span&gt;  &lt;span class="c"&gt;# Logs from the crashed instance&lt;/span&gt;

&lt;span class="c"&gt;# Step 2: Describe the pod for Kubernetes-level events&lt;/span&gt;
kubectl describe pod &amp;lt;pod-name&amp;gt;

&lt;span class="c"&gt;# Step 3: Check if the secret exists&lt;/span&gt;
kubectl get secrets &lt;span class="nt"&gt;-n&lt;/span&gt; &amp;lt;namespace&amp;gt;

&lt;span class="c"&gt;# Step 4: Verify the secret has the expected keys&lt;/span&gt;
kubectl describe secret myapp-credentials &lt;span class="nt"&gt;-n&lt;/span&gt; &amp;lt;namespace&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The fix
&lt;/h3&gt;

&lt;p&gt;Create the missing Secret in the correct namespace. For the longer-term fix, use a tool like &lt;code&gt;helm diff&lt;/code&gt;, &lt;code&gt;kubectl diff&lt;/code&gt;, or a GitOps pipeline that validates all referenced resources exist before allowing a deployment to proceed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key takeaway
&lt;/h3&gt;

&lt;p&gt;CrashLoopBackOff means the pod keeps dying. &lt;code&gt;kubectl logs --previous&lt;/code&gt; shows why it died. &lt;code&gt;kubectl describe pod&lt;/code&gt; shows what Kubernetes tried and failed to do. Always check both.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. The Node.js Memory Leak (WebSocket Edition)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The story
&lt;/h3&gt;

&lt;p&gt;Your Node.js service is running fine after deployment. Memory usage is at 200MB, which is normal. Over the next 18 hours, you watch it creep up. 300MB. 400MB. 600MB. Then the process gets OOMKilled by the container runtime and restarts. The whole cycle starts again.&lt;/p&gt;

&lt;p&gt;You check your code for obvious leaks - giant arrays, global caches growing unbounded. Nothing jumps out. This one hides.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why it happens
&lt;/h3&gt;

&lt;p&gt;The classic Node.js memory leak pattern that catches even experienced engineers: adding event listeners inside a function that gets called repeatedly, without removing them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// This leaks memory on every new WebSocket connection&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;setupWebSocket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// This listener is added fresh on every call&lt;/span&gt;
  &lt;span class="c1"&gt;// But the reference to process.on keeps the socket alive&lt;/span&gt;
  &lt;span class="c1"&gt;// even after the connection closes&lt;/span&gt;
  &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;SIGTERM&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nx"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;message&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;handleMessage&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every time a new WebSocket connection comes in, a new listener is added to &lt;code&gt;process&lt;/code&gt;. When the connection closes, the listener is not removed. The listener holds a reference to the &lt;code&gt;socket&lt;/code&gt; object. The &lt;code&gt;socket&lt;/code&gt; object cannot be garbage collected. After thousands of connections, you have thousands of dead socket references sitting in memory.&lt;/p&gt;

&lt;p&gt;Node.js will even warn you about this - but the warning often gets lost in log noise:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MaxListenersExceededWarning: Possible EventEmitter memory leak detected.
11 SIGTERM listeners added to [process]. Use emitter.setMaxListeners() to increase limit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That warning is not something to suppress. It is a canary telling you there is a leak.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to debug it
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Get a heap snapshot from a running Node.js process&lt;/span&gt;
&lt;span class="nb"&gt;kill&lt;/span&gt; &lt;span class="nt"&gt;-USR2&lt;/span&gt; &amp;lt;pid&amp;gt;

&lt;span class="c"&gt;# Or via the Node.js inspector&lt;/span&gt;
node &lt;span class="nt"&gt;--inspect&lt;/span&gt; app.js
&lt;span class="c"&gt;# Then open chrome://inspect and take a heap snapshot&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Look for object types with counts growing over time. In this case you would see &lt;code&gt;Socket&lt;/code&gt; instances accumulating far beyond the number of active connections.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;clinic.js&lt;/code&gt; tool is excellent for this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx clinic heapprofiler &lt;span class="nt"&gt;--&lt;/span&gt; node app.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The fix
&lt;/h3&gt;

&lt;p&gt;Always clean up listeners when the associated resource goes away:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;setupWebSocket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cleanup&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;SIGTERM&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;cleanup&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="nx"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;close&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Remove the listener when the connection closes&lt;/span&gt;
    &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;removeListener&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;SIGTERM&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;cleanup&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
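&lt;p&gt;The retention mechanism is not specific to Node.js, and you can watch it happen in a few lines. The sketch below is a Python analogy, not Node code: &lt;code&gt;Emitter&lt;/code&gt; and &lt;code&gt;FakeSocket&lt;/code&gt; are hypothetical stand-ins for &lt;code&gt;process&lt;/code&gt; and a WebSocket, and a weak reference is used to observe whether the socket object is still alive.&lt;/p&gt;

```python
import gc
import weakref


class Emitter:
    """Hypothetical long-lived emitter, playing the role of `process`."""

    def __init__(self):
        self._listeners = {}

    def on(self, event, fn):
        self._listeners.setdefault(event, []).append(fn)

    def remove_listener(self, event, fn):
        self._listeners[event].remove(fn)


class FakeSocket:
    def close(self):
        pass


def attach(emitter, socket):
    # The closure captures `socket`, so the emitter now keeps it alive
    def cleanup():
        socket.close()

    emitter.on("SIGTERM", cleanup)
    return cleanup


def demo():
    emitter = Emitter()
    sock = FakeSocket()
    probe = weakref.ref(sock)  # lets us observe the socket without retaining it
    handler = attach(emitter, sock)

    del sock  # the "connection" is gone from our side
    gc.collect()
    alive_while_registered = probe() is not None

    emitter.remove_listener("SIGTERM", handler)
    del handler
    gc.collect()
    alive_after_removal = probe() is not None
    return alive_while_registered, alive_after_removal


print(demo())  # (True, False)
```

&lt;p&gt;The socket survives for as long as the listener stays registered and is collected as soon as the listener is removed, which is exactly what the &lt;code&gt;close&lt;/code&gt; handler in the fix achieves.&lt;/p&gt;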



&lt;h3&gt;
  
  
  Key takeaway
&lt;/h3&gt;

&lt;p&gt;Memory leaks in Node.js are almost always about retaining references longer than necessary. Event listeners are the most common culprit. Take &lt;code&gt;MaxListenersExceededWarning&lt;/code&gt; seriously - it is not noise.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. Cache Stampede After Redis Restart
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The story
&lt;/h3&gt;

&lt;p&gt;Your Redis cache went down for planned maintenance. You brought it back up. Simple, right?&lt;/p&gt;

&lt;p&gt;Sixty seconds later your database server is on fire. CPU is pegged at 100%. Query latency went from 5ms to 8 seconds. The database is drowning.&lt;/p&gt;

&lt;p&gt;What happened? The restart left the cache empty, so every single key was missing at the same moment, and every single application server tried to rebuild the cache simultaneously by hitting the database. The same thing happens without a restart when keys share one TTL from the last cache warming cycle and all expire together.&lt;/p&gt;

&lt;p&gt;This is a cache stampede, also called a thundering herd.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why it happens
&lt;/h3&gt;

&lt;p&gt;Consider what happens when your cache is empty after a restart and you have 50 application servers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Request comes in for &lt;code&gt;/api/products&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;All 50 servers check the cache - cache miss&lt;/li&gt;
&lt;li&gt;All 50 servers query the database for product data&lt;/li&gt;
&lt;li&gt;All 50 servers write the result back to cache&lt;/li&gt;
&lt;li&gt;49 of those database queries were wasted&lt;/li&gt;
&lt;li&gt;Under high traffic, "50 servers" becomes "50,000 requests per second"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The database - which normally handles 200 queries per second because the cache absorbs the rest - suddenly receives 20,000 queries per second. It collapses.&lt;/p&gt;
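&lt;p&gt;The sequence above can be reproduced with a small simulation. This is a sketch under stated assumptions: threads stand in for application servers, a dict stands in for Redis, a counter stands in for the database, and a barrier forces every "server" to observe the miss before anyone refills the cache. All names are hypothetical.&lt;/p&gt;

```python
import threading


def simulate_cold_cache(servers=50):
    """Count database queries when every server misses the same key at once."""
    cache = {}
    db_queries = 0
    count_lock = threading.Lock()
    all_checked = threading.Barrier(servers)

    def handle_request():
        nonlocal db_queries
        hit = "products" in cache       # every server checks the cache
        all_checked.wait()              # all of them see the miss before any refill
        if not hit:
            with count_lock:
                db_queries += 1         # each server queries the database
            cache["products"] = "rows"  # and writes the same result back

    threads = [threading.Thread(target=handle_request) for _ in range(servers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db_queries


print(simulate_cold_cache(50))  # 50 queries for one key, 49 of them wasted
```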

&lt;h3&gt;
  
  
  How to debug it
&lt;/h3&gt;

&lt;p&gt;The diagnosis is usually visible in the metrics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cache hit rate: dropped from 95% to 0%
Database connections: spiked from 50 to 800 in 30 seconds
Database CPU: 100%
API P99 latency: 50ms -&amp;gt; 12,000ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Correlate the timeline with the Redis restart event. If the metrics cliff happened right when Redis came back up, you have your answer.&lt;/p&gt;

&lt;h3&gt;
  
  
  The fix
&lt;/h3&gt;

&lt;p&gt;Several strategies exist, and production systems often use multiple in combination:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cache locking (mutex pattern):&lt;/strong&gt; Only one process populates a cache key. Others wait.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_with_lock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fetch_fn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Redis&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;

    &lt;span class="c1"&gt;# Try to acquire a lock
&lt;/span&gt;    &lt;span class="n"&gt;lock_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lock:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lock_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nx&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ex&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# We got the lock - populate the cache
&lt;/span&gt;        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fetch_fn&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
        &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lock_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Someone else has the lock - wait briefly and retry
&lt;/span&gt;        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
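
&lt;p&gt;One caveat with the lock above: if the fetch takes longer than the 10-second lock expiry, the unconditional delete can remove a lock that another worker has since acquired. A minimal sketch of a token-checked release follows, assuming &lt;code&gt;r&lt;/code&gt; is a redis-py style client. The helper names &lt;code&gt;acquire_lock&lt;/code&gt; and &lt;code&gt;release_lock&lt;/code&gt; are hypothetical and not part of any library.&lt;/p&gt;

```python
import uuid

# Lua script run atomically by Redis: delete the lock only if it still
# holds the token that this caller stored when it acquired the lock.
RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
"""


def acquire_lock(r, lock_key, ttl_seconds=10):
    """Try to take the lock; return a unique token on success, else None."""
    token = uuid.uuid4().hex
    # SET with nx=True only succeeds if the key does not exist yet.
    if r.set(lock_key, token, nx=True, ex=ttl_seconds):
        return token
    return None


def release_lock(r, lock_key, token):
    """Release the lock only if this caller still owns it."""
    return r.eval(RELEASE_SCRIPT, 1, lock_key, token) == 1
```

&lt;p&gt;With this shape, the &lt;code&gt;finally&lt;/code&gt; block would call &lt;code&gt;release_lock&lt;/code&gt; with the token returned by &lt;code&gt;acquire_lock&lt;/code&gt; instead of deleting the key outright.&lt;/p&gt;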



&lt;p&gt;&lt;strong&gt;TTL jitter:&lt;/strong&gt; Add random variance to cache expiration times so keys do not all expire simultaneously.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;

&lt;span class="n"&gt;base_ttl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;
&lt;span class="n"&gt;jitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;base_ttl&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;jitter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Probabilistic early expiration:&lt;/strong&gt; Proactively refresh cache entries before they expire, based on how expensive the recomputation is.&lt;/p&gt;
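
&lt;p&gt;A minimal sketch of one common formulation, sometimes called XFetch: each request rolls a random number, and the closer the entry is to expiry (or the slower it is to rebuild), the more likely that request refreshes it early. The function name &lt;code&gt;should_refresh_early&lt;/code&gt; and the idea of storing &lt;code&gt;recompute_seconds&lt;/code&gt; next to the cached value are assumptions made for illustration.&lt;/p&gt;

```python
import math
import random
import time


def should_refresh_early(expires_at, recompute_seconds, beta=1.0,
                         now=None, rand=random.random):
    """Return True if this request should recompute the value before it expires.

    expires_at: Unix timestamp at which the cached entry expires.
    recompute_seconds: how long the last recomputation took (hypothetical
        metadata stored alongside the cached value).
    beta: values above 1.0 favor earlier refreshes.
    """
    if now is None:
        now = time.time()
    # 1.0 - rand() lies in (0, 1], which keeps math.log() away from zero.
    # The log is negative or zero, so negating it yields a random head start
    # that scales with the recomputation cost.
    head_start = -recompute_seconds * beta * math.log(1.0 - rand())
    return now + head_start >= expires_at
```

&lt;p&gt;A caller would check this on every cache hit and, when it returns true, recompute and rewrite the entry while other requests keep serving the still-valid cached value.&lt;/p&gt;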

&lt;h3&gt;
  
  
  Key takeaway
&lt;/h3&gt;

&lt;p&gt;A cache restart is not a harmless non-event. Treat cache warming as part of your maintenance procedure. Use TTL jitter by default - it costs almost nothing and prevents a whole class of stampede failures.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Pattern Across All Five
&lt;/h2&gt;

&lt;p&gt;Look at what these incidents have in common:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The symptoms lied.&lt;/strong&gt; A full disk surfaced as database errors. Connection pool exhaustion looked like a "database" problem. A cache issue looked like a database overload.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The actual cause was one layer removed&lt;/strong&gt; from where the pain was visible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;All five are preventable&lt;/strong&gt; with the right monitoring thresholds, code patterns, and configuration choices.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;All five are faster to debug if you have seen them before.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last point is the crux of it. Incident response speed is largely pattern recognition. The engineer who has seen connection pool exhaustion before spots the idle database CPU and goes straight to &lt;code&gt;pg_stat_activity&lt;/code&gt;. The one who has not spends an hour tuning query indexes that are not the problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  Practice Before the Pager Goes Off
&lt;/h2&gt;

&lt;p&gt;If you want to build that pattern recognition without waiting for production to teach you the hard way, I built &lt;a href="https://youbrokeprod.com" rel="noopener noreferrer"&gt;youbrokeprod.com&lt;/a&gt; - a free browser game where you investigate production outages step by step.&lt;/p&gt;

&lt;p&gt;Each scenario drops you into a live incident: you run commands, read logs, check metrics, and work toward a diagnosis. No signup required to try it. The game currently has 10 scenarios across beginner, intermediate, and advanced difficulty - including all five incidents described in this post.&lt;/p&gt;

&lt;p&gt;The goal is simple: build the muscle memory of incident debugging before it is your on-call rotation on the line.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What production incidents have scarred you the most? Drop them in the comments - there are 44 scenarios in the backlog and the most painful real-world ones make the best levels.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>sre</category>
      <category>incidentresponse</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
