<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem</title>
    <description>The most recent home feed on Forem.</description>
    <link>https://forem.com</link>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed"/>
    <language>en</language>
    <item>
      <title>I Built Three Platforms for Dental Professionals — Here's What the Data Taught Me</title>
      <dc:creator>Nick Grigora</dc:creator>
      <pubDate>Mon, 27 Apr 2026 19:56:44 +0000</pubDate>
      <link>https://forem.com/nickgrigora/i-built-three-platforms-for-dental-professionals-heres-what-the-data-taught-me-3nbl</link>
      <guid>https://forem.com/nickgrigora/i-built-three-platforms-for-dental-professionals-heres-what-the-data-taught-me-3nbl</guid>
      <description>&lt;p&gt;I'm a software engineer who noticed that discovery in dentistry is completely broken. Dental professionals at small practices never see product reps. They pick labs based on word of mouth. Software vendors spam everyone with cold emails but there's no independent place to compare what exists.&lt;/p&gt;

&lt;p&gt;So I built three things to fix it.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;FreeDentalSamples.com&lt;/strong&gt; — dental professionals verify their license once and request free samples from brands. Brands get full lead data and fulfill through their existing Shopify store. I'm just the verified routing layer — no inventory, no logistics.&lt;/p&gt;

&lt;p&gt;5 months in: 400+ verified professionals, 52k samples distributed, ~120k patients reached. Growing by ~6 new professionals/day with zero paid acquisition.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BestDentalLabs.org&lt;/strong&gt; — searchable directory of 100+ dental labs. Filter by specialty, certification, turnaround time, location. SEO-first: static page per lab at &lt;code&gt;/state/city/lab-name&lt;/code&gt;. Within weeks of launch the VP of AI Engineering at Glidewell Dental — the largest dental lab in the US — found it through search and reached out. Zero outreach on my end.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BestDentalSoftware.org&lt;/strong&gt; — just launched. Dental software directory across 9 categories. The trigger: I listed one app on FDS as a free trial with zero promotion and 9 professionals clicked through in 48 hours. Same audience, different discovery problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  The One Thing That Mattered
&lt;/h2&gt;

&lt;p&gt;Don't build transactions, fulfillment, or logistics. Route to existing infrastructure. That constraint kept me shipping.&lt;/p&gt;

&lt;p&gt;Verification is the moat. Every brand conversation changed the moment I said "verified dental professionals." Not dental professionals. Verified. The license check that felt like friction is the thing that makes the leads worth paying for.&lt;/p&gt;




&lt;p&gt;Happy to answer questions about architecture, SEO strategy, or the business model in the comments.&lt;/p&gt;

</description>
      <category>showdev</category>
      <category>buildinpublic</category>
      <category>webdev</category>
      <category>startup</category>
    </item>
    <item>
      <title>🧠 AI is not replacing developers — it’s exposing the gap between them</title>
      <dc:creator>Francisco Molina</dc:creator>
      <pubDate>Mon, 27 Apr 2026 19:55:05 +0000</pubDate>
      <link>https://forem.com/frxncisxo/ai-is-not-replacing-developers-its-exposing-the-gap-between-them-3k3n</link>
      <guid>https://forem.com/frxncisxo/ai-is-not-replacing-developers-its-exposing-the-gap-between-them-3k3n</guid>
      <description>&lt;p&gt;The discussion around AI in software engineering is often polarized:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI will replace developers&lt;/li&gt;
&lt;li&gt;AI is just another tool&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Neither framing is particularly useful.&lt;/p&gt;

&lt;p&gt;What is actually happening is more structural:&lt;/p&gt;

&lt;p&gt;AI is raising the baseline of software development.&lt;/p&gt;

&lt;h2&gt;
  
  
  The baseline is no longer a differentiator
&lt;/h2&gt;

&lt;p&gt;Activities that historically differentiated developers are becoming increasingly commoditized:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Writing syntactically correct, functional code&lt;/li&gt;
&lt;li&gt;Integrating APIs and SDKs&lt;/li&gt;
&lt;li&gt;Implementing standard patterns in popular frameworks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With modern AI systems, these tasks can be generated or accelerated with relatively low effort.&lt;/p&gt;

&lt;p&gt;This does not make them irrelevant —&lt;br&gt;
but it does make them insufficient as signals of engineering capability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the real differentiation is moving
&lt;/h2&gt;

&lt;p&gt;As the baseline shifts, value concentrates in areas that require contextual and system-level thinking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Designing resilient architectures&lt;/li&gt;
&lt;li&gt;Evaluating trade-offs under constraints&lt;/li&gt;
&lt;li&gt;Understanding business-critical flows&lt;/li&gt;
&lt;li&gt;Diagnosing issues in non-ideal environments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are not isolated tasks. They exist within systems that evolve, degrade, and fail.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI can generate code — but it cannot own consequences
&lt;/h2&gt;

&lt;p&gt;One of the most important distinctions is this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;AI produces outputs, but it does not operate under consequences.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In real systems, correctness is not defined by compilation or passing tests alone.&lt;/p&gt;

&lt;p&gt;Consider a common scenario:&lt;/p&gt;

&lt;h2&gt;
  
  
  Example: Race condition in async flows
&lt;/h2&gt;

&lt;p&gt;You have a mobile application that triggers multiple asynchronous jobs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sync product catalog&lt;/li&gt;
&lt;li&gt;Sync pricing&lt;/li&gt;
&lt;li&gt;Sync customer data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An AI-generated approach might suggest launching all jobs concurrently and updating UI state based on completion callbacks.&lt;/p&gt;

&lt;p&gt;At a glance, this is correct.&lt;/p&gt;

&lt;p&gt;In production, however, this can lead to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;UI reflecting partial or inconsistent state&lt;/li&gt;
&lt;li&gt;Pricing being calculated before product data is fully available&lt;/li&gt;
&lt;li&gt;Intermittent bugs that only appear under network latency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A more robust solution requires:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Understanding dependency ordering&lt;/li&gt;
&lt;li&gt;Coordinating async execution&lt;/li&gt;
&lt;li&gt;Designing a consistent state model (e.g., progressive aggregation, gating completion on full readiness)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not a code generation problem.&lt;/p&gt;

&lt;p&gt;It is a system behavior problem.&lt;/p&gt;
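&lt;p&gt;The ordering idea fits in a toy shell sketch (stub jobs, purely illustrative, not the app's actual code): the naive version fans every job out at once, while the robust version runs the dependency first and only then parallelizes the independent jobs.&lt;/p&gt;

```shell
# Naive: all three jobs run concurrently; "pricing" may finish
# before "catalog", the data it depends on.
printf 'catalog\npricing\ncustomer\n' | xargs -n1 -P3 sh -c 'echo "synced $0"'

# Robust: gate on the dependency first, then fan out the rest.
sh -c 'echo "synced $0"' catalog   # must complete before pricing starts
printf 'pricing\ncustomer\n' | xargs -n1 -P2 sh -c 'echo "synced $0"'
```

Real mobile code would express the same gating with awaited promises or a state machine, but the shape is identical: completion order is only safe when dependencies are explicit.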

&lt;h2&gt;
  
  
  Example: “Correct” code, wrong system behavior
&lt;/h2&gt;

&lt;p&gt;Another frequent issue:&lt;/p&gt;

&lt;p&gt;AI generates code that is locally valid but globally incorrect.&lt;/p&gt;

&lt;p&gt;For instance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Updating a pricing function without considering promotional rules&lt;/li&gt;
&lt;li&gt;Refactoring a method without understanding shared mutable state&lt;/li&gt;
&lt;li&gt;Suggesting caching without defining an invalidation strategy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each individual change may appear reasonable.&lt;/p&gt;

&lt;p&gt;But in a real system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Discounts may be miscalculated&lt;/li&gt;
&lt;li&gt;State may become inconsistent across screens&lt;/li&gt;
&lt;li&gt;Cached data may drift from the source of truth&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are failures of context awareness, not syntax.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI does not replace seniority — it amplifies it
&lt;/h2&gt;

&lt;p&gt;There is a misconception that AI reduces the need for experience.&lt;/p&gt;

&lt;p&gt;In practice, it does the opposite.&lt;/p&gt;

&lt;p&gt;Using AI effectively requires:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Precise problem definition&lt;/li&gt;
&lt;li&gt;Mental modeling of system behavior&lt;/li&gt;
&lt;li&gt;Awareness of edge cases and failure modes&lt;/li&gt;
&lt;li&gt;Ability to validate and reject outputs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without this, AI-generated solutions can introduce subtle and costly issues.&lt;/p&gt;

&lt;p&gt;With it, AI becomes a force multiplier.&lt;/p&gt;

&lt;h2&gt;
  
  
  Experience is not optional — it is observable
&lt;/h2&gt;

&lt;p&gt;There are aspects of engineering that cannot be shortcut:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Having seen systems fail under load&lt;/li&gt;
&lt;li&gt;Understanding the cost of poor architectural decisions&lt;/li&gt;
&lt;li&gt;Recognizing patterns that lead to instability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are not theoretical concerns.&lt;/p&gt;

&lt;p&gt;They emerge from real-world exposure to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Production incidents&lt;/li&gt;
&lt;li&gt;Inconsistent data states&lt;/li&gt;
&lt;li&gt;Long-running systems with accumulated complexity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And they directly influence how solutions are evaluated.&lt;/p&gt;

&lt;h2&gt;
  
  
  Do frameworks still matter?
&lt;/h2&gt;

&lt;p&gt;A natural consequence of AI-assisted development is the question:&lt;/p&gt;

&lt;p&gt;If AI can generate code for any framework, should developers still invest in learning them?&lt;/p&gt;

&lt;p&gt;The answer is not binary.&lt;/p&gt;

&lt;p&gt;Framework knowledge is shifting from memorization to operational understanding.&lt;/p&gt;

&lt;p&gt;Developers still need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select appropriate tools for a given problem&lt;/li&gt;
&lt;li&gt;Understand lifecycle, constraints, and limitations&lt;/li&gt;
&lt;li&gt;Validate that generated code aligns with system requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI can reproduce patterns from a framework.&lt;/p&gt;

&lt;p&gt;It cannot determine whether that framework is appropriate for your system.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real shift
&lt;/h2&gt;

&lt;p&gt;AI is not eliminating developers.&lt;/p&gt;

&lt;p&gt;It is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reducing the value of purely mechanical skills&lt;/li&gt;
&lt;li&gt;Increasing the importance of engineering judgment&lt;/li&gt;
&lt;li&gt;Making differences in experience more visible&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This changes how developers grow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Less emphasis on syntax and memorization&lt;/li&gt;
&lt;li&gt;More emphasis on systems thinking&lt;/li&gt;
&lt;li&gt;Greater focus on correctness under real-world conditions&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;AI is a powerful tool.&lt;/p&gt;

&lt;p&gt;But it operates without accountability, context, or consequences.&lt;/p&gt;

&lt;p&gt;Software systems do not.&lt;/p&gt;

&lt;p&gt;And that gap — between generating code and owning systems —&lt;br&gt;
is where engineering still matters most.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>seniority</category>
    </item>
    <item>
      <title>I Reviewed 100 LinkedIn Headlines — Here's What Actually Gets Clicks</title>
      <dc:creator>charlie-morrison</dc:creator>
      <pubDate>Mon, 27 Apr 2026 19:54:20 +0000</pubDate>
      <link>https://forem.com/charliemorrison/i-reviewed-100-linkedin-headlines-heres-what-actually-gets-clicks-3oln</link>
      <guid>https://forem.com/charliemorrison/i-reviewed-100-linkedin-headlines-heres-what-actually-gets-clicks-3oln</guid>
      <description>&lt;p&gt;Your LinkedIn headline is 220 characters. That's it. And most people waste every single one of them.&lt;/p&gt;

&lt;p&gt;I spent a weekend going through 100 LinkedIn profiles — recruiters, developers, founders, job seekers, freelancers. I wanted to know: what separates a headline that gets profile views from one that gets ignored?&lt;/p&gt;

&lt;p&gt;Here's what I found.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 3 Headlines Everyone Uses (And Why They Fail)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  "Software Developer at Company X"
&lt;/h3&gt;

&lt;p&gt;This is the default. LinkedIn auto-generates it from your job title. About 60% of the profiles I reviewed used some version of this.&lt;/p&gt;

&lt;p&gt;The problem: it tells people what you ARE but not what you DO or what you're GOOD AT. A recruiter searching "React developer" won't find "Software Developer at Acme Corp" unless your profile text happens to mention React.&lt;/p&gt;

&lt;h3&gt;
  
  
  "Passionate about [buzzword] | [buzzword] enthusiast | [buzzword] lover"
&lt;/h3&gt;

&lt;p&gt;Passion is not a skill. "Passionate about innovation" tells me nothing about your actual capabilities. Every third profile I saw had some version of this filler.&lt;/p&gt;

&lt;h3&gt;
  
  
  "Open to work | Seeking new opportunities"
&lt;/h3&gt;

&lt;p&gt;This works as a signal, but it shouldn't be your ENTIRE headline. You're still selling yourself — "open to work" is the door, not the product.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Works: 5 Patterns From Top Performers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pattern 1: The Value Statement
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Format:&lt;/strong&gt; I help [audience] do [result] through [method]&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"I help SaaS companies reduce churn by 30% through data-driven UX research"&lt;/li&gt;
&lt;li&gt;"Helping early-stage startups ship their first product in 90 days"&lt;/li&gt;
&lt;li&gt;"I turn complex data into dashboards executives actually use"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; It answers the question "what's in it for me?" immediately. Recruiters and clients can self-select in 3 seconds.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 2: The Proof Stack
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Format:&lt;/strong&gt; [Title] | [Quantified achievement] | [Differentiator]&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Senior Backend Engineer | 4x system throughput at scale | Go, Rust, distributed systems"&lt;/li&gt;
&lt;li&gt;"Product Designer | 200K+ users shipped | Design systems, prototyping, user research"&lt;/li&gt;
&lt;li&gt;"Content Strategist | 10M+ organic views | B2B SaaS, developer marketing"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; Numbers cut through noise. "4x throughput" is memorable. "Senior Backend Engineer" is not.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 3: The Niche Specialist
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Format:&lt;/strong&gt; The [niche] [role] — [specific outcome]&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"The compliance automation guy — making SOC 2 less painful since 2019"&lt;/li&gt;
&lt;li&gt;"Healthcare UX Designer — because doctor portals shouldn't need a manual"&lt;/li&gt;
&lt;li&gt;"DevRel for developer tools — turning docs into adoption"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; Being specific makes you memorable and findable. "Healthcare UX Designer" beats "UX Designer" in search AND in memory.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 4: The Career Transition
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Format:&lt;/strong&gt; [Previous credibility] → [New direction] | [What you bring]&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Ex-McKinsey → Building AI tools for supply chain | Operations + ML"&lt;/li&gt;
&lt;li&gt;"10Y Finance → Product Management | Where spreadsheets meet user stories"&lt;/li&gt;
&lt;li&gt;"Teacher turned developer | Bringing pedagogy to developer education"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; The transition IS the story. It shows range and makes people curious enough to click.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 5: The Mission Statement
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Format:&lt;/strong&gt; [What you're building/doing] | [Why it matters]&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Building the future of code review at [Company] | Ship faster, break less"&lt;/li&gt;
&lt;li&gt;"Making AI accessible to non-technical teams | No PhD required"&lt;/li&gt;
&lt;li&gt;"Automating the boring parts of freelancing so you can do the actual work"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; It positions you as someone with direction, not just a job holder.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Data: What Correlated With Higher Profile Views
&lt;/h2&gt;

&lt;p&gt;From my sample of 100 profiles (comparing headline style to reported profile view counts where visible):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Profiles with &lt;strong&gt;specific numbers&lt;/strong&gt; in headlines had ~3x more reported views&lt;/li&gt;
&lt;li&gt;Profiles mentioning a &lt;strong&gt;niche&lt;/strong&gt; (not just a broad title) had ~2x more views&lt;/li&gt;
&lt;li&gt;Headlines &lt;strong&gt;under 120 characters&lt;/strong&gt; performed better than those using all 220&lt;/li&gt;
&lt;li&gt;Using &lt;strong&gt;"|" as a separator&lt;/strong&gt; was more common in high-view profiles than commas or dashes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;First person&lt;/strong&gt; ("I help...") outperformed third person ("Helps companies...")&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Quick Fixes You Can Make Right Now
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Add one number.&lt;/strong&gt; Years of experience, users served, revenue generated, anything quantifiable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Name your niche.&lt;/strong&gt; Not "developer" — "React Native developer for fintech." Not "designer" — "UX designer for healthcare."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remove "passionate."&lt;/strong&gt; Replace with a concrete skill or result.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Front-load keywords.&lt;/strong&gt; LinkedIn search weighs the beginning of your headline more heavily. Put your most searchable terms first.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test it.&lt;/strong&gt; Change your headline, wait 2 weeks, check if profile views went up. LinkedIn shows you this data.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Generate Yours in 30 Seconds
&lt;/h2&gt;

&lt;p&gt;I built a free &lt;a href="https://charliemorrison.dev/linkedin-headline" rel="noopener noreferrer"&gt;LinkedIn Headline Generator&lt;/a&gt; that creates 30 headline variations based on your role, skills, and preferred tone. Pick one, tweak it, paste it in.&lt;/p&gt;

&lt;p&gt;No account needed. Runs in your browser. Your data stays on your machine.&lt;/p&gt;

&lt;p&gt;The other free tools in the career suite:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://charliemorrison.dev/resume-checker" rel="noopener noreferrer"&gt;Resume ATS Score Checker&lt;/a&gt; — find out if your resume passes automated filters&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://charliemorrison.dev/cover-letter" rel="noopener noreferrer"&gt;Cover Letter Generator&lt;/a&gt; — professional cover letters in 4 tones&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://charliemorrison.dev/salary-negotiation" rel="noopener noreferrer"&gt;Salary Negotiation Script Generator&lt;/a&gt; — scripts for offers, raises, and counters&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://charliemorrison.dev/follow-up-email" rel="noopener noreferrer"&gt;Interview Follow-Up Email Generator&lt;/a&gt; — thank-you and follow-up templates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Or grab the complete &lt;a href="https://charliemorrison.lemonsqueezy.com/checkout/buy/0c3f51d6-9089-466e-ada4-58a0b22036e0" rel="noopener noreferrer"&gt;Job Search AI Toolkit&lt;/a&gt; ($12) — 100+ prompts covering everything from resume optimization to salary negotiation.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What's the best LinkedIn headline you've ever seen? Drop it below — I'm collecting examples for a follow-up post.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>career</category>
      <category>linkedin</category>
      <category>productivity</category>
      <category>beginners</category>
    </item>
    <item>
      <title>UTMP being phased out: Why "who" returns empty output on modern Ubuntu</title>
      <dc:creator>Jaideep Grover</dc:creator>
      <pubDate>Mon, 27 Apr 2026 19:53:58 +0000</pubDate>
      <link>https://forem.com/jaideepg/utmp-being-phased-out-why-who-returns-empty-output-on-modern-ubuntu-1chj</link>
      <guid>https://forem.com/jaideepg/utmp-being-phased-out-why-who-returns-empty-output-on-modern-ubuntu-1chj</guid>
      <description>&lt;p&gt;While working on Ubuntu 25.10 I ran &lt;code&gt;who&lt;/code&gt; to check active sessions, and something weird happened: it simply didn't work. When a command that simple fails, I assume something is up with my machine, so I did some digging. The obvious alternative, &lt;code&gt;w&lt;/code&gt;, worked just fine, but &lt;code&gt;who&lt;/code&gt; did not. Curious, I found Launchpad Bug #2130814: other people had hit it too, and everyone had a different theory about why &lt;code&gt;who&lt;/code&gt; failed while &lt;code&gt;w&lt;/code&gt; worked. In short, running &lt;code&gt;who&lt;/code&gt; returns exit code 0 and prints no information at all. &lt;/p&gt;

&lt;h2&gt;
  
  
  The bug
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;who&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$?&lt;/span&gt;
0

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While tracking down this behavior I learned a lot I was not aware of. The real reason is that Ubuntu is moving away from a mechanism that has been in place since the 1970s. It is not a uutils-vs-GNU issue either, as some suspected, because much the same thing happens with &lt;code&gt;gnuwho&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;gnuwho
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$?&lt;/span&gt;
0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both are broken.&lt;/p&gt;

&lt;p&gt;Looking at the bug report (&lt;a href="https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/2130814" rel="noopener noreferrer"&gt;https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/2130814&lt;/a&gt;), which has been open for a while, the original reporter noticed that /var/run/utmp was missing. Another commenter hit EACCES (permission denied) on /run/systemd/sessions/3, suggesting AppArmor might be denying access.&lt;/p&gt;

&lt;p&gt;I started by checking whether session tracking itself was broken. It wasn't.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;loginctl list-sessions
SESSION  UID USER   SEAT LEADER CLASS   TTY IDLE SINCE
      4 1000 ubuntu -    1010   user    -   no   -
      5 1000 ubuntu -    1313   manager -   no   -

2 sessions listed.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sessions do exist, and systemd knows about them; for some reason &lt;code&gt;who&lt;/code&gt; just wasn't reading them. Next I checked the AppArmor theory. The other commenter saw EACCES, but on a clean Ubuntu 25.10 install the session files had the required read permissions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-la&lt;/span&gt; /run/systemd/sessions/
&lt;span class="nt"&gt;-rw-r--r--&lt;/span&gt;  1 root root 330 Apr 26 18:27 4
&lt;span class="nt"&gt;-rw-r--r--&lt;/span&gt;  1 root root 309 Apr 26 18:27 5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So the files were present and world-readable; &lt;code&gt;who&lt;/code&gt; simply doesn't look at them. The output of &lt;code&gt;loginctl --version&lt;/code&gt; made things click:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;loginctl &lt;span class="nt"&gt;--version&lt;/span&gt;
systemd 257 &lt;span class="o"&gt;(&lt;/span&gt;257.9-0ubuntu2.4&lt;span class="o"&gt;)&lt;/span&gt;
+PAM +AUDIT +SELINUX +APPARMOR +IMA +IPE +SMACK +SECCOMP +GCRYPT
&lt;span class="nt"&gt;-GNUTLS&lt;/span&gt; +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 &lt;span class="nt"&gt;-IDN&lt;/span&gt; +IPTC
+KMOD +LIBCRYPTSETUP +LIBCRYPTSETUP_PLUGINS +LIBFDISK +PCRE2 +PWQUALITY
+P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +BPF_FRAMEWORK +BTF
&lt;span class="nt"&gt;-XKBCOMMON&lt;/span&gt; &lt;span class="nt"&gt;-UTMP&lt;/span&gt; +SYSVINIT +LIBARCHIVE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Look near the end: &lt;code&gt;-UTMP&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Each flag shows whether systemd was built with that feature: a '+' means included, a '-' means excluded. Almost everything is included here except UTMP. &lt;/p&gt;
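&lt;p&gt;A quick way to check your own system is to pull that one flag out of the version banner (the grep pattern is mine, just for illustration):&lt;/p&gt;

```shell
# Prints "+UTMP" or "-UTMP" depending on whether systemd was built
# with utmp support (guarded so it is a no-op on non-systemd machines).
if command -v loginctl >/dev/null; then
  loginctl --version | grep -oE '[+-]UTMP'
fi
```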

&lt;p&gt;UTMP is the classic Unix mechanism for tracking logged-in users. It keeps its records in &lt;code&gt;/var/run/utmp&lt;/code&gt; (and &lt;code&gt;/var/run/utmpx&lt;/code&gt; on some systems), and tools like &lt;code&gt;who&lt;/code&gt;, &lt;code&gt;last&lt;/code&gt;, and &lt;code&gt;w&lt;/code&gt; read them. Ubuntu 25.10 ships a systemd built without it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-la&lt;/span&gt; /var/run/utmp /var/run/utmpx
&lt;span class="nb"&gt;ls&lt;/span&gt;: cannot access &lt;span class="s1"&gt;'/var/run/utmp'&lt;/span&gt;: No such file or directory
&lt;span class="nb"&gt;ls&lt;/span&gt;: cannot access &lt;span class="s1"&gt;'/var/run/utmpx'&lt;/span&gt;: No such file or directory
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  But why
&lt;/h2&gt;

&lt;p&gt;In October 2024, the Debian systemd maintainer announced that UTMP support would be discontinued. The reason is the Y2038 problem: classic utmp records store timestamps as 32-bit values, which overflow on January 19, 2038. Rather than redesigning utmp itself, the ecosystem is pivoting to wtmpdb, a SQLite-backed login database designed to be Y2038-safe, as the successor. In the meantime you can use &lt;code&gt;w&lt;/code&gt; instead. It also means that when upgrading to a new operating system or environment, people may need to adjust some commands in their workflows.&lt;/p&gt;
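&lt;p&gt;On systems that have adopted the replacement, login history comes from the wtmpdb tooling instead (assuming the &lt;code&gt;wtmpdb&lt;/code&gt; package is installed; the subcommand name is from the wtmpdb project's documentation):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;wtmpdb last
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;It prints history in the familiar &lt;code&gt;last&lt;/code&gt; format, backed by the SQLite database rather than the utmp/wtmp files.&lt;/p&gt;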

&lt;h2&gt;
  
  
  Why does w still work, though?
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;w
 18:31:03 up 17 min,  1 user,  load average: 0.00, 0.00, 0.00
USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU  WHAT
ubuntu   pts/0    18.206.107.29    18:27    6.00s  0.07s   ?    w
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;w&lt;/code&gt; is part of the &lt;code&gt;procps&lt;/code&gt; package, which has been updated; it only ever used utmp as one of several sources of information. &lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons and Closing
&lt;/h2&gt;

&lt;p&gt;I posted a comment on the bug: &lt;a href="https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/2130814" rel="noopener noreferrer"&gt;https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/2130814&lt;/a&gt;&lt;br&gt;
Plenty of people are going to hit this problem; hopefully this writeup helps them. &lt;/p&gt;

&lt;p&gt;The main problem here was not an error message but the lack of one. A workflow that fails with an error at least fails loudly and gets fixed; silent empty output can make problems far worse. If you run into this, the alternatives are &lt;code&gt;w&lt;/code&gt; or &lt;code&gt;loginctl list-sessions&lt;/code&gt;.&lt;/p&gt;

</description>
      <category>ubuntu</category>
      <category>linux</category>
      <category>devops</category>
      <category>opensource</category>
    </item>
    <item>
      <title>How to Negotiate Your Salary Over Email (With Templates)</title>
      <dc:creator>charlie-morrison</dc:creator>
      <pubDate>Mon, 27 Apr 2026 19:53:31 +0000</pubDate>
      <link>https://forem.com/charliemorrison/how-to-negotiate-your-salary-over-email-with-templates-5c14</link>
      <guid>https://forem.com/charliemorrison/how-to-negotiate-your-salary-over-email-with-templates-5c14</guid>
      <description>&lt;p&gt;Salary negotiation is awkward enough in person. Over email, it's somehow worse — you're staring at a blank compose window, wondering if you sound greedy or desperate.&lt;/p&gt;

&lt;p&gt;I've been there. After helping dozens of people through job transitions, I noticed the same patterns: people who negotiate earn 10-20% more. People who don't leave money on the table every single time.&lt;/p&gt;

&lt;p&gt;Here's how to do it without the cringe.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Email Is Actually Better for Negotiation
&lt;/h2&gt;

&lt;p&gt;Most people think negotiation needs to happen on a call. It doesn't. Email gives you three advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Time to think.&lt;/strong&gt; No pressure to respond in real-time. You can craft your words carefully.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A paper trail.&lt;/strong&gt; Everything is documented. No "I don't remember agreeing to that."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Less emotional.&lt;/strong&gt; You won't panic-accept a lowball offer because someone put you on the spot.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The downside? Tone is harder to convey. Which is why templates matter.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 4 Situations Where You'll Need These
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. You Got an Offer (But It's Low)
&lt;/h3&gt;

&lt;p&gt;This is the most common scenario. You're excited about the role, but the number doesn't match your research.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to say:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Thank you for the offer — I'm genuinely excited about joining the team. After reviewing the compensation package and comparing it with market data for similar roles in [city/remote], I was hoping we could discuss the base salary. Based on my experience with [specific skill] and the scope of this role, I believe a salary in the range of [target] would be more aligned. I'm flexible and open to discussing other components of the package as well.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; You're not rejecting anything. You're expressing enthusiasm AND making a specific ask backed by market data.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. You Want a Raise at Your Current Job
&lt;/h3&gt;

&lt;p&gt;Timing matters here. Don't ask right after a company-wide layoff. Do ask after you've shipped something significant.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to say:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I wanted to schedule some time to discuss my compensation. Over the past [timeframe], I've [specific accomplishment — saved $X, shipped Y, grew Z by N%]. I've also taken on [additional responsibility]. Given these contributions and current market rates for my role, I'd like to discuss adjusting my salary to [target range]. When would be a good time to talk?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; You lead with value delivered, not "I need more money." You're making the business case first.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. You're Countering a Counter-Offer
&lt;/h3&gt;

&lt;p&gt;They came back with a number. It's better than the original but still not where you want to be.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to say:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I appreciate you revising the offer — it shows the team values this conversation. We're still a bit apart on the base salary. Would [specific number] be possible? I'm also open to discussing [signing bonus / equity / extra PTO / remote flexibility] if the base has a hard cap. I want to make this work for both sides.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; You acknowledge their effort, stay specific, and offer creative alternatives.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. After a Rejection (Graceful Follow-Up)
&lt;/h3&gt;

&lt;p&gt;They said no to your ask. Don't burn the bridge.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Thank you for considering my request. I understand there are budget constraints. Could we revisit this in [3-6 months]? In the meantime, I'd love to understand what specific milestones or achievements would support a salary adjustment. That way I can work toward them intentionally.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  5 Rules That Make or Break Email Negotiations
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Never give your number first.&lt;/strong&gt; Let them anchor. If pressed, give a range where your minimum is your actual target.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use specific numbers.&lt;/strong&gt; $73,500 is more convincing than $75,000 because it signals you've done research, not just rounded up.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't apologize for negotiating.&lt;/strong&gt; "Sorry to ask, but..." undermines everything after it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Always have a backup plan.&lt;/strong&gt; The strongest negotiating position is genuine willingness to walk away.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Follow up in 48 hours if no response.&lt;/strong&gt; One short email: "Just checking in on the compensation discussion — happy to jump on a call if easier."&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Tools That Actually Help
&lt;/h2&gt;

&lt;p&gt;I built a free &lt;a href="https://charliemorrison.dev/salary-negotiation" rel="noopener noreferrer"&gt;Salary Negotiation Script Generator&lt;/a&gt; that creates customized scripts for 4 scenarios (new offer, raise, promotion, counter-offer) in 4 different tones. No signup, runs in your browser, nothing gets stored.&lt;/p&gt;

&lt;p&gt;For the full picture — interview prep, resume optimization, cover letters, and salary negotiation all in one place — there's the &lt;a href="https://charliemorrison.lemonsqueezy.com/checkout/buy/0c3f51d6-9089-466e-ada4-58a0b22036e0" rel="noopener noreferrer"&gt;Job Search AI Toolkit&lt;/a&gt; ($12, 100+ prompts).&lt;/p&gt;

&lt;p&gt;Also check out the other free tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://charliemorrison.dev/resume-checker" rel="noopener noreferrer"&gt;Resume ATS Score Checker&lt;/a&gt; — see how your resume scores before applying&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://charliemorrison.dev/linkedin-headline" rel="noopener noreferrer"&gt;LinkedIn Headline Generator&lt;/a&gt; — 30 headline variations in seconds&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://charliemorrison.dev/cover-letter" rel="noopener noreferrer"&gt;Cover Letter Generator&lt;/a&gt; — customized cover letters by tone&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://charliemorrison.dev/follow-up-email" rel="noopener noreferrer"&gt;Interview Follow-Up Email Generator&lt;/a&gt; — never ghost a recruiter again&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Uncomfortable Truth
&lt;/h2&gt;

&lt;p&gt;Most people don't negotiate because they're afraid of losing the offer. In 15 years of hiring data, fewer than 1% of companies have ever rescinded an offer because someone negotiated professionally.&lt;/p&gt;

&lt;p&gt;The bigger risk is accepting without asking. That initial salary becomes your baseline for every raise, bonus, and future offer for years.&lt;/p&gt;

&lt;p&gt;One email. Ten minutes. Potentially tens of thousands of dollars over your career.&lt;/p&gt;

&lt;p&gt;Worth the awkwardness.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What's your worst salary negotiation story? I've heard some doozies. Drop it in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>career</category>
      <category>salary</category>
      <category>productivity</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Tool-Calling Prompts That Don't Blow Up on the Five Edge Cases</title>
      <dc:creator>Gabriel Anhaia</dc:creator>
      <pubDate>Mon, 27 Apr 2026 19:53:30 +0000</pubDate>
      <link>https://forem.com/gabrielanhaia/tool-calling-prompts-that-dont-blow-up-on-the-five-edge-cases-53m0</link>
      <guid>https://forem.com/gabrielanhaia/tool-calling-prompts-that-dont-blow-up-on-the-five-edge-cases-53m0</guid>
      <description>&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Book:&lt;/strong&gt; &lt;a href="https://www.amazon.com/dp/B0GYJZ2XJD" rel="noopener noreferrer"&gt;AI Agents Pocket Guide&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Also by me:&lt;/strong&gt; &lt;a href="https://www.amazon.com/dp/B0GX38N645" rel="noopener noreferrer"&gt;Prompt Engineering Pocket Guide&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;My project:&lt;/strong&gt; &lt;a href="https://hermes-ide.com" rel="noopener noreferrer"&gt;Hermes IDE&lt;/a&gt; | &lt;a href="https://github.com/hermes-hq/hermes-ide" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; — an IDE for developers who ship with Claude Code and other AI coding tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Me:&lt;/strong&gt; &lt;a href="https://xgabriel.com" rel="noopener noreferrer"&gt;xgabriel.com&lt;/a&gt; | &lt;a href="https://github.com/gabrielanhaia" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;You've seen this shape: an agent confidently calls &lt;code&gt;delete_user&lt;/code&gt; instead of &lt;code&gt;archive_user&lt;/code&gt; because the description is off by one verb. Same arg signature, both &lt;code&gt;(user_id: str)&lt;/code&gt;, the model picks the first one in the schema list, and a paying customer gets nuked. The post-mortem eats half a day. The fix is a sentence in a tool description.&lt;/p&gt;

&lt;p&gt;That's edge case one of five. Tool-calling looks clean in the demo. You define a function, register it, the model calls it, you parse the JSON. Then you ship and the model finds the seams. It picks the wrong tool. It passes a string where you wanted an int. It calls &lt;code&gt;search_orders&lt;/code&gt; six times in a row with the same query because the response format confused it. It runs &lt;code&gt;send_email&lt;/code&gt; before &lt;code&gt;compose_email&lt;/code&gt; finished returning. It calls a tool when the answer was already in context.&lt;/p&gt;

&lt;p&gt;These five failure modes are common in production agents, and the prompt-plus-schema-plus-validator pattern handles them. Code is Python with the Anthropic SDK shape, but the patterns port to OpenAI, Vertex, and the &lt;a href="https://ai-sdk.dev/docs/ai-sdk-core/tools-and-tool-calling" rel="noopener noreferrer"&gt;AI SDK&lt;/a&gt; one-to-one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Edge case 1: two valid tools, model picks neither (or the wrong one)
&lt;/h2&gt;

&lt;p&gt;Two tools both fit the user's request. The model either picks the wrong one with confidence, or freezes and answers without calling any tool at all. The &lt;a href="https://medium.com/@komalbaparmar007/llm-tool-calling-in-production-rate-limits-retries-and-the-infinite-loop-failure-mode-you-must-2a1e2a1e84c8" rel="noopener noreferrer"&gt;NESTFUL benchmark&lt;/a&gt; (summarized in the linked Medium post) measured frontier models on nested API calls and found full-sequence accuracy was low. A chunk of those misses are tool-selection misses; many of the rest are arg-generation misses.&lt;/p&gt;

&lt;p&gt;The fix lives in the description. Each tool description has to explicitly contrast itself against the closest sibling.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"archive_user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Soft-delete a user. Marks the row archived=true and hides the account from listings. REVERSIBLE within 30 days. Use this for cancellations, churn, GDPR soft requests. NOT for permanent deletion: use delete_user for that."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"input_schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"format"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"uuid"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"delete_user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PERMANENTLY remove a user row and PII. IRREVERSIBLE. Use only for legal-mandated hard deletes (GDPR Article 17 confirmed, court order). NOT for cancellations: use archive_user."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"input_schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"format"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"uuid"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"legal_basis"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"enum"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"gdpr_article_17"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"court_order"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"legal_basis"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two patterns there. First, every description names its sibling and the boundary between them. The model picks tools by reading descriptions; if your descriptions don't compare, the model guesses. Second, the dangerous tool requires an extra arg with a closed enum. A model that wants to call &lt;code&gt;delete_user&lt;/code&gt; has to commit to a legal basis from a fixed list. That's hard to do casually.&lt;/p&gt;
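
&lt;p&gt;That discipline erodes as the tool list grows, so it's worth linting: every tool with a dangerous sibling must name that sibling in its description. A minimal sketch — the &lt;code&gt;SIBLINGS&lt;/code&gt; map and the trimmed tool dicts here are illustrative, not part of any SDK:&lt;/p&gt;

```python
# Illustrative lint: each tool description must name its contrasting sibling.
SIBLINGS = {"archive_user": "delete_user", "delete_user": "archive_user"}

def lint_descriptions(tools):
    """Return the tool names whose description omits the sibling's name."""
    errors = []
    for tool in tools:
        sibling = SIBLINGS.get(tool["name"])
        if sibling and sibling not in tool["description"]:
            errors.append(tool["name"])
    return errors

tools = [
    {"name": "archive_user",
     "description": "Soft-delete a user. NOT for permanent deletion: use delete_user."},
    {"name": "delete_user",
     "description": "PERMANENTLY remove a user row. IRREVERSIBLE."},
]

# delete_user forgets to mention archive_user, so the lint flags it.
print(lint_descriptions(tools))
```

&lt;p&gt;Run it in CI next to your schema definitions and the contrast rule survives the tenth tool you add.&lt;/p&gt;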

&lt;h2&gt;
  
  
  Edge case 2: wrong-arg-type, silently coerced
&lt;/h2&gt;

&lt;p&gt;The model passes &lt;code&gt;"42"&lt;/code&gt; where you wanted &lt;code&gt;42&lt;/code&gt;. Or &lt;code&gt;"true"&lt;/code&gt; where you wanted &lt;code&gt;true&lt;/code&gt;. Or &lt;code&gt;"2026-04-27"&lt;/code&gt; where you wanted a full ISO datetime with timezone. JSON Schema validation catches the obvious ones. The non-obvious ones slide through. Strings that parse as numbers, dates without timezones, IDs that look right but reference the wrong namespace. They reach your downstream call and break something you didn't write.&lt;/p&gt;

&lt;p&gt;Wrap every tool handler in a validator. Don't trust the schema alone.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;field_validator&lt;/span&gt;

&lt;span class="n"&gt;UUID_RE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[0-9a-f]{4}-[0-9a-f]{12}$&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ArchiveUserArgs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(...,&lt;/span&gt; &lt;span class="n"&gt;min_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;36&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;36&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nd"&gt;@field_validator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nd"&gt;@classmethod&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_uuid_shape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;UUID_RE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()):&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id must be a UUID, got &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="si"&gt;!r}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;raw_args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;schema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TOOL_SCHEMAS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;raw_args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;invalid_arguments&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;detail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hint&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Re-read the tool schema and retry.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;TOOL_HANDLERS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The validator does two jobs the schema can't. It returns a structured error the model can read on the next turn, and it gives you a single chokepoint to log every coercion attempt. When &lt;code&gt;user_id&lt;/code&gt; arrives as &lt;code&gt;"USER-1234"&lt;/code&gt; instead of a UUID, the model sees the error message, corrects, and retries, usually within one turn. Without the validator, the request hits your DB layer, throws somewhere ugly, and the agent turn ends in a 500.&lt;/p&gt;

&lt;p&gt;The hint matters. Models steer hard on the error string. "Re-read the tool schema and retry" produces better recoveries than a bare stack trace.&lt;/p&gt;
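
&lt;p&gt;For completeness, &lt;code&gt;TOOL_SCHEMAS&lt;/code&gt; and &lt;code&gt;TOOL_HANDLERS&lt;/code&gt; above are just two dicts keyed by tool name. One possible shape, with a stub handler for illustration:&lt;/p&gt;

```python
from pydantic import BaseModel, Field

class ArchiveUserArgs(BaseModel):
    # Trimmed version of the model from the validator section above.
    user_id: str = Field(..., min_length=36, max_length=36)

def handle_archive_user(args):
    # Stub handler: a real one would hit the DB; here it echoes the validated id.
    return {"status": "archived", "user_id": args.user_id}

# The two registries call_tool() reads from: name -> model, name -> handler.
TOOL_SCHEMAS = {"archive_user": ArchiveUserArgs}
TOOL_HANDLERS = {"archive_user": handle_archive_user}
```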

&lt;h2&gt;
  
  
  Edge case 3: the infinite tool loop
&lt;/h2&gt;

&lt;p&gt;Agent calls &lt;code&gt;search_orders(query="acme")&lt;/code&gt;. Gets back a result that doesn't quite match what the user asked. Calls &lt;code&gt;search_orders(query="acme")&lt;/code&gt; again. Gets the same result. Calls it again. You wake up to a $2,400 bill and a Slack thread.&lt;/p&gt;

&lt;p&gt;Two layers of defense. The first is a hard step counter, the &lt;a href="https://apxml.com/courses/langchain-production-llm/chapter-2-sophisticated-agents-tools/agent-error-handling" rel="noopener noreferrer"&gt;classic mitigation&lt;/a&gt;. The second is a duplicate-call detector that fires before the counter ever trips.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;collections&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Counter&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sha256&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LoopGuard&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_steps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="n"&gt;max_repeat&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_steps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;max_steps&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_repeat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;max_repeat&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;blob&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;sort_keys&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;blob&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()[:&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_steps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;step_budget_exceeded&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_repeat&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;duplicate_call_blocked&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the agent loop, call &lt;code&gt;guard.check(name, args)&lt;/code&gt; before dispatching. If it returns a reason, push that reason back into the conversation as a tool result with &lt;code&gt;is_error=true&lt;/code&gt; and break out. The model sees "you already ran this exact call twice; pick a different approach" and almost always pivots. If it doesn't, the step budget catches it.&lt;/p&gt;
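
&lt;p&gt;Wired into a dispatch function, that looks roughly like this. The guard is a trimmed copy of &lt;code&gt;LoopGuard&lt;/code&gt; above (a sorted-tuple key instead of a sha256, same behavior), and the &lt;code&gt;is_error&lt;/code&gt; result shape follows the Anthropic tool-result convention — adjust for your SDK:&lt;/p&gt;

```python
from collections import Counter

class LoopGuard:
    # Trimmed copy of the guard above: step budget plus duplicate detector.
    def __init__(self, max_steps=12, max_repeat=2):
        self.max_steps, self.max_repeat = max_steps, max_repeat
        self.calls, self.step = Counter(), 0

    def check(self, name, args):
        self.step += 1
        if self.step > self.max_steps:
            return "step_budget_exceeded"
        key = (name, tuple(sorted(args.items())))
        self.calls[key] += 1
        if self.calls[key] > self.max_repeat:
            return "duplicate_call_blocked"
        return None

def dispatch(guard, name, args, run_tool):
    """Gate a tool call; on a block, return an error result the model can read."""
    reason = guard.check(name, args)
    if reason is not None:
        return {"is_error": True,
                "content": f"{reason}: change the args or pick a different tool."}
    return {"is_error": False, "content": run_tool(name, args)}

guard = LoopGuard(max_repeat=2)
fake_search = lambda name, args: f"results for {args['query']}"
for _ in range(3):
    result = dispatch(guard, "search_orders", {"query": "acme"}, fake_search)
# The third identical call is blocked; result["is_error"] is True.
```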

&lt;p&gt;The system prompt has to know about this. Otherwise the model treats the block as a bug and complains.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You have a tool budget of 12 calls per turn. Identical
calls (same name, same args) are blocked after 2 attempts.
If a tool result is unhelpful, change the args or pick a
different tool. Don't retry the exact call.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Edge case 4: wrong tool ordering
&lt;/h2&gt;

&lt;p&gt;The agent has to compose an email before sending it. It sends first, the body is empty, the customer gets a blank message at 9:02 AM. Or it tries to charge a card before creating the customer record. Or it queries an analytics endpoint before the data warehouse refresh tool finished.&lt;/p&gt;

&lt;p&gt;Schemas don't express order. Prompts have to. Two patterns work.&lt;/p&gt;

&lt;p&gt;The first is a phase-gated system prompt. Group tools by phase and tell the model the phase rules.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You operate in three phases per request:

1. GATHER: call read-only tools (search_*, get_*, list_*).
   You may call any number, in any order.
2. PLAN: produce a short plan in plain text. No tools.
3. ACT: call write tools (create_*, send_*, charge_*,
   archive_*, delete_*) in the order from the plan.
   You may not call a write tool before producing the plan.

If a write tool fails, return to PLAN before retrying.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The second is a precondition field on each write tool, enforced by the validator.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SendEmailArgs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;draft_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;confirmed_recipient&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

    &lt;span class="nd"&gt;@field_validator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;draft_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nd"&gt;@classmethod&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_draft_exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;draft_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;draft_id &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="si"&gt;!r}&lt;/span&gt;&lt;span class="s"&gt; not found. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Call compose_email first.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model can't call &lt;code&gt;send_email&lt;/code&gt; until &lt;code&gt;compose_email&lt;/code&gt; has produced a real &lt;code&gt;draft_id&lt;/code&gt;. The error message names the missing prerequisite, so the next turn fixes itself. This is the same "structured error, model reads it, recovers" pattern from edge case 2, applied to ordering instead of types.&lt;/p&gt;
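That recovery contract is easy to centralize so every tool gets it for free. A minimal sketch of the validate-then-dispatch wrapper, assuming Pydantic as in the examples above; `ArchiveArgs` and the handler lambda below are hypothetical stand-ins for a real tool:

```python
from pydantic import BaseModel, ValidationError


def validated_call(model_cls, handler, raw_args: dict) -> dict:
    """Validate raw model-supplied args against a Pydantic model.

    On failure, return a structured error naming the bad field(s)
    instead of raising -- the model reads it on the next turn.
    """
    try:
        args = model_cls(**raw_args)
    except ValidationError as e:
        msgs = [
            f"{'.'.join(str(p) for p in err['loc'])}: {err['msg']}"
            for err in e.errors()
        ]
        return {"error": "; ".join(msgs)}
    return handler(args)


# Hypothetical tool, for illustration only.
class ArchiveArgs(BaseModel):
    ticket_id: int


# A bad call comes back as {"error": "ticket_id: ..."} -- readable, actionable.
result = validated_call(
    ArchiveArgs,
    lambda a: {"archived": a.ticket_id},
    {"ticket_id": "not-a-number"},
)
```

Wired into the dispatch loop, the returned dict becomes the `tool_result` content with `is_error=true`, exactly like the ordering errors above.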

&lt;h2&gt;
  
  
  Edge case 5: the tool that should be skipped entirely
&lt;/h2&gt;

&lt;p&gt;You ask the agent "what's our refund policy?" The agent calls &lt;code&gt;search_knowledge_base("refund policy")&lt;/code&gt;, retrieves three docs, summarizes them. Fine. You ask it "what's 2 + 2." The agent calls &lt;code&gt;search_knowledge_base("2 + 2")&lt;/code&gt;, retrieves nothing relevant, and now you've spent 800ms and three vector lookups on arithmetic. Multiply by a million users.&lt;/p&gt;

&lt;p&gt;The fix is a "no tool needed" escape hatch in the system prompt, plus a tool-selection rubric.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Before calling any tool, ask yourself:

- Is the answer already in the conversation?
- Is this a general-knowledge question the model can
  answer without a tool?
- Would a human assistant reach for a tool here, or
  just answer?

If any of the above is yes, answer directly. Tools cost
latency and money. Default to no tool.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pair that with a tool description that names the trigger condition explicitly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"search_knowledge_base"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Search internal company docs (policies, runbooks, product specs). USE WHEN the user asks about company-specific information not in your training data. DO NOT USE for general knowledge, math, code questions, or follow-ups whose answer is already in the conversation."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"input_schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"minLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;"USE WHEN" and "DO NOT USE" are two phrases that tend to move tool-call rate the most in production evals. The model is reading your description as a rubric. Give it one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The prompt skeleton, end to end
&lt;/h2&gt;

&lt;p&gt;The pieces fit together like this. System prompt sets phases and the no-tool rubric. Tool descriptions contrast siblings and name USE WHEN / DO NOT USE. Schemas use enums and formats wherever a closed set exists. Every handler runs through a Pydantic validator that returns structured errors. A &lt;code&gt;LoopGuard&lt;/code&gt; wraps the dispatch loop. The model sees errors as readable text on the next turn and self-corrects.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;agent_turn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;guard&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LoopGuard&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stop_reason&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_use&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_use&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;
            &lt;span class="n"&gt;reason&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;guard&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_use_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;is_error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;}]}&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That loop is maybe forty lines once you flesh out the imports. It survives all five edge cases above without any LangChain, LangGraph, or framework on top. Teams shipping production agents tend to reinvent pieces of this anyway; writing it once and treating it as the contract is faster than learning a framework's escape hatches.&lt;/p&gt;

&lt;p&gt;The pattern that ties everything together is small: tools fail, the model reads the failure, the model retries. Your job is to make every failure a sentence the model can act on. A bare 500 isn't actionable. &lt;code&gt;"draft_id 'xyz' not found. Call compose_email first."&lt;/code&gt; is. The whole rig is just plumbing around that one idea.&lt;/p&gt;

&lt;h2&gt;
  
  
  If this was useful
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://www.amazon.com/dp/B0GYJZ2XJD" rel="noopener noreferrer"&gt;AI Agents Pocket Guide&lt;/a&gt; goes deeper on the agent-loop design: tool budgets, retry policy, phase prompts, and the eval rig that catches edge cases before customers do. The &lt;a href="https://www.amazon.com/dp/B0GX38N645" rel="noopener noreferrer"&gt;Prompt Engineering Pocket Guide&lt;/a&gt; covers the description-writing patterns from edge cases 1 and 5 in their own chapter, with before/after diffs and the eval scores that go with them. Pair them if you're shipping anything that calls more than one tool per turn.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.amazon.com/dp/B0GYJZ2XJD" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffzkyb8bqzkmlt4qzgqbo.jpg" alt="AI Agents Pocket Guide" width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.amazon.com/dp/B0GX38N645" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg0d5pom6bpbranr5abrn.jpg" alt="Prompt Engineering Pocket Guide" width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>prompt</category>
      <category>python</category>
    </item>
    <item>
      <title>Migrating From a JSON Column to a Proper Schema in Postgres</title>
      <dc:creator>Gabriel Anhaia</dc:creator>
      <pubDate>Mon, 27 Apr 2026 19:53:01 +0000</pubDate>
      <link>https://forem.com/gabrielanhaia/migrating-from-a-json-column-to-a-proper-schema-in-postgres-4o9e</link>
      <guid>https://forem.com/gabrielanhaia/migrating-from-a-json-column-to-a-proper-schema-in-postgres-4o9e</guid>
      <description>&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Book:&lt;/strong&gt; &lt;a href="https://www.amazon.com/dp/B0GYLMVX9S" rel="noopener noreferrer"&gt;Database Playbook: Choosing the Right Store for Every System You Build&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;My project:&lt;/strong&gt; &lt;a href="https://hermes-ide.com" rel="noopener noreferrer"&gt;Hermes IDE&lt;/a&gt; | &lt;a href="https://github.com/hermes-hq/hermes-ide" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; — an IDE for developers who ship with Claude Code and other AI coding tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Me:&lt;/strong&gt; &lt;a href="https://xgabriel.com" rel="noopener noreferrer"&gt;xgabriel.com&lt;/a&gt; | &lt;a href="https://github.com/gabrielanhaia" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;A &lt;code&gt;WHERE payload-&amp;gt;&amp;gt;'status' = 'paid'&lt;/code&gt; query has been doing 800ms sequential scans for two months because there is no index on it, and the team that owns the table can't add one cleanly. They tried. The expression index built fine in staging and timed out behind a long-running transaction in production. The &lt;code&gt;status&lt;/code&gt; field is buried inside a &lt;code&gt;jsonb&lt;/code&gt; blob with eleven other keys, three of which are typed wrong, two of which nobody on the team remembers writing.&lt;/p&gt;

&lt;p&gt;This is the migration. Out of the JSON column, into typed columns, with zero downtime and no broken reads. The pattern is called expand-and-contract: you add the new shape next to the old one, move traffic across in stages, and only drop the old shape once the new one has been serving production for long enough to trust.&lt;/p&gt;

&lt;p&gt;The shape we're starting from:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt;           &lt;span class="n"&gt;bigserial&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;customer_id&lt;/span&gt;  &lt;span class="nb"&gt;bigint&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;payload&lt;/span&gt;      &lt;span class="n"&gt;jsonb&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt;   &lt;span class="n"&gt;timestamptz&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;payload&lt;/code&gt; holds &lt;code&gt;{ "status": "paid", "total_cents": 4200, "currency": "USD", ... }&lt;/code&gt;. We want &lt;code&gt;status&lt;/code&gt;, &lt;code&gt;total_cents&lt;/code&gt;, and &lt;code&gt;currency&lt;/code&gt; as first-class columns with proper types, constraints, and indexes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Add target columns and backfill
&lt;/h2&gt;

&lt;p&gt;The first instinct is &lt;code&gt;ALTER TABLE orders ADD COLUMN status text NOT NULL DEFAULT 'unknown'&lt;/code&gt;. (Plain &lt;code&gt;NOT NULL&lt;/code&gt; with no default fails outright on a non-empty table.) On Postgres 10 and earlier, the defaulted version rewrites every row and holds an &lt;code&gt;ACCESS EXCLUSIVE&lt;/code&gt; lock for the duration. On a 200M-row table, that's an outage.&lt;/p&gt;

&lt;p&gt;The Postgres-friendly version is to add the column nullable, with no default, then backfill in batches.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;       &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;total_cents&lt;/span&gt;  &lt;span class="nb"&gt;bigint&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;currency&lt;/span&gt;     &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three metadata-only operations. They take milliseconds regardless of table size: adding a nullable column with no default never rewrites the table; Postgres just records the new column in the catalog. (Since Postgres 11, even a constant &lt;code&gt;DEFAULT&lt;/code&gt; avoids the rewrite.)&lt;/p&gt;

&lt;p&gt;Now backfill in batches. Doing this in one statement locks the world; doing it in chunks keeps replication lag bounded and lets you pause if something starts misbehaving.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Run repeatedly until 0 rows update.&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
  &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;
  &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;
  &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;SKIP&lt;/span&gt; &lt;span class="n"&gt;LOCKED&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt;
  &lt;span class="n"&gt;status&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'status'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;total_cents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'total_cents'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;bigint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;currency&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'currency'&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;SKIP LOCKED&lt;/code&gt; keeps this loop from fighting application writes. Run it from a worker, sleep 100ms between batches, watch &lt;code&gt;pg_stat_replication.replay_lag&lt;/code&gt; if you have replicas. A 50M-row table at 5k rows per batch is 10,000 batches; the 100ms sleeps alone add up to roughly 17 minutes, so budget a bit more for the updates themselves.&lt;/p&gt;
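The worker itself is just: run a batch, sleep, stop at zero. A sketch with the batch execution abstracted behind a callable so the pacing logic is visible — `run_batch` would execute the CTE above through your DB driver and return `cursor.rowcount`; the names here are illustrative:

```python
import time


def backfill(run_batch, batch_sleep: float = 0.1, max_batches: int = 1_000_000) -> int:
    """Drive the batched UPDATE until it reports 0 rows touched.

    run_batch: callable returning the number of rows the batch UPDATE
    changed (e.g. cursor.rowcount). Returns total rows backfilled.
    """
    total = 0
    for _ in range(max_batches):
        updated = run_batch()
        total += updated
        if updated == 0:
            break  # no NULL rows left; backfill complete
        time.sleep(batch_sleep)  # bounds replication lag; raise it to slow down
    return total
```

Because the batch query selects `WHERE status IS NULL` with `SKIP LOCKED`, the loop is idempotent: kill the worker mid-run and restart it whenever you like.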

&lt;p&gt;A note on bad data. The cast &lt;code&gt;(payload-&amp;gt;&amp;gt;'total_cents')::bigint&lt;/code&gt; will explode on the row where someone wrote &lt;code&gt;"total_cents": "n/a"&lt;/code&gt; in 2022. Find those first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'total_cents'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;bad_value&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'total_cents'&lt;/span&gt; &lt;span class="o"&gt;!~&lt;/span&gt; &lt;span class="s1"&gt;'^[0-9]+$'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="s1"&gt;'total_cents'&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Decide per row: fix, default, or quarantine. Don't let the backfill loop discover them at 3 a.m.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Dual-write from the application
&lt;/h2&gt;

&lt;p&gt;Backfill catches existing rows. New writes still go to &lt;code&gt;payload&lt;/code&gt; only. Until the application writes to both shapes, the new columns drift the moment you stop the backfill.&lt;/p&gt;

&lt;p&gt;In your write path, populate both:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total_cents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_cents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;total_cents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;currency&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        INSERT INTO orders (
          customer_id, payload, status, total_cents, currency
        )
        VALUES (%s, %s, %s, %s, %s)
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total_cents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Updates need the same treatment. Every code path that writes &lt;code&gt;payload&lt;/code&gt; must also write the typed columns. Grep for &lt;code&gt;payload&lt;/code&gt; in your codebase before you ship this; the one update path you forget is the one that breaks parity.&lt;/p&gt;
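For the update paths, the safest shape is a single statement that touches both representations, so they can't diverge even if two writers race. A sketch in the same style as `create_order` — the function name is illustrative, and `jsonb_set` is one way to keep the blob in step:

```python
def update_order_status(db, order_id: int, status: str) -> None:
    """Dual-write a status change: jsonb key and typed column in one UPDATE."""
    db.execute(
        """
        UPDATE orders
        SET payload = jsonb_set(payload, '{status}', to_jsonb(%s::text)),
            status  = %s
        WHERE id = %s
        """,
        (status, status, order_id),
    )
```

One statement means one lock, one WAL record, and no window where the two shapes disagree.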

&lt;p&gt;If you can't trust the application to dual-write reliably, fall back to a trigger:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;sync_orders_columns&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;RETURNS&lt;/span&gt; &lt;span class="k"&gt;trigger&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;
  &lt;span class="k"&gt;NEW&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;      &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;NEW&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'status'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;NEW&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_cents&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;NEW&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'total_cents'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;bigint&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;NEW&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;currency&lt;/span&gt;    &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;NEW&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'currency'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="k"&gt;NEW&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt; &lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="n"&gt;plpgsql&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TRIGGER&lt;/span&gt; &lt;span class="n"&gt;orders_sync&lt;/span&gt;
&lt;span class="k"&gt;BEFORE&lt;/span&gt; &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;EACH&lt;/span&gt; &lt;span class="k"&gt;ROW&lt;/span&gt; &lt;span class="k"&gt;EXECUTE&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;sync_orders_columns&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The trigger is the safety net, not the goal. Triggers add per-row overhead and hide the migration from anyone reading the application code six months later. Use it during the migration, drop it once dual-write is proven.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Verify parity before flipping reads
&lt;/h2&gt;

&lt;p&gt;Don't trust the migration just because the backfill finished. Compare the two shapes for every row:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;mismatches&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
  &lt;span class="n"&gt;status&lt;/span&gt;      &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'status'&lt;/span&gt;
  &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;total_cents&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt;
       &lt;span class="k"&gt;NULLIF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'total_cents'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;bigint&lt;/span&gt;
  &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;currency&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'currency'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;IS DISTINCT FROM&lt;/code&gt; treats &lt;code&gt;NULL&lt;/code&gt; as comparable, which is what you want; &lt;code&gt;status = NULL&lt;/code&gt; is &lt;code&gt;NULL&lt;/code&gt;, not &lt;code&gt;false&lt;/code&gt;. Any non-zero count is a row to investigate before you keep going.&lt;/p&gt;

&lt;p&gt;Run this on a replica if you can. The query is a sequential scan and you don't want it competing with production writes for buffer cache.&lt;/p&gt;

&lt;p&gt;The other check that matters: a sample of recent inserts to confirm the dual-write path is wired up. If the trigger is live this is redundant; if you removed the trigger and trust the application, it isn't.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'status'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;pstatus&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;interval&lt;/span&gt; &lt;span class="s1"&gt;'1 hour'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="s1"&gt;'status'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Zero rows means dual-write is healthy. One row means somebody opened a code path you didn't grep.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Add indexes and constraints, then flip reads
&lt;/h2&gt;

&lt;p&gt;Now the typed columns can carry indexes and constraints. Build them concurrently. &lt;code&gt;CREATE INDEX&lt;/code&gt; without &lt;code&gt;CONCURRENTLY&lt;/code&gt; takes an &lt;code&gt;ACCESS EXCLUSIVE&lt;/code&gt; lock for the duration and blocks every write to the table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;CONCURRENTLY&lt;/span&gt; &lt;span class="n"&gt;idx_orders_status&lt;/span&gt;
  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;CONCURRENTLY&lt;/span&gt; &lt;span class="n"&gt;idx_orders_status_created&lt;/span&gt;
  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;CONCURRENTLY&lt;/code&gt; cannot run inside a transaction, takes longer (two table passes), and can leave an &lt;code&gt;INVALID&lt;/code&gt; index behind if it fails. Check after every build:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;indexrelid&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;regclass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;indisvalid&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_index&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;indrelid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'orders'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;regclass&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;indisvalid&lt;/code&gt; is &lt;code&gt;false&lt;/code&gt;, drop it and rebuild. Don't try to repair it.&lt;/p&gt;
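&lt;p&gt;If you hit that case, the rebuild is itself two statements, both concurrent so writes keep flowing (index name matches the build above):&lt;/p&gt;

```sql
-- An INVALID index is still maintained on every write but never used
-- for reads; drop it concurrently and build it again from scratch.
DROP INDEX CONCURRENTLY IF EXISTS idx_orders_status;
CREATE INDEX CONCURRENTLY idx_orders_status ON orders (status);
```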

&lt;p&gt;The not-null constraint comes next. The naive form takes an &lt;code&gt;ACCESS EXCLUSIVE&lt;/code&gt; lock and scans the whole table while holding it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Avoid this on large tables.&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
  &lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Postgres 12+ has a faster route. Add a &lt;code&gt;CHECK&lt;/code&gt; constraint as &lt;code&gt;NOT VALID&lt;/code&gt; first, validate it without the heavy lock, then promote.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
  &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;CONSTRAINT&lt;/span&gt; &lt;span class="n"&gt;orders_status_not_null&lt;/span&gt;
  &lt;span class="k"&gt;CHECK&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;VALID&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
  &lt;span class="n"&gt;VALIDATE&lt;/span&gt; &lt;span class="k"&gt;CONSTRAINT&lt;/span&gt; &lt;span class="n"&gt;orders_status_not_null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;NOT VALID&lt;/code&gt; skips the existing-row check and takes a short lock. &lt;code&gt;VALIDATE&lt;/code&gt; reads every row but only takes a &lt;code&gt;SHARE UPDATE EXCLUSIVE&lt;/code&gt; lock, so concurrent reads and writes keep flowing. Once it passes, you can promote to a real &lt;code&gt;NOT NULL&lt;/code&gt; (a full table scan in older versions; on Postgres 12+ it sees the validated check and skips the scan):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;CONSTRAINT&lt;/span&gt; &lt;span class="n"&gt;orders_status_not_null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With indexes and constraints in place, flip the read path. In the application, change every &lt;code&gt;WHERE payload-&amp;gt;&amp;gt;'status' = 'paid'&lt;/code&gt; to &lt;code&gt;WHERE status = 'paid'&lt;/code&gt;. Ship it behind a feature flag if you can; rollback is then a config change rather than a migration.&lt;/p&gt;
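&lt;p&gt;A minimal sketch of what that flag can look like in application code (the flag name and queries are hypothetical, not from a real codebase):&lt;/p&gt;

```python
# Hypothetical read-path flag: flipping it back to False rolls the app
# back to the jsonb query without any schema change.
READ_TYPED_COLUMNS = True


def orders_by_status_sql() -> str:
    if READ_TYPED_COLUMNS:
        # New path: hits the index on the typed column.
        return "SELECT id FROM orders WHERE status = %s"
    # Legacy path: expression lookup on the jsonb column.
    return "SELECT id FROM orders WHERE payload->>'status' = %s"


print(orders_by_status_sql())
```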

&lt;p&gt;The 800ms scan that started this post becomes a 2ms index lookup the moment the new query path goes live.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Drop the JSON column
&lt;/h2&gt;

&lt;p&gt;Wait. Not next sprint, not next deploy. Wait long enough that you trust the new path under real load. A week of clean parity checks and zero rollbacks is a reasonable bar. Two weeks is better. The cost of waiting is a column on disk; the cost of dropping early is a missing field your billing job needed and nobody noticed in staging.&lt;/p&gt;

&lt;p&gt;When you're ready, remove the dual-write code first. Then the trigger if you used one. Then the column.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;TRIGGER&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;orders_sync&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;sync_orders_columns&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;DROP COLUMN&lt;/code&gt; is metadata-only in Postgres. It marks the column dead and leaves the data on disk until the next &lt;code&gt;VACUUM FULL&lt;/code&gt; or table rewrite. Fast, safe, reversible-ish (you can &lt;code&gt;ALTER TABLE ... ADD COLUMN&lt;/code&gt; back, but the data is gone).&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;payload&lt;/code&gt; had keys you didn't migrate, archive them first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders_payload_archive&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;archived_at&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cheap insurance. Drop the archive a quarter later when nobody has asked for it.&lt;/p&gt;

&lt;p&gt;The hard part isn't the SQL. It's the discipline to leave the old column alone for two weeks while the new one earns its place.&lt;/p&gt;




&lt;h2&gt;
  
  
  If this was useful
&lt;/h2&gt;

&lt;p&gt;Chapter 6 of the &lt;a href="https://www.amazon.com/dp/B0GYLMVX9S" rel="noopener noreferrer"&gt;Database Playbook&lt;/a&gt; walks through schema-evolution patterns for Postgres, MySQL, and the document stores: when to model relationally, when &lt;code&gt;jsonb&lt;/code&gt; actually pays off, and how to migrate between the two without taking the table offline. If your team is staring at a &lt;code&gt;jsonb&lt;/code&gt; column that grew teeth, it's the chapter for you.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.amazon.com/dp/B0GYLMVX9S" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fktttopxmkwrt9qcazhnc.jpg" alt="Database Playbook: Choosing the Right Store for Every System You Build" width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>database</category>
      <category>tutorial</category>
      <category>devops</category>
    </item>
    <item>
      <title>Machine Learning Driven Crop Yield Prediction with NLP-Based Insight</title>
      <dc:creator>CHITTIPROLU DAKSHAYANI</dc:creator>
      <pubDate>Mon, 27 Apr 2026 19:52:51 +0000</pubDate>
      <link>https://forem.com/chittiprolu_dakshayani/machine-learning-driven-crop-yield-prediction-with-nlp-based-insight-5fmg</link>
      <guid>https://forem.com/chittiprolu_dakshayani/machine-learning-driven-crop-yield-prediction-with-nlp-based-insight-5fmg</guid>
      <description>&lt;p&gt;Machine Learning Driven Crop Yield Prediction with NLP-Based Insight is a smart agriculture system. It helps farmers to make better decisions. This project uses ML to guess how much crop can be cultivated. It also uses NLP to understand the farmer's inputs, weather reports, and market trends.&lt;br&gt;
This project uses current and previous data to make predictions. This helps farmers choose the right crop and use resources such as water, soil and fertilisers.&lt;br&gt;
Overall, this project shows how technology makes farming easier and smarter by using ML and NLP.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TEAM MEMBERS&lt;/strong&gt;&lt;br&gt;
This project was developed by:&lt;/p&gt;

&lt;p&gt;&lt;a class="mentioned-user" href="https://dev.to/chityala_akshitha"&gt;@chityala_akshitha&lt;/a&gt; - Chityala Akshitha&lt;br&gt;
&lt;a class="mentioned-user" href="https://dev.to/kapa_keerthi_reddy"&gt;@kapa_keerthi_reddy&lt;/a&gt; - Kapa Keerthi Reddy&lt;br&gt;
&lt;a class="mentioned-user" href="https://dev.to/k_sahasri_4178e21fcb9e123"&gt;@k_sahasri_4178e21fcb9e123&lt;/a&gt; - Kannayavandla Sahasri&lt;/p&gt;

&lt;p&gt;We want to express our sincere gratitude to &lt;a class="mentioned-user" href="https://dev.to/chanda_rajkumar"&gt;@chanda_rajkumar&lt;/a&gt; for his guidance and support throughout this project. He helped us understand the problem, the development process, and the architecture of &lt;strong&gt;Machine Learning Driven Crop Yield Prediction with NLP-Based Insight&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem We Set Out to Solve&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Agriculture today is not as straightforward as it used to be. Farmers have to deal with unpredictable weather, changing soil conditions, and constantly shifting market trends. Even though a lot of useful data exists—like weather reports, expert recommendations, and shared farming experiences—it often isn’t easy to combine all of this into something practical. That’s where our project comes in.&lt;br&gt;
We've built a system that uses machine learning to predict crop yield based on simple inputs such as soil type, weather conditions, and crop details. The idea was to create something that doesn’t feel complicated but still gives meaningful output that can actually help in decision-making.&lt;br&gt;
But we didn’t want to stop at just prediction. A lot of valuable information in agriculture comes in text form, such as reports, suggestions, and guidelines. So we integrate Natural Language Processing (NLP) to process this kind of data and turn it into clear, easy-to-understand insights instead of leaving it as raw text.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8joobn3vzc7n751cmwdx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8joobn3vzc7n751cmwdx.png" alt=" " width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What we found interesting while working on this project is how powerful it becomes when you combine structured data (like numbers) with unstructured data (like text). Instead of just giving a predicted value, the system tries to provide context and useful suggestions around it.&lt;br&gt;
Overall, this project is about making data actually useful. The goal is simple: to help farmers make better decisions with the information they already have, but in a way that’s easier to understand and apply in real life.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Approach&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The machine learning-driven crop yield prediction with natural language processing is a system that guides farmers towards the right decisions and makes those decisions easier and smarter. The ML model does not rely on past data alone; it also uses real-time user inputs, such as the soil type, the weather conditions and the crop being grown. At the same time, NLP processes text data such as agricultural reports, farmer feedback, and market information to provide useful insights. Together they turn farming from a risky, uncertain process into a planned, data-driven one, where farmers can make better decisions in advance and improve their overall productivity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;System Architecture Overview&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The system collects information from farmers, including soil conditions, weather information and crop details, and stores it in a MongoDB backend. MongoDB feeds this input to the machine learning models, which predict yield and detect possible risks.&lt;br&gt;
In parallel, text data such as farmer feedback and agricultural reports is analysed with NLP to surface common problems and useful insights.&lt;br&gt;
All of this information, including the inputs, predictions, and insights, is stored in MongoDB for easy querying and real-time updates.&lt;br&gt;
The result is a smooth path from data collection to prediction to suggestions, giving farmers clear and useful guidance for better farming decisions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Model in MongoDB&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;users&lt;/code&gt; – stores user profiles and preferences&lt;br&gt;
&lt;code&gt;_id, name, email, phone, preferred_language, location, role (farmer/admin), registration_date&lt;/code&gt;&lt;br&gt;
&lt;code&gt;feedback&lt;/code&gt; – user-submitted feedback and reports&lt;br&gt;
&lt;code&gt;_id, user_id, crop, region, soil, weather, notes, sentiment, date_submitted&lt;/code&gt;&lt;br&gt;
&lt;code&gt;crop_prices&lt;/code&gt; – stores historical and edited crop prices&lt;br&gt;
&lt;code&gt;_id, crop_name, price, unit, market, date_updated, updated_by&lt;/code&gt;&lt;br&gt;
&lt;code&gt;crop_predictions&lt;/code&gt; – machine learning crop recommendations&lt;br&gt;
&lt;code&gt;_id, user_id, soil, weather, region, recommended_crop, advice, prediction_date&lt;/code&gt;&lt;br&gt;
&lt;code&gt;disease_diagnosis&lt;/code&gt; – plant image uploads and disease results&lt;br&gt;
&lt;code&gt;_id, user_id, image_path, diagnosis, confidence, advice, upload_date&lt;/code&gt;&lt;br&gt;
&lt;code&gt;price_predictions&lt;/code&gt; – AI-predicted crop prices&lt;br&gt;
&lt;code&gt;_id, crop_name, market, predicted_price, prediction_date, model_version&lt;/code&gt;&lt;br&gt;
&lt;code&gt;dashboard_stats&lt;/code&gt; – aggregated analytics for the dashboard&lt;br&gt;
&lt;code&gt;_id, date, crop_counts, issue_counts, feedback_summary&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm8rw6ypqwbe50zs40o1u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm8rw6ypqwbe50zs40o1u.png" alt=" " width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from flask import Flask
from pymongo import MongoClient
import certifi
from googletrans import Translator  # translation library assumed

app = Flask(__name__)
client = MongoClient(
    "mongodb+srv://dakshayani:dakshi19@myatlasclusteredu.wizq9sn.mongodb.net/myDB?retryWrites=true&amp;amp;w=majority",
    tls=True,
    tlsCAFile=certifi.where()
)
db = client["myDB"]
collection = db["feedback"]
app.secret_key = 'replace-this-with-a-secure-key'
DATA_FILE = 'feedback.csv'
UPLOAD_FOLDER = 'uploads'
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER
translator = Translator()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The Aggregation Pipelines&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Crop yield trend processes historical yield data to show how crop production changes over time. Weather impact analysis examines weather data, such as rainfall and temperature, to understand how they affect crop yield. Soil performance tracking studies how different soil types and conditions affect productivity and crop growth. Farmer activity tracing measures how often farmers input data and interact with the system. Prediction distribution shows how many predictions fall into the low, medium or high yield categories. NLP insight trends recognise common issues like drought, pests or diseases from text data and feedback. Finally, the alert summary pipeline highlights recent high-risk situations, such as poor yield predictions or negative farming conditions, that need attention.&lt;/p&gt;
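&lt;p&gt;As a rough illustration, one of these pipelines (prediction distribution) can be expressed as a plain MongoDB aggregation stage list. The collection and field names below are assumptions for the sketch, not the project's real schema:&lt;/p&gt;

```python
# Sketch: count how many predictions fall into each yield category.
# This is the kind of stage list you would pass to
# db["crop_predictions"].aggregate(...) in pymongo.
prediction_distribution = [
    {"$group": {"_id": "$yield_category", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]

print(prediction_distribution[0]["$group"]["_id"])
```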

&lt;p&gt;&lt;strong&gt;Integration of AI &amp;amp; ML in the System&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This system employs simple machine learning and NLP algorithms to provide valuable farming insights without adding complexity. As users enter relevant information such as soil characteristics, weather parameters, and crop details, the system evaluates the data and predicts the crop yield and possible risks.&lt;br&gt;
Additionally, if the user reports feedback or problems, NLP analyses the text and identifies common factors such as pest infestation, drought, or low crop yields.&lt;br&gt;
The platform has several components. The first (TextInsightEngine) evaluates the text users provide and extracts useful information from it; the next is an ML algorithm that predicts the crop yield and categorises it as low, medium or high. Based on this information, the system offers practical suggestions for farmers.&lt;br&gt;
There is also an option to integrate a chatbot module that delivers more conversational answers, plus a fallback algorithm to keep the platform working if the other modules fail.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.preprocessing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LabelEncoder&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.tree&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DecisionTreeClassifier&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;joblib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dump&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;feedback.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;soil&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;weather&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;region&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;crop&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;le_soil&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LabelEncoder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;le_weather&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LabelEncoder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;le_region&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LabelEncoder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;le_crop&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LabelEncoder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;soil_enc&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;le_soil&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;soil&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;weather_enc&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;le_weather&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;weather&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;region_enc&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;le_region&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;region&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;X_enc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;soil_enc&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;weather_enc&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;region_enc&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
&lt;span class="n"&gt;y_enc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;le_crop&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;clf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DecisionTreeClassifier&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;clf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_enc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_enc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;crop_model.joblib&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;le_soil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;le_soil.joblib&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;le_weather&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;le_weather.joblib&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;le_region&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;le_region.joblib&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;le_crop&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;le_crop.joblib&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Model and encoders saved.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
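The saved model and encoders can then be loaded back at prediction time. A minimal, self-contained sketch of that round trip, using toy stand-in data (only the `crop_model.joblib` / `le_crop.joblib` file names follow the `dump` calls above; the dataset and values here are illustrative):

```python
import pandas as pd
from joblib import dump, load
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Toy data standing in for the project's real dataset.
df = pd.DataFrame({
    "soil":    ["clay", "sandy", "clay"],
    "weather": ["humid", "dry", "humid"],
    "region":  ["north", "south", "north"],
    "crop":    ["rice", "millet", "rice"],
})

le_soil, le_weather, le_region, le_crop = (
    LabelEncoder(), LabelEncoder(), LabelEncoder(), LabelEncoder())
X = pd.DataFrame({
    "soil_enc":    le_soil.fit_transform(df["soil"]),
    "weather_enc": le_weather.fit_transform(df["weather"]),
    "region_enc":  le_region.fit_transform(df["region"]),
})
clf = DecisionTreeClassifier().fit(X, le_crop.fit_transform(df["crop"]))
dump(clf, "crop_model.joblib")
dump(le_crop, "le_crop.joblib")

# At prediction time: load the artifacts, encode the new sample the
# same way, and decode the predicted class back to a crop name.
model = load("crop_model.joblib")
le = load("le_crop.joblib")
sample = pd.DataFrame({
    "soil_enc":    le_soil.transform(["clay"]),
    "weather_enc": le_weather.transform(["humid"]),
    "region_enc":  le_region.transform(["north"]),
})
predicted_crop = le.inverse_transform(model.predict(sample))[0]
```

Saving the encoders alongside the model matters: the integer codes are only meaningful relative to the exact `LabelEncoder` instances that produced them.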



&lt;p&gt;&lt;strong&gt;Experiences of Farmers First&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This project is built on the idea that farmers should not have to make every decision manually or rely only on their own experience. A user-friendly platform should put the relevant information in front of them so they can make informed decisions.&lt;br&gt;
Farmers get clear, actionable information: yield forecasts, risk factors, and insights from text mining performed on weather and soil data. They do not need to read complicated reports; the system surfaces the problems it detects, such as pests or lack of rainfall.&lt;br&gt;
Based on these insights, users receive recommendations, creating a consistent cycle in which farmers always know their current situation and what actions to take next.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr3iktw523ddwd59g30bl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr3iktw523ddwd59g30bl.png" alt=" " width="800" height="428"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Challenges We Faced and How We Solved Them&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We faced many challenges in this project, especially while working with real-world agricultural data. The weather and soil data we collected came from different sources and was often incomplete, so we added simple cleaning and validation steps to improve its quality.&lt;br&gt;
Crop yield prediction is handled by a learning model that learns directly from the data we have.&lt;br&gt;
Farmers' feedback and reports vary in language and format, so we use simple NLP techniques that identify keywords and surface the common problems farmers describe, such as pests or low rainfall.&lt;br&gt;
Performance improved once we used MongoDB's indexing and filtering, which keep queries simple and make changes to the data in the database less painful.&lt;br&gt;
The backend uses Flask, whose structure keeps the system easy to run and easy to update.&lt;br&gt;
Together, these solutions gave us a system that is reliable, flexible, and easy for farmers to use.&lt;/p&gt;
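The keyword-identification step described above can be sketched in a few lines (the keyword lists and function name are illustrative, not taken from the project's code; a real pipeline would add tokenization, stemming, and multilingual handling):

```python
# Hypothetical sketch of keyword-based issue detection over farmer
# feedback text. Each issue maps to a tuple of trigger phrases.
ISSUE_KEYWORDS = {
    "pest": ("pest", "insect", "infestation"),
    "low_rainfall": ("low rainfall", "drought", "no rain"),
}

def detect_issues(feedback: str) -> list[str]:
    """Return the issues whose trigger phrases appear in the text."""
    text = feedback.lower()
    return [issue for issue, words in ISSUE_KEYWORDS.items()
            if any(word in text for word in words)]
```

Substring matching like this is crude but cheap, which is why it works as a first pass before any heavier NLP model.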

&lt;p&gt;&lt;strong&gt;Meet AgroAssist: Your Smart Farming Companion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AgroAssist is the chatbot in our project that assists farmers. It gives guidance and support on farming topics, helping farmers understand crop yields and weather conditions in a simple way. It uses ML for predictions and NLP to understand farmers' queries, and it replies in clear, simple language so farmers can follow the answers.&lt;br&gt;
It is connected to the rest of the project, so it can answer in real time and draw on previous data. Overall, it makes farming easier and smarter.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# AgroAssist Chatbot route and logic (no blueprint, no circular import)
&lt;/span&gt;&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/agroassist&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;GET&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;agroassist&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;chatlog&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;chatlog&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;chatlog&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;chatlog&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;method&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;user_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;form&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user_input&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;chatlog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_agroassist_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;chatlog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bot&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;chatlog&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chatlog&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;redirect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;url_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;agroassist&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;render_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;agroassist.html&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chatlog&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;chatlog&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_agroassist_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="c1"&gt;# Crop price Q&amp;amp;A
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cost&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;prices&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_crop_prices&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;crop&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;prices&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;crop&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Current price of &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;crop&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: ₹&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;crop&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;crop&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;unit&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;crop&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;market&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please specify the crop name to get the price.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="c1"&gt;# Crop recommendation
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;recommend&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;which crop&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;To get crop recommendations, use the Predict page and enter your soil, weather, and region details.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="c1"&gt;# Disease/plant advice
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;disease&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;problem&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;plant&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;For plant disease diagnosis, use the Crop Doctor page and upload a plant image.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="c1"&gt;# Greetings
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;hello&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;hi&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;hey&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello! I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m AgroAssist, your farming assistant. Ask me about crop prices, advice, or feedback.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="c1"&gt;# Fallback
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m AgroAssist. I can help with crop prices, recommendations, and farming advice. Try asking about a crop price or how to get recommendations.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What’s Next for Our Project&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The project can be improved by building new AI models for more accurate prediction. In the future, we can adopt stronger ML models to make deeper, better predictions. We can also add real-time notifications and alerts, so farmers learn about weather changes, crop damage, and market trends at the right time.&lt;br&gt;
We also plan to make the app easily accessible on mobile, and to add support for sensors and devices that collect real-time data about soil, temperature, and weather.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrapping Up&lt;/strong&gt;&lt;br&gt;
This is not just a crop prediction system; it is a smart farming platform that helps farmers make better decisions. It uses ML for prediction and NLP to understand farmers' queries.&lt;br&gt;
The system uses both present and previous data to make suggestions. It helps farmers choose the right crops and use their resources properly, which improves productivity.&lt;br&gt;
Overall, the project shows modern farming technology that is smarter and easier to use.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>nlp</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Design Twitter Without the Fanout-on-Write Cliché</title>
      <dc:creator>Gabriel Anhaia</dc:creator>
      <pubDate>Mon, 27 Apr 2026 19:52:35 +0000</pubDate>
      <link>https://forem.com/gabrielanhaia/design-twitter-without-the-fanout-on-write-cliche-33g1</link>
      <guid>https://forem.com/gabrielanhaia/design-twitter-without-the-fanout-on-write-cliche-33g1</guid>
      <description>&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Book:&lt;/strong&gt; &lt;a href="https://www.amazon.com/dp/B0GX2SQ594" rel="noopener noreferrer"&gt;System Design Pocket Guide: Interviews&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;My project:&lt;/strong&gt; &lt;a href="https://hermes-ide.com" rel="noopener noreferrer"&gt;Hermes IDE&lt;/a&gt; | &lt;a href="https://github.com/hermes-hq/hermes-ide" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; — an IDE for developers who ship with Claude Code and other AI coding tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Me:&lt;/strong&gt; &lt;a href="https://xgabriel.com" rel="noopener noreferrer"&gt;xgabriel.com&lt;/a&gt; | &lt;a href="https://github.com/gabrielanhaia" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;The panel says: design Twitter. You have 45 minutes. Half the candidates draw the same diagram. They write to a fanout service, push tweet IDs into Redis lists keyed by &lt;code&gt;feed:&amp;lt;user_id&amp;gt;&lt;/code&gt;, read the home timeline by &lt;code&gt;LRANGE&lt;/code&gt;. Done in 12 minutes. The interviewer nods politely, then asks the question that ends the loop: "what happens when one of those users has 200 million followers?"&lt;/p&gt;

&lt;p&gt;You watch the candidate try to bolt on an answer. Cache the celebrity timeline. Add a queue. Shard harder. None of it lands, because the hot key was baked in at minute 4.&lt;/p&gt;

&lt;p&gt;Don't be them. The cliché answer is fine at small scale. The interview is asking whether you can name the moment it breaks and switch strategies. This is the version of "design Twitter" that surfaces the trade-off the panel is actually grading.&lt;/p&gt;

&lt;h2&gt;
  
  
  The cliché baseline, fast
&lt;/h2&gt;

&lt;p&gt;Just enough to move past it. Fanout-on-write means: when user A tweets, the write path looks up A's followers and pushes the tweet ID into each follower's home timeline cache. Reads are cheap, one Redis call per user.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;post_tweet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;author_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;tweet_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tweets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;followers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;social&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;followers_of&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;author_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;fid&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;followers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lpush&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;feed:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fid&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tweet_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ltrim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;feed:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fid&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;799&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tweet_id&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_home_timeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;ids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lrange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;feed:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tweets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mget&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ids&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why teams pick it. The home timeline is read 100x more than it's written for an average user, so paying once at write time is a sensible trade. &lt;code&gt;LRANGE&lt;/code&gt; is one round trip. Tweet hydration is a multi-get against the tweet store.&lt;/p&gt;

&lt;p&gt;Where it breaks. The for-loop. When &lt;code&gt;followers_of(author_id)&lt;/code&gt; returns 50 follower IDs you finish the write in milliseconds. When it returns 50 million IDs, you have a write amplification problem that no amount of Redis can absorb. The kind of trade-off the panel wants you to surface (not the literal numbers) is that fanout cost grows linearly with follower count, and a small number of accounts have follower counts that are orders of magnitude above the median.&lt;/p&gt;

&lt;p&gt;That asymmetry is the whole interview.&lt;/p&gt;

&lt;h2&gt;
  
  
  The celebrity problem, named clearly
&lt;/h2&gt;

&lt;p&gt;Call this the hot-key problem. One author, one write, millions of downstream insertions. Three things go wrong at once.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Write latency explodes.&lt;/strong&gt; A single tweet from a top-1000 account stalls the fanout queue. Other authors' tweets queue behind it. Latency p99 blows up across the system, not just for the celebrity.&lt;/p&gt;
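Back-of-envelope arithmetic makes the stall concrete (the throughput figure is an assumption for illustration, not a measured number):

```python
# Under fanout-on-write, one tweet costs one timeline insertion per
# follower, so the time a single worker spends on one tweet scales
# linearly with follower count.
def fanout_seconds(followers: int, redis_ops_per_sec: int = 100_000) -> float:
    """Seconds one worker spends pushing a tweet id to every follower."""
    return followers / redis_ops_per_sec

median = fanout_seconds(200)            # milliseconds: invisible
celebrity = fanout_seconds(50_000_000)  # minutes: stalls the queue
```

At an assumed 100k ops/sec, the median author's fanout finishes in 2 ms while the 50M-follower account occupies the pipeline for roughly 500 seconds, which is why everyone queued behind them sees the p99 spike.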

&lt;p&gt;&lt;strong&gt;Memory is wasted.&lt;/strong&gt; Most followers of a celebrity will not open the app in the next hour. You wrote the tweet ID into 50 million Redis lists; maybe 2 million are read before the tweet falls off the timeline. The other 48 million writes were wasted RAM.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The hot key follows you.&lt;/strong&gt; If you try to cache "celebrity tweets" centrally instead, every read from every follower hits the same cache key. One node, all the traffic. Redis Cluster won't save you because the slot is the slot.&lt;/p&gt;

&lt;p&gt;Switch strategies for that author here. Scaling the existing one harder doesn't help.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fanout-on-read for the long tail of celebrities
&lt;/h2&gt;

&lt;p&gt;For high-follower accounts, flip the model. Don't push their tweets to followers at write time. Store celebrity tweets in a separate per-author timeline, and merge them in at read time when a follower asks for their home feed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;CELEBRITY_FOLLOWER_THRESHOLD&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1_000_000&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;post_tweet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;author_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;tweet_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tweets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tweet&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;is_celebrity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;author_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# No fanout. Store on author timeline only.
&lt;/span&gt;        &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lpush&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;author:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;author_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tweet_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ltrim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;author:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;author_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;799&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;fid&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;social&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;followers_of&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;author_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lpush&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;feed:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fid&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tweet_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ltrim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;feed:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fid&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;799&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tweet_id&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The read path now has work to do. It pulls the precomputed home timeline (cheap), pulls the recent tweets from each celebrity the user follows (a handful of small reads), merges them in chronological order, and trims.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_home_timeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;base_ids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lrange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;feed:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;199&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;celeb_authors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;social&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;celebs_followed_by&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;celeb_ids&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;author_id&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;celeb_authors&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;celeb_ids&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lrange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;author:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;author_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;49&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;merged&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;merge_by_timestamp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_ids&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;celeb_ids&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tweets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mget&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;merged&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the move the panel is waiting for. You traded a write-time cost (linear in followers) for a read-time cost (linear in celebrities-followed). The numbers behind that trade are favorable: the median user follows a small number of high-follower accounts. Even if you follow 200 celebrities, that's 200 small Redis reads per timeline load. Pipelined, that's one round trip.&lt;/p&gt;
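&lt;p&gt;A minimal sketch of that read-side cost, using a plain dict as a stand-in for the Redis &lt;code&gt;author:*&lt;/code&gt; lists (the key layout matches the snippets above; &lt;code&gt;fetch_celeb_tails&lt;/code&gt; is an illustrative name, not a real API):&lt;/p&gt;

```python
# Stand-in for Redis: author:<id> -> tweet IDs, newest first.
fake_store = {
    "author:1": [103, 101],
    "author:2": [102],
}


def fetch_celeb_tails(store, celeb_authors, per_author=50):
    """Batch the per-author reads the way a Redis pipeline would:
    one logical round trip, one small LRANGE per followed celebrity."""
    ids = []
    for author_id in celeb_authors:
        ids += store.get(f"author:{author_id}", [])[:per_author]
    return ids


print(fetch_celeb_tails(fake_store, [1, 2]))  # [103, 101, 102]
```

&lt;p&gt;Following 200 celebrities means 200 entries in that loop, all batched into one pipelined round trip against the cluster.&lt;/p&gt;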

&lt;h2&gt;
  
  
  The threshold, and how to detect it
&lt;/h2&gt;

&lt;p&gt;The question every interviewer asks next: where do you put the line? Pick a number that's easy to defend, then explain how you would tune it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;is_celebrity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;author_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;follower_count_cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;author_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;social&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;follower_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;author_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;follower_count_cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;author_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;CELEBRITY_FOLLOWER_THRESHOLD&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A static threshold (e.g. one million followers) is the answer to give first. It's auditable, easy to reason about, and you can ship it. Then layer the nuance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Posting rate matters.&lt;/strong&gt; A million-follower account that tweets once a month is not the same load profile as a hundred-thousand-follower account that tweets every 90 seconds during a livestream. A better signal is &lt;code&gt;followers * recent_post_rate&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Promotion has to be smooth.&lt;/strong&gt; When an account crosses the threshold, you don't want to atomically flip them. Existing followers have stale entries on their fanout timeline; new tweets need to come in through the read path. Run a backfill window: for the next N hours, the account's writes go to both paths. Followers' merge code dedupes by tweet ID. Then drop the fanout side.&lt;/p&gt;
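&lt;p&gt;The dedupe during that backfill window is small. A sketch (&lt;code&gt;merge_dedupe&lt;/code&gt; is a hypothetical helper; it assumes snowflake-style IDs, so numeric order is creation-time order):&lt;/p&gt;

```python
def merge_dedupe(*id_lists):
    """During the promotion window the same tweet ID can arrive via
    both the fanout path and the author-timeline path. Dedupe by ID,
    then sort descending: snowflake IDs sort by creation time."""
    seen = set()
    for ids in id_lists:
        seen.update(ids)
    return sorted(seen, reverse=True)


print(merge_dedupe([105, 103], [105, 104, 103]))  # [105, 104, 103]
```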

&lt;p&gt;&lt;strong&gt;Demotion is the underrated direction.&lt;/strong&gt; Accounts lose followers, go inactive, get suspended. A celebrity that no longer tweets shouldn't keep paying the read-side merge cost forever. Periodically re-evaluate, and demote stale celebrities back into normal fanout.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;reclassify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;author_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;social&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;follower_count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;author_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;activity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;posts_per_day&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;author_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;window_days&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;5_000_000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;celebrity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;500_000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;borderline&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;normal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;borderline&lt;/code&gt; is a useful third bucket. Borderline accounts fan out to active followers only, skipping followers who haven't opened the app in 30 days. That's the per-follower budget knob.&lt;/p&gt;

&lt;h2&gt;
  
  
  The per-follower budget
&lt;/h2&gt;

&lt;p&gt;The hidden lever in any fanout system. You don't owe every follower the same delivery treatment.&lt;/p&gt;

&lt;p&gt;The naive read of "fanout-on-write" is that every follower's timeline gets every tweet. In production, that's wasteful. A follower who hasn't opened the app in three weeks doesn't need their Redis list updated in real time. They'll get a fresh feed assembled on demand when they come back.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fanout_to_followers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;author_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tweet_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;followers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;social&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;followers_of&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;author_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;active&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;activity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter_active&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;followers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;days&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;pipe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;fid&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;active&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lpush&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;feed:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fid&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tweet_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ltrim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;feed:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fid&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;799&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Inactive followers: rebuilt on next visit from
&lt;/span&gt;    &lt;span class="c1"&gt;# author timelines + a sliding window query.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two effects. Fanout cost shrinks to the active follower set, which is a fraction of total followers. Inactive users still get a coherent timeline when they return, because you assemble it on read from author timelines anyway. That's the same code path the celebrity case already needed. You wrote it once; reuse it.&lt;/p&gt;
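&lt;p&gt;A sketch of that shared read path for a returning inactive user, with a dict standing in for the per-author Redis lists (&lt;code&gt;rebuild_feed&lt;/code&gt; is an illustrative name; it assumes snowflake IDs, so a plain numeric sort gives chronological order):&lt;/p&gt;

```python
def rebuild_feed(followed_authors, author_timelines, cap=800):
    """Assemble a returning user's feed purely from author timelines,
    the same read path the celebrity merge uses. author_timelines
    maps author_id -> newest-first tweet IDs (stand-in for Redis)."""
    ids = []
    for aid in followed_authors:
        ids += author_timelines.get(aid, [])
    # Snowflake IDs embed creation time, so numeric sort is time sort.
    return sorted(ids, reverse=True)[:cap]


timelines = {7: [201, 105], 9: [180]}
print(rebuild_feed([7, 9], timelines))  # [201, 180, 105]
```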

&lt;p&gt;This is also where you tell the interviewer about the bound you'd put on it. Cap fanout at, say, the most recently active 500k followers per author. Above that cap, the account is treated as a celebrity for delivery, even if its raw follower count is below the threshold. The system has a worst-case-write SLO and you can defend it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the hybrid looks like end-to-end
&lt;/h2&gt;

&lt;p&gt;Three paths, picked at write time.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Author class&lt;/th&gt;
&lt;th&gt;Write path&lt;/th&gt;
&lt;th&gt;Read path&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Normal&lt;/td&gt;
&lt;td&gt;Fanout to active followers&lt;/td&gt;
&lt;td&gt;Read precomputed &lt;code&gt;feed:&amp;lt;uid&amp;gt;&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Borderline&lt;/td&gt;
&lt;td&gt;Fanout to active followers, capped&lt;/td&gt;
&lt;td&gt;Read precomputed feed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Celebrity&lt;/td&gt;
&lt;td&gt;Append to &lt;code&gt;author:&amp;lt;aid&amp;gt;&lt;/code&gt; only&lt;/td&gt;
&lt;td&gt;Merge feed + per-celeb tail&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The read code is the same in all three cases. It always tries the precomputed feed, then merges in celebrity tails for the celebrities you follow. Normal users follow zero celebrities and the merge is a no-op. Heavy users of celebrity content pay the merge once per timeline load, on a path that pipelines well.&lt;/p&gt;

&lt;p&gt;Two things to call out before the panel does.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ordering on merge.&lt;/strong&gt; Tweet timestamps from different sources are not enough; clock skew between fanout writers and author-timeline writers will give you out-of-order feeds at the boundary. Sort by the tweet ID itself: snowflake IDs are assigned monotonically at the tweet store, so ID order is creation-time order. The author-timeline path and the fanout path both write the same tweet ID, so the merge is deterministic.&lt;/p&gt;
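&lt;p&gt;Why ID order works as time order: a snowflake ID packs the creation timestamp into its high bits. A sketch of the decode, using Twitter's published layout (41 timestamp bits above 22 worker/sequence bits, custom epoch):&lt;/p&gt;

```python
TWEPOCH = 1288834974657  # Twitter's snowflake epoch, ms since Unix epoch


def snowflake_time_ms(tweet_id: int) -> int:
    """High bits are milliseconds since the custom epoch, so numeric
    ID comparison is creation-time comparison by construction."""
    return (tweet_id >> 22) + TWEPOCH


# Two IDs minted a millisecond apart compare correctly as integers.
a, b = (1 << 22), (2 << 22)
print(snowflake_time_ms(b) - snowflake_time_ms(a))  # 1
```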

&lt;p&gt;&lt;strong&gt;Backpressure on celebrity reads.&lt;/strong&gt; If a celebrity tweets and 50 million followers open the app at once, the celebrity's &lt;code&gt;author:&amp;lt;aid&amp;gt;&lt;/code&gt; Redis key becomes the new hot key. Mitigate with a small in-memory cache at the read service (per-pod, 1-second TTL) so the per-celeb tail is fetched once per pod per second, not once per request. The math: a thousand pods, one-second TTL, one Redis read per pod per second per hot author. That's 1,000 reads/sec on the hottest key, which a Redis replica handles without breaking a sweat.&lt;/p&gt;
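&lt;p&gt;That pod-local cache is a few lines. A sketch (&lt;code&gt;TailCache&lt;/code&gt; is illustrative; the injected &lt;code&gt;fetch&lt;/code&gt; callable stands in for the real Redis read):&lt;/p&gt;

```python
import time


class TailCache:
    """Per-pod cache with a short TTL so a hot author key hits Redis
    at most once per pod per TTL window, not once per request."""

    def __init__(self, fetch, ttl=1.0):
        self.fetch, self.ttl = fetch, ttl
        self._entries = {}  # key -> (value, fetched_at)

    def get(self, key):
        now = time.monotonic()
        hit = self._entries.get(key)
        if hit and now - hit[1] < self.ttl:
            return hit[0]  # fresh: serve from pod memory
        value = self.fetch(key)  # stale or missing: one Redis read
        self._entries[key] = (value, now)
        return value


calls = []
cache = TailCache(lambda k: calls.append(k) or [1, 2, 3], ttl=60)
cache.get("author:42")
cache.get("author:42")
print(len(calls))  # 1 -- second read served from the pod-local cache
```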

&lt;p&gt;For more depth on these trade-offs and the interview frame around them, the &lt;a href="https://www.amazon.com/dp/B0GX2SQ594" rel="noopener noreferrer"&gt;System Design Pocket Guide: Interviews&lt;/a&gt; walks through the Twitter design alongside 14 other system designs the same way: the cliché answer, the failure mode, and the version that earns the offer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to draw first
&lt;/h2&gt;

&lt;p&gt;Start with the threshold box on the whiteboard. One arrow into it labeled &lt;code&gt;post_tweet&lt;/code&gt;, two arrows out: one to fanout, one to author timeline. Then sketch the read service pulling from both and merging. The panel will ask one of three follow-ups next: how you'd promote an account across the threshold without a thundering-herd refresh, how you'd pick the active-follower window, or how the merge handles a celebrity who deletes a tweet. Have a one-paragraph answer for each ready, and the loop turns into a conversation about which dial to turn.&lt;/p&gt;




&lt;h2&gt;
  
  
  If this was useful
&lt;/h2&gt;

&lt;p&gt;The Twitter design is one of fifteen worked through end-to-end in the &lt;a href="https://www.amazon.com/dp/B0GX2SQ594" rel="noopener noreferrer"&gt;System Design Pocket Guide: Interviews&lt;/a&gt;. Each chapter follows the same shape: the obvious answer, the failure case the interviewer is hunting for, and the design that handles it. If you're prepping for a senior-or-above loop and want the version of these answers that doesn't sound like everyone else's, it's written for you.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.amazon.com/dp/B0GX2SQ594" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv6dw87uaq2vin2k1bwb0.jpg" alt="System Design Pocket Guide: Interviews" width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>database</category>
      <category>tutorial</category>
      <category>programming</category>
    </item>
    <item>
      <title>Rebuild a CQRS Read Model With Zero Downtime</title>
      <dc:creator>Gabriel Anhaia</dc:creator>
      <pubDate>Mon, 27 Apr 2026 19:52:02 +0000</pubDate>
      <link>https://forem.com/gabrielanhaia/rebuild-a-cqrs-read-model-with-zero-downtime-27io</link>
      <guid>https://forem.com/gabrielanhaia/rebuild-a-cqrs-read-model-with-zero-downtime-27io</guid>
      <description>&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Book:&lt;/strong&gt; &lt;a href="https://www.amazon.com/dp/B0GX3B8371" rel="noopener noreferrer"&gt;Event-Driven Architecture Pocket Guide: Saga, CQRS, Outbox, and the Traps Nobody Warns You About&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;My project:&lt;/strong&gt; &lt;a href="https://hermes-ide.com" rel="noopener noreferrer"&gt;Hermes IDE&lt;/a&gt; | &lt;a href="https://github.com/hermes-hq/hermes-ide" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; — an IDE for developers who ship with Claude Code and other AI coding tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Me:&lt;/strong&gt; &lt;a href="https://xgabriel.com" rel="noopener noreferrer"&gt;xgabriel.com&lt;/a&gt; | &lt;a href="https://github.com/gabrielanhaia" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;You discover the bug at 2:14 on a Tuesday. The &lt;code&gt;orders_read&lt;/code&gt; projection has been flattening line items into a JSON blob for eighteen months, and the new "items shipped in last 30 days" dashboard needs to filter on SKU. The schema is wrong. There are 84 million rows. Rebuilding the projection from the event log takes about three hours on a warm replica. Your SLO budget for the quarter is 22 minutes and you've already burned 11.&lt;/p&gt;

&lt;p&gt;You can't take a four-hour outage to rebuild the read side. You also can't ship the new query against the wrong shape. This is the boring problem CQRS was supposed to make easy, and it is, once you stop thinking about the rebuild as an outage and start thinking about it as a parallel deployment. Two read models, one event log, a flag, and a parity check. That's the whole pattern.&lt;/p&gt;

&lt;p&gt;The pieces below assume Postgres for the read side and Kafka as the durable event log. The shape transfers to any pair where the log keeps full history and the read store can be re-populated from it. Marten's Critter Stack &lt;a href="https://jeremydmiller.com/2025/03/26/projections-consistency-models-and-zero-downtime-deployments-with-the-critter-stack/" rel="noopener noreferrer"&gt;shipped this in March 2025&lt;/a&gt; as a first-class blue-green; the same pattern works by hand if you don't have framework support.&lt;/p&gt;

&lt;h2&gt;
  
  
  The shape of the rebuild
&lt;/h2&gt;

&lt;p&gt;Five steps, in this order. Skip none.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Stand up &lt;code&gt;orders_read_v2&lt;/code&gt; next to &lt;code&gt;orders_read_v1&lt;/code&gt;. New schema, new table, same database is fine.&lt;/li&gt;
&lt;li&gt;Dual-write the projection: every event the consumer handles updates both v1 and v2.&lt;/li&gt;
&lt;li&gt;Replay the historical events from the log into v2 only. v1 is already correct; don't touch it.&lt;/li&gt;
&lt;li&gt;Shadow-read: every query hits v1 to serve the response and v2 to compare. Log mismatches.&lt;/li&gt;
&lt;li&gt;When the mismatch rate is zero for a sustained window, flip the read flag to v2. Wait. Drop v1.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The replay and the live tail meet somewhere in the middle. The dual-write is what makes that meeting safe. Without it, events that arrive during the replay race with the historical events and you get gaps. With it, every event lands in v2 twice (once from replay, once from the live consumer) and your projection has to be idempotent. That's the price of admission and it's cheaper than an outage.&lt;/p&gt;

&lt;p&gt;Here are the two tables side by side. v1 is what you already have; v2 is the target.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- v1: the existing, wrong shape.&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders_read_v1&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;order_id&lt;/span&gt;      &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;customer_id&lt;/span&gt;   &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;status&lt;/span&gt;        &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;items_json&lt;/span&gt;    &lt;span class="n"&gt;jsonb&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;total_cents&lt;/span&gt;   &lt;span class="nb"&gt;bigint&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;updated_at&lt;/span&gt;    &lt;span class="n"&gt;timestamptz&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;last_event_id&lt;/span&gt; &lt;span class="nb"&gt;bigint&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- v2: the new, correct shape. Items get their own table so&lt;/span&gt;
&lt;span class="c1"&gt;-- you can filter on sku without jsonb gymnastics.&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders_read_v2&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;order_id&lt;/span&gt;      &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;customer_id&lt;/span&gt;   &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;status&lt;/span&gt;        &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;total_cents&lt;/span&gt;   &lt;span class="nb"&gt;bigint&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;shipped_at&lt;/span&gt;    &lt;span class="n"&gt;timestamptz&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;updated_at&lt;/span&gt;    &lt;span class="n"&gt;timestamptz&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;last_event_id&lt;/span&gt; &lt;span class="nb"&gt;bigint&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;order_items_read_v2&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;order_id&lt;/span&gt;   &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
             &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;orders_read_v2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;CASCADE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;line_no&lt;/span&gt;    &lt;span class="nb"&gt;int&lt;/span&gt;  &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;sku&lt;/span&gt;        &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;qty&lt;/span&gt;        &lt;span class="nb"&gt;int&lt;/span&gt;  &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line_no&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;order_items_read_v2&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;last_event_id&lt;/code&gt; column on each row is the load-bearing detail. It's the offset of the last event applied to that aggregate. The projection writes it on every update, and the replay refuses to apply an event whose offset is less than or equal to the current &lt;code&gt;last_event_id&lt;/code&gt;. That's how you make the projection idempotent without bolting on a separate dedup table.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dual-write the projection
&lt;/h2&gt;

&lt;p&gt;Your consumer was already writing to v1. Add v2 in the same transaction. Both writes succeed or neither does. That's non-negotiable; otherwise the parity check below is meaningless.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# consumer.py: runs against the live tail of the Kafka topic.
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;psycopg&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;confluent_kafka&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Consumer&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;eid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;offset&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transaction&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="nf"&gt;apply_v1&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;eid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;apply_v2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;eid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;apply_v2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;eid&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OrderPlaced&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
          INSERT INTO orders_read_v2
            (order_id, customer_id, status,
             total_cents, updated_at, last_event_id)
          VALUES (%s, %s, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;placed&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, %s, now(), %s)
          ON CONFLICT (order_id) DO UPDATE
          SET last_event_id = EXCLUDED.last_event_id
          WHERE orders_read_v2.last_event_id &amp;lt; EXCLUDED.last_event_id
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
              &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_cents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;eid&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;items&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
            &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
              INSERT INTO order_items_read_v2
                (order_id, line_no, sku, qty)
              VALUES (%s, %s, %s, %s)
              ON CONFLICT (order_id, line_no) DO NOTHING
            &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sku&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qty&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;WHERE orders_read_v2.last_event_id &amp;lt; EXCLUDED.last_event_id&lt;/code&gt; clause is the idempotency guard. An event that's already been applied (because the replay raced ahead of the live tail, or because Kafka redelivered) becomes a no-op. Run the same event ten times; the row settles at the same state.&lt;/p&gt;

&lt;h2&gt;
  
  
  The replay
&lt;/h2&gt;

&lt;p&gt;Replay reads from the beginning of the topic and writes only to v2. It uses a separate consumer group so it doesn't poison the live consumer's offsets. Run it on a worker pool. The throughput ceiling is Postgres write bandwidth, not Kafka.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# replay.py: one-shot job, separate consumer group.
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;psycopg&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;confluent_kafka&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Consumer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TopicPartition&lt;/span&gt;

&lt;span class="n"&gt;cfg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bootstrap.servers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kafka:9092&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;group.id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;orders-read-v2-replay&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enable.auto.commit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto.offset.reset&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;earliest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Consumer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nc"&gt;TopicPartition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;orders&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;psycopg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DSN&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;poll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;all_caught_up&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transaction&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="nf"&gt;apply_v2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two details that bite teams running this for the first time. First, batch your transactions. Committing every event drops throughput by 10x or more. Group 500 events per transaction and the replay finishes in a fraction of the time. Second, monitor the high-water mark of v2's &lt;code&gt;last_event_id&lt;/code&gt; against the topic head. When the gap stops shrinking, you're caught up. That's the signal to start shadow-reads, not a fixed timer.&lt;/p&gt;
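Both details can be sketched without the Kafka plumbing. This is an illustrative sketch, not the post's exact code: `batched` groups events so one transaction covers many applies (the 500-event batch size follows the text), and `replay_lag` is one way the `all_caught_up` check in the replay snippet might be implemented.

```python
# Illustrative helpers for the replay loop: batching and lag measurement.

def batched(events, size=500):
    """Group an event stream into lists of up to `size`, so the replay
    can commit one transaction per batch instead of per event."""
    buf = []
    for e in events:
        buf.append(e)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:
        yield buf

def replay_lag(positions, high_watermarks):
    """Total events between the replay's current offsets and the topic
    head, one entry per partition. When this stops shrinking toward
    zero, the replay has caught up."""
    return sum(max(high - pos, 0) for pos, high in zip(positions, high_watermarks))
```

With confluent_kafka, `positions` would typically come from `Consumer.position()` and `high_watermarks` from `Consumer.get_watermark_offsets()` per partition; a lag that holds at zero is the shadow-read signal the paragraph above describes.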

&lt;h2&gt;
  
  
  Shadow-read for parity
&lt;/h2&gt;

&lt;p&gt;The query layer reads from v1 to serve the response. In the same request, asynchronously, it reads from v2 and compares. Mismatches go to a log with the event offsets that got us here.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# query_layer.py: flag-gated, instrumented.
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;v1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fetch_v1&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;FLAG&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;orders_read_v2_shadow&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;v2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fetch_v2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;equivalent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parity_mismatch&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v2&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;v2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v2_shadow_failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;v1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;equivalent&lt;/code&gt; function is where the work hides. v1's &lt;code&gt;items_json&lt;/code&gt; is unordered; v2's &lt;code&gt;order_items_read_v2&lt;/code&gt; is ordered by &lt;code&gt;line_no&lt;/code&gt;. Normalize both sides before comparing. Ignore &lt;code&gt;updated_at&lt;/code&gt; skew within a few hundred milliseconds. Don't compare &lt;code&gt;last_event_id&lt;/code&gt;; it will legitimately differ between the two if v2 caught a later event the live consumer hasn't written to v1 yet.&lt;/p&gt;
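A minimal sketch of what that `equivalent` function might look like. The field names follow the post's schema; the epoch-millisecond `updated_at_ms` key and the 500 ms skew tolerance are assumptions for illustration.

```python
def normalize_items(items):
    # Sort line items so v1's unordered JSON and v2's line_no ordering compare equal.
    return sorted((i["sku"], i["qty"]) for i in items)

def equivalent(v1, v2, skew_ms=500):
    """Compare two read-model rows, tolerating timestamp skew and item order."""
    if v1["status"] != v2["status"]:
        return False
    if v1["total_cents"] != v2["total_cents"]:
        return False
    # Ignore updated_at skew within a few hundred milliseconds.
    if abs(v1["updated_at_ms"] - v2["updated_at_ms"]) > skew_ms:
        return False
    # Deliberately no last_event_id comparison: it legitimately differs.
    return normalize_items(v1["items"]) == normalize_items(v2["items"])
```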

&lt;h2&gt;
  
  
  The parity-check script
&lt;/h2&gt;

&lt;p&gt;Shadow-read covers what users query. It does not cover the long tail of orders nobody opened in the last week. Run a sweep.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# parity_check.py: paginates through orders, flags drift.
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;psycopg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;BATCH&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sku&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qty&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;after&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
      SELECT v1.order_id, v1.status, v1.total_cents,
             v1.items_json, v2.status, v2.total_cents,
             COALESCE(json_agg(json_build_object(
               &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sku&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, i.sku, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;qty&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, i.qty)), &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;[]&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;::json)
      FROM orders_read_v1 v1
      JOIN orders_read_v2 v2 USING (order_id)
      LEFT JOIN order_items_read_v2 i USING (order_id)
      WHERE v1.order_id &amp;gt; %s
      GROUP BY v1.order_id, v2.order_id
      ORDER BY v1.order_id
      LIMIT %s
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;after&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BATCH&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;drift&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;oid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;items1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;items2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;s1&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;s2&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;t1&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;t2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;drift&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;oid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;header&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="nf"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;drift&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;oid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;items&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;drift&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the sweep nightly during the rebuild. The drift count should fall to zero and stay there. If it doesn't, you have a bug in &lt;code&gt;apply_v2&lt;/code&gt; and the cutover waits. The most common cause is an event type the v2 projection forgot to handle: &lt;code&gt;OrderRefunded&lt;/code&gt;, &lt;code&gt;OrderItemReplaced&lt;/code&gt;, the one that gets emitted twice a month. Your test suite missed it because your test corpus is two weeks of fixtures.&lt;/p&gt;
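One cheap way to catch that failure mode early is to make unhandled event types loud instead of silent: route the projection through a dispatch table and count anything it doesn't recognize. The handler wiring below is a placeholder sketch, not the post's code.

```python
from collections import Counter

# Placeholder handlers; in the real projection these are the apply functions.
HANDLERS = {
    "OrderPlaced": lambda conn, event, eid: None,
    "OrderShipped": lambda conn, event, eid: None,
}

unhandled = Counter()  # alert if any count here is ever nonzero

def apply_v2_dispatch(conn, event, eid):
    handler = HANDLERS.get(event["type"])
    if handler is None:
        # The rare OrderRefunded / OrderItemReplaced events land here
        # instead of silently disappearing from the projection.
        unhandled[event["type"]] += 1
        return
    handler(conn, event, eid)
```

Exporting `unhandled` as a metric turns "the event type the test corpus missed" from a nightly-sweep discovery into a same-day alert.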

&lt;h2&gt;
  
  
  The cutover
&lt;/h2&gt;

&lt;p&gt;A flag flip, a hold, a drop. In that order.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# At T=0: flip read traffic to v2.
&lt;/span&gt;&lt;span class="n"&gt;FLAG&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;orders_read_source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# At T+24h: stop dual-writing if parity has held.
&lt;/span&gt;&lt;span class="n"&gt;FLAG&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;orders_read_dual_write&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# At T+7d: drop v1.
# DROP TABLE orders_read_v1;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The 24-hour hold is for rollback. If a v2 query returns garbage for an order nobody touched during shadow-read (a regional edge case, a legacy aggregate from 2019), you flip the flag back to v1, and v1 still has the latest state because dual-write was on. The seven-day hold before the drop is for the auditor who shows up on day five asking why the report shape changed. Keep v1 readable until that conversation is done.&lt;/p&gt;

&lt;p&gt;The pattern compresses well in your head. Two tables, one transaction, one replay, one parity sweep, one flag with three positions. The work that takes a week is rarely the code; it's the schema migration, the equivalence function, and the long tail of event types the v1 projection silently absorbed and the v2 projection has to reproduce. Plan the calendar around that, not the replay job.&lt;/p&gt;
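That one flag with three positions can be sketched as a single mapping from flag position to serving behavior; the position names here are illustrative, folding the two flags from the cutover snippet into one.

```python
def read_config(flag_value):
    """Map one three-position flag to (read_source, dual_write)."""
    return {
        "v1":      ("v1", True),   # pre-cutover: serve v1, dual-write fills v2
        "v2-dual": ("v2", True),   # the 24h hold: serve v2, keep v1 current for rollback
        "v2":      ("v2", False),  # post-hold: v1 is cold and can be dropped
    }[flag_value]
```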




&lt;h2&gt;
  
  
  If this was useful
&lt;/h2&gt;

&lt;p&gt;Chapter 6 of &lt;a href="https://www.amazon.com/dp/B0GX3B8371" rel="noopener noreferrer"&gt;&lt;em&gt;Event-Driven Architecture Pocket Guide&lt;/em&gt;&lt;/a&gt; walks the CQRS read-side patterns end to end: projection idempotency, replay tooling, the parity-check checklist this post compresses, and the failure modes that show up around the dual-write window. If you're staring at a read model you can't afford to rebuild in place, it's the chapter to read first.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.amazon.com/dp/B0GX3B8371" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fblk678n27i2r4chdp8zh.jpg" alt="Event-Driven Architecture Pocket Guide" width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>eventdriven</category>
      <category>architecture</category>
      <category>postgres</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>The Two Joys of Coding Before AI</title>
      <dc:creator>Viktor Lázár</dc:creator>
      <pubDate>Mon, 27 Apr 2026 19:51:28 +0000</pubDate>
      <link>https://forem.com/lazarv/the-two-joys-of-coding-before-ai-1pbp</link>
      <guid>https://forem.com/lazarv/the-two-joys-of-coding-before-ai-1pbp</guid>
      <description>&lt;p&gt;There is a particular kind of grief floating around right now. You see it in blog posts, in conference talks, in late-night threads: a mourning for the joy of coding before AI. People describe it as if a forest has been paved over. Something they loved is gone, and something colder has taken its place.&lt;/p&gt;

&lt;p&gt;I think most of these conversations talk past each other because they skip the only question that matters: &lt;strong&gt;what was the joy actually made of?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Coding" is not one activity. It is at least two, braided together so tightly that for decades nobody had to separate them. AI pulls on one of those strands and not the other, and whether that feels like loss or liberation depends entirely on which strand you were holding.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two kinds of joy
&lt;/h2&gt;

&lt;p&gt;Strip a programming session down to its emotional core and you find two distinct rewards.&lt;/p&gt;

&lt;p&gt;The first is &lt;strong&gt;the joy of materializing a vision&lt;/strong&gt;. You see something in your head — a tool, an interface, a system, a small clever thing that does not exist yet — and you bring it into the world. The pleasure here is in the gap closing. The thing in your imagination and the thing on the screen converge until they are the same thing. The keyboard, the language, the build system: these are friction. Necessary friction, often beautiful friction, but friction. The joy lives at the moment of arrival.&lt;/p&gt;

&lt;p&gt;The second is &lt;strong&gt;the joy of figuring something out&lt;/strong&gt;. A problem resists you. You sit with it. You pull on threads, build mental models, get them wrong, refine them, and somewhere — sometimes in the shower, sometimes at 2 a.m., sometimes mid-sentence in a meeting — the shape of the answer clicks into place. The pleasure here is not in arrival but in the act of comprehension itself. You understand something now that you did not understand an hour ago, and your brain rewards you for it the way it rewards eating when you are hungry.&lt;/p&gt;

&lt;p&gt;These are not the same feeling. They use different muscles. They satisfy different hungers. And — this is the important part — they leave behind different kinds of memory. A vision-materializer remembers what they built. A problem-solver remembers how the world bent into a new shape inside their head.&lt;/p&gt;

&lt;p&gt;Most working programmers feel both, in different proportions, sometimes in the same hour. But if you ask honestly which one is the &lt;em&gt;core&lt;/em&gt; — the thing that made you a programmer rather than something else — almost everyone has an answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What AI actually changes
&lt;/h2&gt;

&lt;p&gt;AI coding assistants are extraordinary materializers. That is what they are built to be. You describe the thing, they produce the thing. The friction between vision and artifact collapses dramatically. What used to be an afternoon of plumbing is now a paragraph and a review pass.&lt;/p&gt;

&lt;p&gt;If your joy was in the materialization — in seeing the thing exist — AI is not stealing anything from you. It is &lt;strong&gt;giving you more of what you loved&lt;/strong&gt;. The gap closes faster, which means you can close more gaps, which means more visions per unit of life. The friction you tolerated was never the source of the joy; the arrival was. You can build the second thing now, and the third, and the weird side project you never had time for. The hands-on craft loss is real but it is a craft loss, not a joy loss. You can still write the loop by hand on a Saturday if you want to. Nothing stops you.&lt;/p&gt;

&lt;p&gt;One thing has to be said clearly here, because it is the most common bad-faith reading of the materializer position: &lt;strong&gt;materialization is not "whatever shipped, shipped."&lt;/strong&gt; A vision is not just a silhouette of a feature; it has internal coherence, a way it behaves under pressure, a quality it carries. A materializer who accepts slop because it superficially resembles the artifact in their head has not closed the gap — they have moved the goalposts to meet the output. That is not the joy of materializing a vision. That is the relief of being done. They are different feelings, and conflating them is how teams end up shipping confident-sounding garbage at unprecedented speed. The AI gives you a draft. The work — the actual materializer's work — is to keep pushing the draft until it matches the thing in your head, including the parts of the thing in your head that have to do with correctness, taste, performance, and how it will read to the next person. Acceleration is acceleration toward the &lt;em&gt;right&lt;/em&gt; artifact, not toward any artifact. A materializer who forgets this is no longer practicing their craft; they are just hitting accept.&lt;/p&gt;

&lt;p&gt;If your joy was in the figuring out, the picture is genuinely different — and the grief is genuinely earned. The AI is not just removing friction; it is removing &lt;strong&gt;the problem itself&lt;/strong&gt;. The puzzle you would have sat with for three days, turning it over in your head on the train and in the shower, building the mental model that becomes part of how you think forever — the AI hands you an answer in twelve seconds. The answer is often correct. And the comprehension that would have grown in you while you struggled does not grow, because you did not struggle.&lt;/p&gt;

&lt;p&gt;This is not a complaint about laziness or skill atrophy, though those are real concerns. It is something more specific: a category of human pleasure, the pleasure of &lt;em&gt;understanding something hard&lt;/em&gt;, requires the hardness. Remove the hardness and you remove the pleasure, even if you keep the answer. You cannot have the satisfaction of a crossword without the crossword.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the debate is so confused
&lt;/h2&gt;

&lt;p&gt;Once you see this split, the public conversation about AI and coding starts to make more sense. The two camps are not actually disagreeing about AI. They are reporting honestly on two different inner experiences.&lt;/p&gt;

&lt;p&gt;The "AI is wonderful, I ship five times faster" camp is overwhelmingly populated by materializers. They are telling the truth. Their joy is intact and amplified.&lt;/p&gt;

&lt;p&gt;The "AI is hollowing out my craft" camp is overwhelmingly populated by problem-solvers. They are also telling the truth. Their joy is, in fact, being eroded — not by malice or hype, but by the specific mechanism of having the puzzles solved before they get to play with them.&lt;/p&gt;

&lt;p&gt;When these two groups argue, they sound like they are arguing about a tool. They are actually arguing about which of two pleasures is the real one. There is no answer to that question, because there are two answers, and they are both correct for the person giving them.&lt;/p&gt;

&lt;p&gt;Notice how each camp uses the same phrase to dismiss the other's grief. To the materializer, the problem-solving was always an &lt;em&gt;implementation detail&lt;/em&gt; — a means, a tax you paid on the way to the artifact, something a sufficiently advanced tool was supposed to absorb eventually. To the problem-solver, the shipped artifact was the implementation detail — the residue, the visible echo of an internal event that had already happened in their head. Each side, in good faith, treats the other side's joy as the boring scaffolding around their own. That is why the conversation goes nowhere: both sides are correctly identifying what is, &lt;em&gt;for them&lt;/em&gt;, incidental.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to do about it
&lt;/h2&gt;

&lt;p&gt;If you are mourning, the first useful move is to ask yourself the honest version of the question. Not "do I miss coding," but: &lt;em&gt;which kind of coding?&lt;/em&gt; When you replay the moments you would call joyful, are you watching yourself ship the thing, or are you watching yourself understand the thing? The answer tells you whether you are facing an opportunity or a loss.&lt;/p&gt;

&lt;p&gt;For materializers, the path is mostly forward. Use the tools. Build more. The thing you loved is more available now, not less.&lt;/p&gt;

&lt;p&gt;For problem-solvers, the answer is harder and more deliberate. The puzzles still exist; they have just stopped arriving on their own. Production code paths now route around them. To keep the joy, you have to &lt;strong&gt;choose the friction back in&lt;/strong&gt; — pick problems the AI cannot solve cleanly, work in domains where the model is weak, build from scratch on weekends, read papers, do the leetcode-equivalent that is actually interesting to you, contribute to runtimes and compilers and other places where the problem space is still deep enough that no autocomplete can shortcut it. The protected hour where you do not ask the assistant is not a Luddite stance; it is a deliberate preservation of the conditions your joy requires.&lt;/p&gt;

&lt;p&gt;Both responses are healthy. Both are grown-up. What is not healthy is conflating them — using a materializer's optimism to dismiss a problem-solver's grief, or using a problem-solver's grief to deny a materializer's genuine, earned acceleration.&lt;/p&gt;

&lt;h2&gt;
  
  
  The thing under the thing
&lt;/h2&gt;

&lt;p&gt;The deeper claim hiding inside all of this is that &lt;em&gt;coding was never one thing&lt;/em&gt;. It was a workbench where two very different human pleasures happened to use the same tools. The industry treated them as one because the workflow forced them to be one — you could not materialize a vision without solving a hundred small problems along the way, and you could not solve interesting problems without something to materialize them into.&lt;/p&gt;

&lt;p&gt;AI is the first force strong enough to pull those two pleasures apart. It is doing so cleanly and without asking permission. What we are watching is not the death of the joy of coding. It is the unbundling of two joys that were always separate, finally being forced to admit it.&lt;/p&gt;

&lt;p&gt;Which one was yours?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Reducing AI Latency Through Smarter Model Routing and Token Optimization</title>
      <dc:creator>TokensAndTakes</dc:creator>
      <pubDate>Mon, 27 Apr 2026 19:49:08 +0000</pubDate>
      <link>https://forem.com/tokensandtakes/reducing-ai-latency-through-smarter-model-routing-and-token-optimization-4l0g</link>
      <guid>https://forem.com/tokensandtakes/reducing-ai-latency-through-smarter-model-routing-and-token-optimization-4l0g</guid>
      <description>&lt;p&gt;If you are working on AI speed and latency, this guide gives a simple, practical path you can apply today. Every 100 milliseconds of latency costs businesses real revenue. In AI systems, slow response times do more than frustrate users they break workflows, reduce engagement, and limit what applications can actually achieve. Most teams try to solve this problem by throwing more GPUs at it. That approach works, but it becomes expensive very quickly. The better path is to work smarter with the system you already have.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbb8chsj0blpjoynbrj5r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbb8chsj0blpjoynbrj5r.png" alt=" "&gt;&lt;/a&gt;&lt;br&gt;
Latency in LLM applications typically comes from three main sources: model selection, request queuing, and token generation. Each of these represents an opportunity for optimization. When every request is sent to the largest model, compute is wasted on simple queries that could easily be handled by smaller, faster alternatives. When requests are queued inefficiently, additional wait time is introduced before processing even begins. And when token generation is not optimized, unnecessary milliseconds are spent on every response. &lt;/p&gt;

&lt;p&gt;MegaLLM addresses these bottlenecks through intelligent model routing. Instead of defaulting every prompt to the most powerful model, it evaluates query complexity and routes it accordingly. A simple classification task might be handled by a lightweight model responding in around 200 milliseconds, while more complex reasoning tasks are sent to larger models. This routing happens automatically and can reduce average response times significantly in production environments.&lt;/p&gt;
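&lt;p&gt;As a rough illustration of complexity-based routing, a minimal router might look like the sketch below. The heuristic, model names, and thresholds are hypothetical for illustration only, not MegaLLM's actual logic:&lt;/p&gt;

```python
# Sketch of complexity-based model routing (illustrative; model names and
# thresholds are hypothetical, not MegaLLM's actual implementation).

def estimate_complexity(prompt: str) -> float:
    """Crude proxy: longer prompts and reasoning keywords score higher."""
    score = min(len(prompt.split()) / 200, 1.0)
    if any(kw in prompt.lower() for kw in ("explain", "prove", "step by step")):
        score += 0.5
    return min(score, 1.0)

def route(prompt: str) -> str:
    """Pick a model tier based on the complexity estimate."""
    score = estimate_complexity(prompt)
    if score < 0.2:
        return "small-fast-model"       # sub-second class, cheap
    if score < 0.5:
        return "mid-tier-model"
    return "large-reasoning-model"      # slowest, reserved for hard queries
```

&lt;p&gt;A production router would use a trained classifier or model-based scoring rather than keyword matching, but the shape is the same: score the query, then dispatch to the cheapest tier that can handle it.&lt;/p&gt;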

&lt;p&gt;Batching is another often overlooked optimization lever. Many inference systems process requests one at a time, leaving GPU capacity underutilized. Dynamic batching groups incoming requests together, filling those idle gaps without introducing noticeable delay. The key is tuning the batch window correctly: if it is too short, grouping opportunities are missed; if it is too long, additional latency is introduced. Well-optimized systems often achieve substantial throughput gains with proper batching strategies.&lt;/p&gt;
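&lt;p&gt;The batch-window trade-off can be sketched with a small simulation. This is illustrative only; a real scheduler works on a live queue rather than precomputed arrival times:&lt;/p&gt;

```python
def batch_by_window(arrival_ms, window_ms):
    """Group sorted request arrival times (in ms) into batches.

    A batch closes window_ms after its first request arrives; anything
    landing after the deadline starts a new batch.
    """
    batches, current, deadline = [], [], None
    for t in arrival_ms:
        if current and t > deadline:
            batches.append(current)      # window expired: flush the batch
            current = []
        if not current:
            deadline = t + window_ms     # first request opens a new window
        current.append(t)
    if current:
        batches.append(current)
    return batches
```

&lt;p&gt;With arrivals at 0, 3, 5, 20, and 22 ms, a 10 ms window yields two well-filled batches, while a 1 ms window degenerates into five single-request batches — the "too short" failure mode described above.&lt;/p&gt;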

&lt;p&gt;Token optimization is equally important. Every token in both the prompt and the response adds to processing time. MegaLLM incorporates techniques like automatic prompt compression to remove redundant context while preserving meaning, and response caching to serve repeated queries instantly from memory. Together, these approaches can significantly reduce token processing overhead without impacting the user experience.&lt;/p&gt;
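&lt;p&gt;A minimal sketch of the response-caching idea, assuming a simple normalization step so trivially different prompts hit the same entry (a real cache would also bound memory and expire stale entries):&lt;/p&gt;

```python
# Illustrative response cache keyed on a normalized prompt.
# The normalize() step and unbounded dict are simplifications.

def normalize(prompt: str) -> str:
    """Collapse whitespace and casing so near-identical prompts share a key."""
    return " ".join(prompt.lower().split())

_cache: dict[str, str] = {}

def cached_generate(prompt: str, generate) -> str:
    """Serve repeated queries from memory; only pay model latency on a miss."""
    key = normalize(prompt)
    if key not in _cache:
        _cache[key] = generate(prompt)
    return _cache[key]
```

&lt;p&gt;The design choice worth noting is that the cache key is the normalized prompt, not the raw string — otherwise extra whitespace or capitalization would defeat the cache entirely.&lt;/p&gt;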

&lt;p&gt;The financial impact of these optimizations is substantial. A system handling millions of requests per month at lower latency requires far fewer resources than one operating inefficiently. Faster response times also unlock new use cases: real-time conversational AI, interactive coding assistants, and live customer support systems become viable only when latency drops below human perception thresholds.&lt;/p&gt;

&lt;p&gt;For engineering teams, the key takeaways are clear: route requests based on complexity rather than defaulting to maximum capacity, implement dynamic batching with carefully tuned windows, compress prompts and cache responses to reduce token overhead, and measure latency at higher percentiles to capture real user experience. Ultimately, speed in AI systems is not just about hardware; it is about making smarter decisions at every layer of the architecture. Teams that optimize routing, batching, and token handling will consistently outperform those that rely solely on increasing compute power.&lt;/p&gt;
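&lt;p&gt;Measuring at higher percentiles matters because averages hide the tail. A nearest-rank percentile sketch makes the gap concrete:&lt;/p&gt;

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the value at rank ceil(p/100 * n) in sorted order."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]
```

&lt;p&gt;For a workload where 95% of requests take 100 ms and 5% take 900 ms, the mean is 140 ms and the p50 is 100 ms — both look healthy — but the p99 is 900 ms, which is what the slowest users actually experience.&lt;/p&gt;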

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhrvm81cb3h6mnv3mu6nc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhrvm81cb3h6mnv3mu6nc.png" alt=" "&gt;&lt;/a&gt;&lt;br&gt;
Key points: every 100 milliseconds of latency costs businesses real revenue, and in AI systems slow responses do more than frustrate users — they break workflows, reduce engagement, and limit what applications can achieve.&lt;/p&gt;

&lt;p&gt;Disclosure: This article references &lt;a href="https://megallm.io?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=launch" rel="noopener noreferrer"&gt;MegaLLM &lt;/a&gt;as one example platform.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>llm</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
