<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Hugo Vantighem</title>
    <description>The latest articles on Forem by Hugo Vantighem (@hugo_vantighem).</description>
    <link>https://forem.com/hugo_vantighem</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3947999%2Fe0a53736-f880-455e-9965-c3b4bf950342.jpg</url>
      <title>Forem: Hugo Vantighem</title>
      <link>https://forem.com/hugo_vantighem</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/hugo_vantighem"/>
    <language>en</language>
    <item>
      <title>Read Modify Write Is Where NoSQL Concurrency Bugs Begin.</title>
      <dc:creator>Hugo Vantighem</dc:creator>
      <pubDate>Sun, 24 May 2026 12:11:46 +0000</pubDate>
      <link>https://forem.com/hugo_vantighem/read-modify-write-is-where-nosql-concurrency-bugs-begin-1ala</link>
      <guid>https://forem.com/hugo_vantighem/read-modify-write-is-where-nosql-concurrency-bugs-begin-1ala</guid>
      <description>&lt;p&gt;&lt;em&gt;Part 1 of 3 — the single-document case.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;There's a class of bug that every backend engineer ships at least once, usually&lt;br&gt;
without noticing for months. It hides inside the most innocent-looking operation:&lt;br&gt;
read a document, decide something, write it back.&lt;/p&gt;

&lt;p&gt;Take a concrete invariant: &lt;em&gt;a team can hold at most 10 seats.&lt;/em&gt; To add a seat you&lt;br&gt;
read the team document, count the seats, check &lt;code&gt;count &amp;lt; 10&lt;/code&gt;, and write. A textbook&lt;br&gt;
&lt;strong&gt;Read → Modify → Write&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Now run it twice at the same instant. Request A reads &lt;code&gt;count = 9&lt;/code&gt;, decides "9 &amp;lt; 10,&lt;br&gt;
fine", and writes 10. Request B, a millisecond apart, also reads &lt;code&gt;count = 9&lt;/code&gt;,&lt;br&gt;
decides "fine", and writes 10. You now have a team that thinks it has 10 seats but&lt;br&gt;
actually granted 11. Neither request did anything wrong on its own. One write&lt;br&gt;
silently erased the premise of the other. This is a &lt;strong&gt;lost update&lt;/strong&gt;, and it's the&lt;br&gt;
core anomaly of the single-document case.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;T0   A reads count = 9
T1   B reads count = 9
T2   A writes count = 10   ("9 &amp;lt; 10, fine")
T3   B writes count = 10   ("9 &amp;lt; 10, fine")

Reality:        11 seats granted
Database state: 10
Invariant:      violated, silently
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
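
&lt;p&gt;To make the timeline concrete, here is a minimal, deterministic replay of that interleaving in Python. It is only a sketch: the &lt;code&gt;team&lt;/code&gt; dict stands in for the stored document, and the helper names &lt;code&gt;read&lt;/code&gt; and &lt;code&gt;decide_and_write&lt;/code&gt; are made up for illustration. Nothing here talks to a real database.&lt;/p&gt;

```python
# Deterministic replay of the T0..T3 interleaving on an in-memory "document".
# The dict stands in for the stored team document (an assumption for illustration).
MAX_SEATS = 10

team = {"count": 9}   # state stored in the database
granted = []          # seats actually handed out to callers


def read(doc):
    # Read step: each request takes its own snapshot of the counter.
    return doc["count"]


def decide_and_write(doc, snapshot, request):
    # Modify + Write steps, based on a snapshot that may already be stale.
    if snapshot >= MAX_SEATS:
        return False
    doc["count"] = snapshot + 1   # blind overwrite: this is the lost update
    granted.append(request)
    return True


snap_a = read(team)                    # T0: A reads count = 9
snap_b = read(team)                    # T1: B reads count = 9
decide_and_write(team, snap_a, "A")    # T2: A writes count = 10
decide_and_write(team, snap_b, "B")    # T3: B writes count = 10

seats_granted = 9 + len(granted)
print("seats granted:", seats_granted)   # 11
print("stored count:", team["count"])    # 10
```

&lt;p&gt;Both requests pass the check because both decide on the same stale snapshot. The second write does not fail; it simply overwrites the first.&lt;/p&gt;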



&lt;p&gt;Here's what teams actually reach for, and exactly what each option leaves on the&lt;br&gt;
table.&lt;/p&gt;

&lt;h2&gt;The fat aggregate (atomic operators)&lt;/h2&gt;

&lt;p&gt;If you can express the whole mutation as a single atomic operator — &lt;code&gt;$inc&lt;/code&gt;,&lt;br&gt;
&lt;code&gt;$push&lt;/code&gt; with &lt;code&gt;$slice&lt;/code&gt;, or a conditional &lt;code&gt;findAndModify&lt;/code&gt; — MongoDB applies it&lt;br&gt;
atomically on the document. There's no read-then-write window, so no lost update.&lt;br&gt;
For invariants that fit a single atomic expression, this is genuinely the right&lt;br&gt;
tool, and you should reach for it first.&lt;/p&gt;
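
&lt;p&gt;For the 10-seat rule, a guarded increment could look roughly like this. It is a sketch that assumes &lt;code&gt;teams&lt;/code&gt; is a pymongo &lt;code&gt;Collection&lt;/code&gt; (or anything exposing the same &lt;code&gt;update_one&lt;/code&gt; method); the function name &lt;code&gt;try_add_seat&lt;/code&gt; and the &lt;code&gt;count&lt;/code&gt; field are illustrative.&lt;/p&gt;

```python
# Sketch of a guarded atomic increment. `teams` is assumed to be a
# pymongo Collection; the "count" field and the limit of 10 are illustrative.
MAX_SEATS = 10


def try_add_seat(teams, team_id, max_seats=MAX_SEATS):
    """Return True if a seat was added, False if the team is full or missing."""
    result = teams.update_one(
        # The invariant lives in the filter: match only while below the cap.
        {"_id": team_id, "count": {"$lt": max_seats}},
        # The increment is applied by the server in the same atomic step.
        {"$inc": {"count": 1}},
    )
    return result.modified_count == 1
```

&lt;p&gt;The check and the write travel in one operation, so there is no snapshot in application code that can go stale. A &lt;code&gt;False&lt;/code&gt; return means the filter matched nothing: the team is full, or it does not exist.&lt;/p&gt;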

&lt;p&gt;The catch: not every invariant fits. The moment your check needs branching ("if&lt;br&gt;
the plan is free &lt;em&gt;and&lt;/em&gt; count ≥ 5, reject") you're back to reading, deciding in&lt;br&gt;
application code, and writing — and the window reopens. Embedding related data is&lt;br&gt;
a perfectly good modeling choice; the trap is different. It's the temptation to keep&lt;br&gt;
stretching one document's &lt;em&gt;consistency boundary&lt;/em&gt; — folding in unrelated rules just&lt;br&gt;
to keep the write atomic — which is exactly how you end up with 16 MB documents and a saturated network.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Anomaly status: ✅ lost update handled — for the subset of rules expressible as&lt;br&gt;
one atomic op.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;The pessimistic lock (Redis)&lt;/h2&gt;

&lt;p&gt;Grab a distributed lock before the read, release after the write. It works — but&lt;br&gt;
for a single document it's a sledgehammer. You've added a network round-trip, a&lt;br&gt;
brand-new failure mode (the lock service), and a whole class of distributed&lt;br&gt;
coordination failures (lease expiry, clock drift, fencing, split-brain), all to&lt;br&gt;
guard one document the database could have guarded itself.&lt;/p&gt;
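
&lt;p&gt;For reference, the usual single-instance shape of such a lock is sketched below. It assumes &lt;code&gt;r&lt;/code&gt; is a redis-py client; the key naming, the 5-second TTL and the helper name &lt;code&gt;with_team_lock&lt;/code&gt; are illustrative choices, not something taken from a particular codebase.&lt;/p&gt;

```python
import uuid

# Compare-and-delete in Lua so a client only releases a lock it still owns.
RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
"""


def with_team_lock(r, team_id, critical_section, ttl_ms=5000):
    """Run critical_section() while holding a per-team lock.

    `r` is assumed to be a redis-py client. Returns (acquired, result).
    """
    key = f"lock:team:{team_id}"
    token = uuid.uuid4().hex   # identifies this holder
    # SET key token NX PX ttl: succeeds only if nobody holds the lock.
    if not r.set(key, token, nx=True, px=ttl_ms):
        return False, None
    try:
        return True, critical_section()
    finally:
        # If the TTL already expired, this deletes nothing, and another
        # client may have entered the critical section in the meantime.
        r.eval(RELEASE_SCRIPT, 1, key, token)
```

&lt;p&gt;Note what the sketch does not solve: if the critical section outlives the TTL, two writers can overlap, and nothing on the database side rejects the late one.&lt;/p&gt;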

&lt;p&gt;&lt;em&gt;Anomaly status: ✅ everything — at the cost of latency and distributed coordination&lt;br&gt;
failures. (Part 3 is dedicated to why that bill is steep.)&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;Optimistic locking (a version field)&lt;/h2&gt;

&lt;p&gt;Carry a &lt;code&gt;version&lt;/code&gt; on the document. Read it, run your logic, then write with a&lt;br&gt;
guard: &lt;code&gt;findOneAndUpdate({_id, version: v}, {$set: {...}, $inc: {version: 1}})&lt;/code&gt;. If&lt;br&gt;
anyone wrote in between, &lt;code&gt;version&lt;/code&gt; moved, your guard matches nothing, and you&lt;br&gt;
retry. This is the clean default for single-document RMW that doesn't fit an&lt;br&gt;
atomic operator — it kills lost update with no external system.&lt;/p&gt;
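
&lt;p&gt;Here is a sketch of that loop for the seat example, with the free-plan rule kept in application code. It assumes &lt;code&gt;teams&lt;/code&gt; is a pymongo &lt;code&gt;Collection&lt;/code&gt;; the &lt;code&gt;plan&lt;/code&gt;, &lt;code&gt;count&lt;/code&gt; and &lt;code&gt;version&lt;/code&gt; fields, the retry budget and the function name &lt;code&gt;add_seat&lt;/code&gt; are all illustrative.&lt;/p&gt;

```python
# Sketch of a version-guarded read-modify-write with retries.
# `teams` is assumed to be a pymongo Collection; field names, the
# free-plan cap of 5 and the retry budget are illustrative.
MAX_SEATS = 10
FREE_PLAN_SEATS = 5


def add_seat(teams, team_id, max_retries=5):
    """Return True if a seat was added, False if a business rule rejected it."""
    for _ in range(max_retries):
        doc = teams.find_one({"_id": team_id})       # Read
        if doc is None:
            raise LookupError(f"team {team_id!r} not found")
        # Modify: branching logic kept in application code.
        if doc["plan"] == "free" and doc["count"] >= FREE_PLAN_SEATS:
            return False
        if doc["count"] >= MAX_SEATS:
            return False
        result = teams.update_one(                   # Write, guarded by version
            {"_id": team_id, "version": doc["version"]},
            {"$set": {"count": doc["count"] + 1}, "$inc": {"version": 1}},
        )
        if result.matched_count == 1:
            return True
        # Someone else bumped the version first: re-read and decide again.
    raise RuntimeError("too much contention, giving up")
```

&lt;p&gt;The important detail is that a failed guard restarts from the read, so the business rules are re-evaluated against fresh state rather than replayed against the stale snapshot.&lt;/p&gt;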

&lt;p&gt;The catch: under contention it's a retry machine. The more concurrent writers, the&lt;br&gt;
more losers re-run their logic, burning CPU and tail latency.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Anomaly status: ✅ lost update — at the cost of app-side retries.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;Pray&lt;/h2&gt;

&lt;p&gt;Bet that two requests never touch the same document in the same millisecond. They&lt;br&gt;
will. &lt;em&gt;Anomaly status: ❌ lost update, in production, at 3 a.m.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;The point&lt;/h2&gt;

&lt;p&gt;For a single document, you're actually well served: atomic operators or optimistic&lt;br&gt;
locking close the gap cleanly, without external machinery. The single-document&lt;br&gt;
case is the &lt;em&gt;easy&lt;/em&gt; one.&lt;/p&gt;

&lt;p&gt;The real pain begins the instant your invariant spans &lt;strong&gt;two&lt;/strong&gt; documents — a&lt;br&gt;
workspace budget gating a user debit, for example. There, optimistic locking stops&lt;br&gt;
being &lt;em&gt;sufficient&lt;/em&gt;: it still guards each document on its own, but it can no longer&lt;br&gt;
guarantee an invariant that lives &lt;em&gt;between&lt;/em&gt; them. And a nastier anomaly walks in —&lt;br&gt;
the database stays perfectly "consistent" while your business invariant quietly&lt;br&gt;
dies.&lt;/p&gt;

&lt;p&gt;Welcome to &lt;strong&gt;write skew&lt;/strong&gt;. That's Part 2.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>distributedsystems</category>
      <category>mongodb</category>
      <category>nosql</category>
    </item>
    <item>
      <title>Postgres-grade Serializable at 20k+ ops/s — on a laptop. Don’t try this at home.</title>
      <dc:creator>Hugo Vantighem</dc:creator>
      <pubDate>Sat, 23 May 2026 17:14:52 +0000</pubDate>
      <link>https://forem.com/hugo_vantighem/postgres-grade-serializable-at-20k-opss-on-a-laptop-dont-try-this-at-home-f27</link>
      <guid>https://forem.com/hugo_vantighem/postgres-grade-serializable-at-20k-opss-on-a-laptop-dont-try-this-at-home-f27</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;They didn't know it was impossible, so they did it.&lt;/em&gt; (attributed to Mark Twain)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In the software industry, we've been raised with a dogma: you must choose between &lt;strong&gt;Massive Performance&lt;/strong&gt; (NoSQL, eventual consistency) and &lt;strong&gt;Domain Rigor&lt;/strong&gt; (SQL, strong consistency, serializable).&lt;/p&gt;

&lt;p&gt;We are told that locks, latencies, and ACID properties are the natural enemies of speed. That if you want to scale, you have to let go of your business invariants.&lt;/p&gt;

&lt;p&gt;I decided to test another hypothesis. And I broke the myth.&lt;/p&gt;

&lt;h2&gt;The Result: 20,000+ Validated Transactions per Second&lt;/h2&gt;

&lt;p&gt;This isn't a "fire and forget" ingestion log.&lt;/p&gt;

&lt;p&gt;This isn't a volatile cache experiment.&lt;/p&gt;

&lt;p&gt;What you see here is &lt;strong&gt;Business Transaction Durability&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Invariants validated&lt;/strong&gt; — every business rule is checked before commit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State persisted&lt;/strong&gt; — every change is durably written to disk.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strong Consistency&lt;/strong&gt; — Serializable-level isolation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At 20,000+ ops/s, we are not just talking about speed. We are talking about the ability to maintain &lt;strong&gt;absolute domain integrity under massive load&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And the kicker: this is running on a &lt;strong&gt;MacBook Air M3&lt;/strong&gt; (8 cores, 16 GB of RAM), the same machine I write the code on. No 64-core server. No NVMe array. No datacenter rack. One fanless laptop doing the work of a small cluster.&lt;/p&gt;

&lt;h2&gt;Why General-Purpose Databases Hit a Ceiling&lt;/h2&gt;

&lt;p&gt;Most databases are built for general cases. They treat every row the same way because they don't know your business.&lt;/p&gt;

&lt;p&gt;This &lt;strong&gt;"Domain Ignorance"&lt;/strong&gt; leads to generic row locks, MVCC bookkeeping, cross-table coordination, and massive overhead — costs you pay on &lt;em&gt;every single transaction&lt;/em&gt;, whether your domain needs them or not.&lt;/p&gt;

&lt;h2&gt;Not Magic — Discipline&lt;/h2&gt;

&lt;p&gt;For the skeptics: this isn't sorcery. It's discipline applied to the right layer — designing the system so the hardware does exactly what it's good at, and nothing else.&lt;/p&gt;

&lt;p&gt;I'm not reinventing the storage wheel. The foundation is &lt;strong&gt;Pebble&lt;/strong&gt;, the same proven LSM-tree engine that powers CockroachDB. But the engine is just the floor. The real lever is the &lt;strong&gt;orchestration of the domain logic on top of it&lt;/strong&gt; — and that's what Part 2 puts a name on.&lt;/p&gt;

&lt;h2&gt;A Note on the Benchmark Scope&lt;/h2&gt;

&lt;p&gt;I know what you're thinking. &lt;em&gt;"20k+ ops/s? That must be an internal memory trick."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It isn't. To ensure these numbers reflect real-world usage, the benchmark covers the &lt;strong&gt;entire lifecycle&lt;/strong&gt; of a business transaction:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Client-side serialization&lt;/strong&gt; — the payload starts from the app.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local communication&lt;/strong&gt; — end-to-end roundtrip.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Server-side deserialization &amp;amp; parsing.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Business invariant validation.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Disk persistence&lt;/strong&gt; with full durability guarantees — &lt;code&gt;fsync&lt;/code&gt; on every commit.&lt;/li&gt;
&lt;/ol&gt;
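
&lt;p&gt;As a generic illustration of that last step (not the engine's actual code, which this post does not show), a commit that is durable before it is acknowledged can be sketched as below. The length-prefixed record format and the name &lt;code&gt;commit_batch&lt;/code&gt; are assumptions made for the example, as is the idea that one batch maps to one commit.&lt;/p&gt;

```python
import os


def commit_batch(fd, records):
    """Append one batch of byte records to fd and force it to stable storage.

    The batch counts as committed only after os.fsync returns.
    The 4-byte length prefix is an illustrative framing choice.
    """
    payload = b"".join(len(r).to_bytes(4, "big") + r for r in records)
    view = memoryview(payload)
    while view:
        written = os.write(fd, view)   # os.write may write fewer bytes than asked
        view = view[written:]
    os.fsync(fd)                       # one durability barrier per batch
    return len(payload)
```

&lt;p&gt;The design point is that the cost of the barrier is paid once per batch rather than once per record, which is why batch size matters so much for throughput. On macOS, &lt;code&gt;fcntl&lt;/code&gt; with &lt;code&gt;F_FULLFSYNC&lt;/code&gt; is the stricter barrier if the drive's own write cache must be flushed as well.&lt;/p&gt;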

&lt;p&gt;The workload: &lt;code&gt;batch=1000&lt;/code&gt;, &lt;code&gt;payload=1KB&lt;/code&gt;, single-node, single laptop. Here's the run, with the system-level disk stats captured live during the bench:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;[23755.87 items/s] | items=1424000 | batch=1000 | payload=1KB | durability=FSYNC-ON
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv5i5y85as7v7b7dk0kc9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv5i5y85as7v7b7dk0kc9.png" alt=" " width="800" height="516"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Live capture during the bench (batch=1000, 1KB, fsync ON). Disk on fire, CPU bored.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Two things jump out of that stats panel — and together they're the whole point:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The disk is screaming.&lt;/strong&gt; Sustained 100–200 MB/s with the ⚡ markers firing almost every second. This is real &lt;code&gt;fsync&lt;/code&gt;'d traffic hitting the SSD, not a memory cache pretending to be durable. If you pulled the power cord mid-run, every committed transaction would still be there on reboot.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The CPU is bored&lt;/strong&gt; (~18% on an 8-core M3). The compute is mostly idle while the disk is saturated; that asymmetry is the whole story.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And this isn't the ceiling. With bigger batches the same laptop pushes further; even at &lt;code&gt;batch=1&lt;/code&gt;, it doesn't fall off a cliff. The full envelope is Part 2.&lt;/p&gt;

&lt;h2&gt;What's Next?&lt;/h2&gt;

&lt;p&gt;This is just Part 1. In a few days, &lt;strong&gt;Part 2&lt;/strong&gt; finishes the picture and lands the real punchline: business rules aren't a tax on performance — they're the contract that lets the machine fly. And the whole thing runs on hardware your team could expense, not a cloud bill that needs board approval.&lt;/p&gt;

&lt;p&gt;Stay tuned. The era of the "Impossible Trade-off" is over.&lt;/p&gt;

</description>
      <category>database</category>
      <category>performance</category>
      <category>postgres</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
