<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Do Pham Dinh</title>
    <description>The latest articles on Forem by Do Pham Dinh (@xidoke).</description>
    <link>https://forem.com/xidoke</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1027650%2Fbbff8d5a-1ed9-4eb3-b0c8-dc74abd8304b.jpeg</url>
      <title>Forem: Do Pham Dinh</title>
      <link>https://forem.com/xidoke</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/xidoke"/>
    <language>en</language>
    <item>
      <title>The race condition a stress test found in my double-entry ledger — and how I fixed it</title>
      <dc:creator>Do Pham Dinh</dc:creator>
      <pubDate>Sun, 24 May 2026 13:56:53 +0000</pubDate>
      <link>https://forem.com/xidoke/the-race-condition-a-stress-test-found-in-my-double-entry-ledger-and-how-i-fixed-it-b5o</link>
      <guid>https://forem.com/xidoke/the-race-condition-a-stress-test-found-in-my-double-entry-ledger-and-how-i-fixed-it-b5o</guid>
      <description>&lt;p&gt;I'm building &lt;a href="https://github.com/xidoke/ledger-service" rel="noopener noreferrer"&gt;ledger-service&lt;/a&gt;, a double-entry e-wallet ledger in Java 21 / Spring Boot 3.5 / PostgreSQL. It's &lt;a href="https://ledger-service-bjzr.onrender.com" rel="noopener noreferrer"&gt;live on Render&lt;/a&gt;. Early on I wrote a stress test that fires 50 transfers at the &lt;strong&gt;same account&lt;/strong&gt; at once and asserts the books are never corrupted. It went red — and the way it went red is the most useful thing I've learned building this.&lt;/p&gt;

&lt;p&gt;This post walks the whole chain: why a money ledger keeps a balance &lt;em&gt;cache&lt;/em&gt; at all, the read-modify-write race that cache invites, how I detected it, the fix (optimistic locking + bounded retry), the benchmark that justified choosing optimistic over pessimistic locking, and how idempotency has to compose with retry so a network hiccup never double-spends.&lt;/p&gt;

&lt;h2&gt;
  
  
  The setup: a ledger, and why it has a cache
&lt;/h2&gt;

&lt;p&gt;The source of truth is a &lt;strong&gt;double-entry, append-only&lt;/strong&gt; table (ADR-0005). Every money operation writes at least one balanced &lt;code&gt;DEBIT&lt;/code&gt;/&lt;code&gt;CREDIT&lt;/code&gt; pair where &lt;code&gt;Σ DEBIT == Σ CREDIT&lt;/code&gt;, and &lt;code&gt;ledger_entries&lt;/code&gt; is insert-only — no &lt;code&gt;UPDATE&lt;/code&gt;, no &lt;code&gt;DELETE&lt;/code&gt;. A mistake is fixed by posting a &lt;em&gt;correcting&lt;/em&gt; entry, never by editing history. This is the model Stripe, Modern Treasury, and Formance all use, and it's what gives you an audit trail you can trust.&lt;/p&gt;

&lt;p&gt;But "what is account X's balance?" should not be a &lt;code&gt;SUM&lt;/code&gt; over every entry that account has ever had. So I keep a &lt;strong&gt;cache&lt;/strong&gt;: &lt;code&gt;accounts.balance&lt;/code&gt; is a materialized &lt;code&gt;Σ&lt;/code&gt; of that account's entries, updated &lt;em&gt;in the same transaction&lt;/em&gt; as the entries themselves (ADR-0006). The entries are the truth; the balance is a derived read cache that stays O(1).&lt;/p&gt;

&lt;p&gt;That cache is exactly where concurrency bites.&lt;/p&gt;

&lt;h2&gt;
  
  
  The race
&lt;/h2&gt;

&lt;p&gt;Two requests debit the same account at the same time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;R1: read balance $500 (enough)      R2: read balance $500 (enough)
R1: commit −$300 → $200             R2: commit −$400 → $100    ← lost update: cache says $100, entries sum to −$200 (overdraft)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both read &lt;code&gt;$500&lt;/code&gt;, both decide they have enough, both write back their own idea of the new balance. One write silently clobbers the other: a &lt;strong&gt;lost update&lt;/strong&gt;, and a balance that no longer matches the ledger entries underneath it.&lt;/p&gt;

&lt;p&gt;The trap is assuming the database stops this for you. It does not. PostgreSQL's default isolation level, &lt;code&gt;READ COMMITTED&lt;/code&gt;, only guarantees you don't read &lt;em&gt;uncommitted&lt;/em&gt; data — it does nothing about two transactions that each read-then-write the same row concurrently. A read-modify-write race sails right through it.&lt;/p&gt;
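&lt;p&gt;To make the interleaving concrete, here is a minimal in-memory sketch of it. It is not code from the repo: &lt;code&gt;LostUpdateDemo&lt;/code&gt; and its fields are hypothetical stand-ins for the balance cache and the sum of the entries. It replays the two requests step by step, with no threads, so the outcome is deterministic:&lt;/p&gt;

```java
// Hypothetical in-memory stand-in for accounts.balance plus ledger_entries.
final class LostUpdateDemo {
    long cachedBalance = 500;               // the balance cache
    long entriesSum = 500;                  // what a SUM over the entries would return

    // Replays the unsafe read-modify-write interleaving deterministically.
    void replayRace() {
        long seenByR1 = cachedBalance;      // R1 reads 500
        long seenByR2 = cachedBalance;      // R2 reads 500 before R1 writes

        if (seenByR1 >= 300) {              // R1: enough funds, so post and write back
            entriesSum -= 300;
            cachedBalance = seenByR1 - 300; // cache becomes 200
        }
        if (seenByR2 >= 400) {              // R2: checks against its stale read
            entriesSum -= 400;
            cachedBalance = seenByR2 - 400; // cache becomes 100, clobbering R1's write
        }
    }

    public static void main(String[] args) {
        LostUpdateDemo demo = new LostUpdateDemo();
        demo.replayRace();
        // prints: cache=100 entries=-200
        System.out.println("cache=" + demo.cachedBalance + " entries=" + demo.entriesSum);
    }
}
```

&lt;p&gt;The cache ends at 100 while the entries sum to -200: both failures at once, a lost update in the cache and an overdraft in the ledger.&lt;/p&gt;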

&lt;h2&gt;
  
  
  Detecting it: the stress test
&lt;/h2&gt;

&lt;p&gt;Here's the test that surfaced the bug. Fund one account, then fire &lt;code&gt;N = 50&lt;/code&gt; transfers out of it concurrently and check the books afterward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;AtomicInteger&lt;/span&gt; &lt;span class="n"&gt;successes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AtomicInteger&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="nc"&gt;CountDownLatch&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CountDownLatch&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="nc"&gt;ExecutorService&lt;/span&gt; &lt;span class="n"&gt;pool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Executors&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;newFixedThreadPool&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;N&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;submit&lt;/span&gt;&lt;span class="o"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;await&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;                       &lt;span class="c1"&gt;// line them all up...&lt;/span&gt;
        &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/transfers"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;from&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;AMOUNT&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// ...then fire at once&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;201&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="n"&gt;successes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;incrementAndGet&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;});&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;countDown&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="c1"&gt;// after all complete:&lt;/span&gt;
&lt;span class="n"&gt;assertThat&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;balanceCache&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;from&lt;/span&gt;&lt;span class="o"&gt;)).&lt;/span&gt;&lt;span class="na"&gt;isEqualTo&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ledgerBalance&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;from&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;   &lt;span class="c1"&gt;// cache == Σ entries&lt;/span&gt;
&lt;span class="n"&gt;assertThat&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;balanceCache&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;from&lt;/span&gt;&lt;span class="o"&gt;)).&lt;/span&gt;&lt;span class="na"&gt;isGreaterThanOrEqualTo&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;        &lt;span class="c1"&gt;// no overdraft&lt;/span&gt;
&lt;span class="n"&gt;assertThat&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;balanceCache&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="o"&gt;)).&lt;/span&gt;&lt;span class="na"&gt;isEqualTo&lt;/span&gt;&lt;span class="o"&gt;((&lt;/span&gt;&lt;span class="kt"&gt;long&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="n"&gt;successes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="no"&gt;AMOUNT&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;      &lt;span class="c1"&gt;// exact accounting&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The assertions are deliberately &lt;strong&gt;timing-independent&lt;/strong&gt; — they hold for &lt;em&gt;any&lt;/em&gt; split of successes and failures, because they compare the cache against the ledger truth rather than against a fixed expected count. That's what makes the test a stable regression guard instead of a flaky one.&lt;/p&gt;

&lt;p&gt;I confirmed the bug by experiment: with the &lt;code&gt;@Version&lt;/code&gt; column removed, &lt;strong&gt;~85% of the cache updates were lost&lt;/strong&gt; and these assertions went red — the cached balance drifted far from the sum of the entries. The cache and the truth disagreed, which in a money system is the whole ballgame.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fix, part 1: optimistic locking
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;accounts&lt;/code&gt; already had a &lt;code&gt;version BIGINT&lt;/code&gt; column, because the Account is the aggregate / locking boundary (ADR-0010). Mapping it as a JPA &lt;code&gt;@Version&lt;/code&gt; turns every balance write into a compare-and-set:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;version&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
 &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two concurrent writers both load version &lt;code&gt;7&lt;/code&gt;. The first to commit sets it to &lt;code&gt;8&lt;/code&gt;. The second's &lt;code&gt;UPDATE ... WHERE version = 7&lt;/code&gt; now matches &lt;strong&gt;zero rows&lt;/strong&gt;. Hibernate notices the zero row count when it flushes the update (by default at commit), and Spring surfaces the failure as an &lt;code&gt;OptimisticLockingFailureException&lt;/code&gt;. The lost update is now impossible: instead of silently clobbering, the loser is &lt;em&gt;told&lt;/em&gt; it lost.&lt;/p&gt;

&lt;p&gt;The key property: this is &lt;strong&gt;detection, not blocking&lt;/strong&gt;. No reader ever waits for a lock. For a ledger — where balance/history reads vastly outnumber writes — that matters a lot.&lt;/p&gt;
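&lt;p&gt;As a rough in-memory analogy, the compare-and-set can be sketched like this. &lt;code&gt;VersionedAccount&lt;/code&gt; is a hypothetical class, not the JPA entity from the repo, and its &lt;code&gt;synchronized&lt;/code&gt; method only stands in for the atomicity the database gives a single &lt;code&gt;UPDATE&lt;/code&gt; statement:&lt;/p&gt;

```java
// Hypothetical in-memory model of one accounts row guarded by a version column.
final class VersionedAccount {
    private long balance;
    private long version;

    VersionedAccount(long balance, long version) {
        this.balance = balance;
        this.version = version;
    }

    synchronized long balance() { return balance; }
    synchronized long version() { return version; }

    // Mirrors UPDATE ... SET balance = ?, version = version + 1 WHERE version = ?
    // Returns true when the row matched, false when the expected version was stale.
    synchronized boolean compareAndSet(long expectedVersion, long newBalance) {
        if (version != expectedVersion) {
            return false;                   // zero rows updated: the caller lost the race
        }
        balance = newBalance;
        version = version + 1;
        return true;
    }

    public static void main(String[] args) {
        VersionedAccount row = new VersionedAccount(500, 7);
        long seenVersion = row.version();   // both writers load version 7
        long seenBalance = row.balance();

        boolean first = row.compareAndSet(seenVersion, seenBalance - 300);
        boolean second = row.compareAndSet(seenVersion, seenBalance - 400);
        // prints: true false balance=200 version=8
        System.out.println(first + " " + second + " balance=" + row.balance()
                + " version=" + row.version());
    }
}
```

&lt;p&gt;The second writer gets an explicit &lt;code&gt;false&lt;/code&gt; instead of overwriting the first writer's result, which is the behaviour the version check is there to provide.&lt;/p&gt;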

&lt;h2&gt;
  
  
  The fix, part 2: bounded retry
&lt;/h2&gt;

&lt;p&gt;Detection alone isn't enough. With &lt;code&gt;@Version&lt;/code&gt; in place but no retry, the stress test stopped corrupting data but a big chunk of transfers now &lt;em&gt;failed&lt;/em&gt; with a conflict — correct, but a lousy experience. So the loser needs to retry.&lt;/p&gt;

&lt;p&gt;The retry helper sits &lt;strong&gt;outside&lt;/strong&gt; &lt;code&gt;@Transactional&lt;/code&gt;, and that placement is the whole point:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;T&lt;/span&gt; &lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Supplier&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;operation&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt;&lt;span class="o"&gt;++)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;operation&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;           &lt;span class="c1"&gt;// a FRESH transaction each attempt&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;OptimisticLockingFailureException&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;maxAttempts&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ConcurrencyConflictException&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
            &lt;span class="n"&gt;sleep&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;backoffWithFullJitter&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attempt&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;   &lt;span class="c1"&gt;// 25–200 ms, capped&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each attempt is a brand-new transaction that &lt;strong&gt;reloads the row at its current version&lt;/strong&gt; — retrying inside the failed transaction would just re-fail against the stale version. Defaults: 5 attempts, exponential backoff with &lt;strong&gt;full jitter&lt;/strong&gt; (so a thundering herd doesn't resynchronize into another collision), and on exhaustion a clean &lt;code&gt;409 Conflict&lt;/code&gt; — never a &lt;code&gt;500&lt;/code&gt;.&lt;/p&gt;
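&lt;p&gt;The body of &lt;code&gt;backoffWithFullJitter&lt;/code&gt; is not shown above, so the following is only a sketch of one common way to write it. It assumes a 25 ms base that doubles per attempt up to a 200 ms cap (read off the "25–200 ms, capped" comment), and it uses the usual definition of full jitter: a uniformly random delay between zero and that ceiling. The repo's actual constants and formula may differ.&lt;/p&gt;

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper; the 25 ms base and 200 ms cap are assumptions taken
// from the "25-200 ms, capped" comment, not the repo's actual constants.
final class Backoff {
    static final long BASE_MILLIS = 25;
    static final long CAP_MILLIS = 200;

    // Upper bound for a given attempt (1-based): base doubled per attempt, capped.
    static long ceilingMillis(int attempt) {
        long ceiling = BASE_MILLIS;
        for (int i = 1; attempt > i; i++) {
            ceiling = Math.min(CAP_MILLIS, ceiling * 2);
        }
        return ceiling;
    }

    // Full jitter: a uniformly random delay between 0 and the ceiling, inclusive.
    static long backoffWithFullJitter(int attempt) {
        return ThreadLocalRandom.current().nextLong(ceilingMillis(attempt) + 1);
    }

    public static void main(String[] args) {
        for (int attempt = 1; 5 >= attempt; attempt++) {
            System.out.println("attempt " + attempt
                    + ": ceiling=" + ceilingMillis(attempt)
                    + " ms, delay=" + backoffWithFullJitter(attempt) + " ms");
        }
    }
}
```

&lt;p&gt;Under these assumptions the ceilings for attempts 1 to 5 are 25, 50, 100, 200, and 200 ms. Note that pure full jitter can return 0 ms; a helper that guarantees at least 25 ms would need an explicit floor.&lt;/p&gt;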

&lt;p&gt;There's a small piece of reasoning that makes this provably terminating under moderate load: &lt;strong&gt;the k-th committer can only lose to a distinct &lt;em&gt;earlier&lt;/em&gt; committer&lt;/strong&gt;, so it needs at most &lt;code&gt;k&lt;/code&gt; attempts. With 4 concurrent writers and a 5-attempt budget, all 4 succeed deterministically — no flaky test. A genuine hot account (more concurrent writers than the attempt budget) surfaces as &lt;code&gt;409&lt;/code&gt;, which is honest backpressure rather than a hidden corruption.&lt;/p&gt;
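&lt;p&gt;That bound can be checked with a small simulation. The sketch below is hypothetical (it models the worst case in lockstep rounds rather than running real transactions): in every round all pending writers load the same version, and only the first compare-and-set matches.&lt;/p&gt;

```java
import java.util.Arrays;

// Hypothetical worst-case model of optimistic retries on one row.
final class RetryBoundDemo {
    // Every pending writer loads the same version each round, then all try to
    // commit; only the first compare-and-set matches. Returns the number of
    // attempts each writer needed, in commit order.
    static int[] attemptsPerWriter(int writers) {
        int[] attempts = new int[writers];
        long version = 0;
        int committed = 0;
        while (writers > committed) {
            long loaded = version;              // all pending writers read this version
            for (int w = committed; writers > w; w++) {
                attempts[w]++;
                if (version == loaded) {
                    version++;                  // the first writer in line commits
                }                               // the rest hit a stale version and retry
            }
            committed++;
        }
        return attempts;
    }

    public static void main(String[] args) {
        // prints: [1, 2, 3, 4]
        System.out.println(Arrays.toString(attemptsPerWriter(4)));
    }
}
```

&lt;p&gt;With 4 writers the slowest one needs 4 attempts, which fits inside the 5-attempt budget. With 6 writers the last two would need 5 and 6 attempts, so the sixth is the first to exceed the budget and surface as a &lt;code&gt;409&lt;/code&gt; in this worst case.&lt;/p&gt;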

&lt;h2&gt;
  
  
  Optimistic vs pessimistic: the measured choice
&lt;/h2&gt;

&lt;p&gt;The obvious alternative is pessimistic locking — &lt;code&gt;SELECT ... FOR UPDATE&lt;/code&gt; to lock the row before touching it, so writer #2 simply waits. No retries, easy to reason about. So why optimistic?&lt;/p&gt;

&lt;p&gt;I didn't want to argue this from vibes, so I wrote a benchmark (&lt;code&gt;TransferConcurrencyBenchmark&lt;/code&gt;) that runs the &lt;strong&gt;identical transfer logic&lt;/strong&gt; under both strategies, 50 concurrent writers, against one real PostgreSQL:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Optimistic + retry&lt;/th&gt;
&lt;th&gt;Pessimistic &lt;code&gt;FOR UPDATE&lt;/code&gt;
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Low-contention&lt;/strong&gt; (50 disjoint account pairs)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;34 ms&lt;/strong&gt; · 50/50 ok · &lt;strong&gt;0 retry waste&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;31 ms · 50/50 ok&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;High-contention&lt;/strong&gt; (50 transfers → &lt;strong&gt;1 hot row&lt;/strong&gt;)&lt;/td&gt;
&lt;td&gt;731 ms · 50/50 ok · &lt;strong&gt;185 retry waste&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;358 ms&lt;/strong&gt; · 50/50 ok&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Reading the numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Low contention is the common case, and it's a tie&lt;/strong&gt; (34 vs 31 ms) — but optimistic wastes &lt;em&gt;zero&lt;/em&gt; retries and, crucially, &lt;strong&gt;never blocks reads&lt;/strong&gt;. That's the deciding factor for a read-heavy ledger.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;On a single hot row, pessimistic is ~2× faster&lt;/strong&gt; (358 vs 731 ms) and wastes nothing, while optimistic burns 185 extra attempts (≈4.7× the work) on collisions and backoff. But pessimistic "wins" here precisely by &lt;em&gt;serializing writers and blocking any locking read&lt;/em&gt; behind the row lock, which is the thing I'm trying to avoid, and it doesn't actually &lt;em&gt;solve&lt;/em&gt; a hot account, it just queues it.&lt;/li&gt;
&lt;/ul&gt;
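&lt;p&gt;The derived figures in that reading follow directly from the table. A small sketch of the arithmetic, with hypothetical class and method names:&lt;/p&gt;

```java
import java.util.Locale;

// Hypothetical helper that recomputes the derived figures from the table.
final class BenchmarkMath {
    // Every successful transfer costs one attempt; each wasted retry adds one more.
    static int totalAttempts(int transfers, int wastedRetries) {
        return transfers + wastedRetries;
    }

    // How much work was done relative to one attempt per transfer.
    static double workMultiplier(int transfers, int wastedRetries) {
        return (double) totalAttempts(transfers, wastedRetries) / transfers;
    }

    public static void main(String[] args) {
        System.out.println(totalAttempts(50, 185));                            // 235
        System.out.println(workMultiplier(50, 185));                           // 4.7
        System.out.println(String.format(Locale.ROOT, "%.2f", 731.0 / 358.0)); // 2.04
    }
}
```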

&lt;p&gt;So the verdict is optimistic + retry, and the value of the benchmark isn't "optimistic is faster" (it isn't, under contention) — it's that those &lt;strong&gt;185 wasted retries quantify the threshold&lt;/strong&gt; at which a truly hot account (think: every top-up debiting a shared &lt;code&gt;SYSTEM_FUNDING&lt;/code&gt; row) needs a real escalation: async queueing or sub-account sharding, not flipping the whole system to pessimistic locks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Composing with idempotency
&lt;/h2&gt;

&lt;p&gt;There's one more way to double-spend that retry actually &lt;em&gt;makes worse&lt;/em&gt; if you're not careful. A client whose connection drops after the server committed will retry the whole HTTP request — and now you risk posting the transfer twice. Retry-on-conflict and retry-on-network-blip are different problems, and the fix for one must not break the other.&lt;/p&gt;

&lt;p&gt;So both money endpoints require an &lt;code&gt;Idempotency-Key&lt;/code&gt; header (ADR-0012, the Stripe pattern). The mechanism that makes it concurrency-safe is &lt;strong&gt;claim-first&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;idempotency_keys&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'PENDING'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;CONFLICT&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="k"&gt;NOTHING&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;     &lt;span class="c1"&gt;-- committed immediately, before business logic&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That atomic insert is the serialization point. Whoever wins the claim runs the operation; a concurrent request with the same key sees the committed &lt;code&gt;PENDING&lt;/code&gt; row and gets &lt;code&gt;409&lt;/code&gt; (in-flight) instead of running a second time. A completed key replays the stored response; a key reused with a &lt;em&gt;different&lt;/em&gt; body gets &lt;code&gt;422&lt;/code&gt; (a client contract violation, deliberately distinct from the &lt;code&gt;409&lt;/code&gt; conflict code).&lt;/p&gt;

&lt;p&gt;The reason this composes cleanly with the retry from earlier: the optimistic-lock retry sits &lt;strong&gt;after&lt;/strong&gt; the key is claimed. All those internal attempts happen under one already-claimed idempotency key, so they're completely invisible to the client and can never produce a second posting. Conflict-retry and request-idempotency stack instead of fighting.&lt;/p&gt;
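&lt;p&gt;The lifecycle of a single key can be sketched in memory as follows. Everything here is hypothetical and simplified: &lt;code&gt;IdempotencyKeyRow&lt;/code&gt; models the row for one key only, the body is compared by an opaque hash string, the order of the &lt;code&gt;422&lt;/code&gt; and &lt;code&gt;409&lt;/code&gt; checks is an assumption, and failure handling (what happens to a &lt;code&gt;PENDING&lt;/code&gt; row when the operation throws) is left out.&lt;/p&gt;

```java
import java.util.function.IntSupplier;

// Hypothetical model of the row for ONE idempotency key; a real table holds many.
final class IdempotencyKeyRow {
    private String status = "ABSENT";       // ABSENT, PENDING, or COMPLETED
    private String bodyHash;
    private int storedResponse;

    // Mirrors INSERT ... ON CONFLICT (key) DO NOTHING: true only for the first caller.
    private synchronized boolean claim(String hash) {
        if (!status.equals("ABSENT")) {
            return false;
        }
        status = "PENDING";
        bodyHash = hash;
        return true;
    }

    private synchronized void complete(int response) {
        storedResponse = response;
        status = "COMPLETED";
    }

    // Returns the HTTP status the caller would see for this key.
    int handle(String hash, IntSupplier operation) {
        if (claim(hash)) {
            int response = operation.getAsInt();   // business logic, incl. internal retries
            complete(response);
            return response;
        }
        synchronized (this) {
            if (!bodyHash.equals(hash)) {
                return 422;                        // same key, different body
            }
            if (status.equals("PENDING")) {
                return 409;                        // first request still in flight
            }
            return storedResponse;                 // replay the stored response
        }
    }

    public static void main(String[] args) {
        IdempotencyKeyRow row = new IdempotencyKeyRow();
        int[] inFlight = new int[1];
        int first = row.handle("body-a", () -> {
            inFlight[0] = row.handle("body-a", () -> 201);  // duplicate arrives mid-operation
            return 201;
        });
        int replay = row.handle("body-a", () -> 201);
        int mismatch = row.handle("body-b", () -> 201);
        // prints: 201 409 201 422
        System.out.println(first + " " + inFlight[0] + " " + replay + " " + mismatch);
    }
}
```

&lt;p&gt;Only the caller that wins &lt;code&gt;claim&lt;/code&gt; ever runs the operation, so however many optimistic-lock retries happen inside it, the key still maps to at most one posting.&lt;/p&gt;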

&lt;h2&gt;
  
  
  What I'd reach for next
&lt;/h2&gt;

&lt;p&gt;The cache is fast but it &lt;em&gt;can&lt;/em&gt; drift (a bug, a partial failure). So a scheduled &lt;strong&gt;reconciliation&lt;/strong&gt; job re-derives every balance from the immutable entries and alerts on any mismatch — it never auto-corrects; an operator posts a correcting entry. The append-only ledger means the truth is always recoverable.&lt;/p&gt;
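&lt;p&gt;The core of such a check is small. A minimal sketch, assuming entries can be read as signed amounts (credits positive, debits negative); that encoding and the &lt;code&gt;Reconciliation&lt;/code&gt; class are illustrative assumptions, not the repo's schema or job:&lt;/p&gt;

```java
import java.util.Arrays;

// Hypothetical reconciliation check for one account.
final class Reconciliation {
    // Re-derives the balance from the entries (credits positive, debits negative).
    static long deriveBalance(long[] signedEntries) {
        return Arrays.stream(signedEntries).sum();
    }

    // Cache minus derived balance; zero means the cache agrees with the ledger.
    // The check only reports drift, it never writes a correction.
    static long drift(long cachedBalance, long[] signedEntries) {
        return cachedBalance - deriveBalance(signedEntries);
    }

    public static void main(String[] args) {
        long[] entries = {500, -300, -400};      // funding, then two debits
        // prints: derived=-200 drift=300
        System.out.println("derived=" + deriveBalance(entries)
                + " drift=" + drift(100, entries));
    }
}
```

&lt;p&gt;A non-zero drift is the signal to alert on; the fix remains a correcting entry posted by an operator.&lt;/p&gt;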

&lt;p&gt;And the hot-account ceiling those 185 retries exposed is the next real scaling problem: when one row is genuinely contended, the answer is async posting or sharding that account, with the retry rate as the signal that tells you when you've crossed the line.&lt;/p&gt;




&lt;p&gt;The throughline: in a money system, &lt;em&gt;the cache disagreeing with the ledger&lt;/em&gt; is the failure that matters, and a default-isolation database won't stop you from creating it. A &lt;code&gt;@Version&lt;/code&gt; compare-and-set makes the lost update impossible, bounded retry with jitter makes it invisible under normal load, a benchmark tells you the price you're paying and where the ceiling is, and idempotency makes sure the retries — at every layer — never turn into double-spends.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code:&lt;/strong&gt; &lt;a href="https://github.com/xidoke/ledger-service" rel="noopener noreferrer"&gt;github.com/xidoke/ledger-service&lt;/a&gt; — the &lt;a href="https://github.com/xidoke/ledger-service/blob/main/docs/architecture/concurrency-model.md" rel="noopener noreferrer"&gt;concurrency model&lt;/a&gt; doc and ADRs &lt;a href="https://github.com/xidoke/ledger-service/blob/main/docs/adr/0005-ledger-model.md" rel="noopener noreferrer"&gt;0005&lt;/a&gt;, &lt;a href="https://github.com/xidoke/ledger-service/blob/main/docs/adr/0006-balance-representation.md" rel="noopener noreferrer"&gt;0006&lt;/a&gt;, &lt;a href="https://github.com/xidoke/ledger-service/blob/main/docs/adr/0011-concurrency-strategy.md" rel="noopener noreferrer"&gt;0011&lt;/a&gt;, &lt;a href="https://github.com/xidoke/ledger-service/blob/main/docs/adr/0012-idempotency.md" rel="noopener noreferrer"&gt;0012&lt;/a&gt; go deeper. &lt;strong&gt;Live demo:&lt;/strong&gt; &lt;a href="https://ledger-service-bjzr.onrender.com" rel="noopener noreferrer"&gt;ledger-service-bjzr.onrender.com&lt;/a&gt; (free instance — first request cold-starts ~50 s).&lt;/p&gt;

</description>
      <category>java</category>
      <category>springboot</category>
      <category>postgres</category>
      <category>concurrency</category>
    </item>
  </channel>
</rss>
